Wednesday, September 30, 2015

Coding Pet Peeves

Ok, these things are not the end of the world.  During a code inspection, I might not even bring them up.  Or maybe I would - it would depend on my mood.


    if (use_tls != 0) {
        /* Code having to do with TLS */

So, what is "use_tls"?  It's obviously somewhat of a flag - if it's non-zero then we need to do TLS stuff - but is it a count?  If it's a count, then it should have been named "use_tls_cnt", or maybe just "tls_cnt".  If it's being used as a boolean, then it should have tested as a boolean:

    if (use_tls) {
        /* Code having to do with TLS */

And yes, this is actual code that I've worked on.  It was being used as a boolean, and should have been tested as one.  Is this a big deal?  No, like I said, I might not have brought it up in a review.  But I am a believer that the code should communicate information to the reader.  The boolean usage tells the reader more than the numeric version (and is easier on the eyes as well).


    state_info = OUR_MALLOC_1(sizeof state_info_t);
    /* Code using state_info */

Again, actual code I've worked on.  We have a code macro for malloc which does some useful things.  But the author couldn't think of anything useful to do with free so he didn't make a free macro.  He should have.  There have been at least two times (that I know of) when it would have been tremendously useful.  One time we just wanted to log a message for each malloc and free because we suspected we were double-freeing.  Another time we thought that a freed structure was being accessed, and we wanted free to write a pattern on the memory before freeing.

"No problem," you say, "just create that macro and name it free."  Nope.  There are other mallocs: OUR_MALLOC_2, OUR_MALLOC_3, etc (no they aren't actually named that).  And some of the code doesn't use any of the macros, it just calls malloc directly!  For that second feature (writing a pattern before freeing), you need special code in the malloc part as well as the free part.  These all should have been done consistently so that OUR_MALLOC_1 only works with OUR_FREE_1, etc.  That would have allowed us to do powerful things, but as it is, we couldn't.


I bet there is an official name for this:

    if ('q' == cmd) break;

I've always felt that code should be written the way you would explain it to somebody.  Would you say, "If q is the command, then exit the loop."  No.  You would say, "If the command is q, then exit the loop."  I've seen this construct a lot where they put the constant in front of the comparison operator, and it is almost always awkward.

Oh, I know there's a good reason for it: it's a form of defensive programming.  Consider:

    if (cmd = 'q') break;

Oops, forgot an equals.  But the compiler will happily assign 'q' to cmd, test it against zero, and always do the break.  Swap the variable and the constant and you get a compile error.

But come on.  Gcc supports the "-parentheses" option which warns the above.  In fact, gcc supports a *lot* of nice optional warnings, which people should be using all the time.  "-Wall" is your friend!


Ok, I'm more of a recent convert here.  I've written hundreds of functions like this:

    void my_funct(blah blah blah)
        int something = 0;
        int something_else = 1;
        .... and another 10 or 15 variables

In the REALLY old days, you *had* to declare all local variables immediately after the function's open brace.  But modern C compilers let you declare variables anywhere you want.  Variables should be declared and initialized close to their first use.  This has two advantages: as the reader is reading the code and he encounters a variable, he can see right away what its type and initial value is without having to scroll up and back down again.  But more importantly, when the user sees the variable declared, he knows that the variable is not used anywhere above that declaration.  Whereas if it is declared at the top of the function, he has to examine all the function code above the first use to make sure that it isn't secretly being changed from its initial value.


    i++;  /* increment i */

Oh, thank goodness somebody taught me what the "++" operator does!  Note to programmer: you don't need to teach me the language.

    fclose(file[i]);  /* close the ith file */

Are you sure?  I thought the "fclose" function *opened* a file.  Note to programmer: you don't have to teach me the standard library functions.  They all have man pages.

    session_cleanup(client[i]);  /* clean up the client session */

OK, session_cleanup() probably doesn't have a man page.  But the comment doesn't contain any more information than the function and variable names!

One might add comments to say what the variable "i" is for and what the files in the "filep[]" array are for.  That at least gives information that a reader might not know.  But even those should not be comments, they should be descriptive variable names.

Most comments I've encountered should simply be removed as unnecessary clutter that makes it hard to read the code.  The best comments don't try to explain *what* the code is doing, they explain *why* it is being done.  Usually that is pretty obvious, so only give the "why" explanations when the answer may not be obvious.


    /* Usually we would close the log file *after* we clean up the
     * session, in case the cleanup code wants to log an error.  But
     * the session_cleanup() function is legacy and is written to
     * open and close the log file itself on error.  */

That's better.


"Really Steve?  Brace indention?  What are you, 12?"

(grumble)  I know, I should have grown past this way back when I realized that emacs v.s. vi just doesn't matter.  And really, I don't much care between the various popular styles.  If writing from scratch, I put the open brace at the end of the "if", but I'm also happy to conform to almost any style.

But I just recently had the pleasure of peeking inside the OpenSSL source code and I saw something I hadn't seen for many years:

    if (blah)

It's called "Whitesmith" and supposedly has advantages.  But those advantages seem more philosophical than practical ("the alignment of the braces with the block that emphasizes the fact that the entire block is conceptually (as well as programmatically) a single compound statement").  Um ... sorry, I prefer actual utility over making a subtle point.  Having the close unindented makes it easier to find the end of the block.   The jargon file claims that, "Surveys have shown the Allman and Whitesmiths styles to be the most common, with about equal mind shares."  Really?  I've been a C programmer for over 20 years with 7 different companies, and none of those places used Whitesmith.  This is only the second time I've even seen it.  Equal mind share?   Here's a survey which has Allman 17 times as popular as Whitesmith.  I think jargon file got it wrong.

And yeah, I know that if I were immersed in it for a few months, I wouldn't even notice it any more.  Just like emacs v.s. vi, it doesn't really matter, so long as a project is internally consistent.  But hey, this is *my* rant, and I don't like it!  So there.

Thursday, September 10, 2015

RS-232 serial ports

Remember RS-232?  It is rare now-a-days to find a need for RS-232, but it can come up.  In my case, a couple of machines in our lab have their system consoles tied to a dedicated serial lines which expect a "dumb-ish" terminal (I actually owned a real-life VT-100 for a while).  Again, it is rare that one needs the system console, but especially when you have hardware issues, sometimes it is the only way to get in.

RS-232 is an interface specification which was basically invented to do one thing: connect a MODEM to either a computer or a terminal.  If you look at the RS-232 spec, you will see very MODEM-ish signals, like carrier detect, ring indicator, etc.

The RS-232 interface has two different sides, one named "DTE" and the other named "DCE".  The DTE side is for a terminal or computer.  The DCE side is for a MODEM.  RS-232 is a standard which, among other things, defines what pins on a connector are for which purpose.  If you look at an RS-232 pinout, you will see names on the pins.  For example, on a DB-25 or a DB-9 connector, pin 2 is labeled "Tx".  It is named from the point of view of the DTE, which is to say that the DTE side of the interface is supposed to output to pin 2 serial data it is transmitting to the DCE.  The DCE side is supposed to input from pin 2 serial data it is receiving from the DTE.  So the names make sense for the DTE (it transmits on Tx and I receive on Rx) and the names are backwards for the DCE (it receives on Tx and transmits on Rx).  DTE devices normally have male connectors (with pins) and DCE devices normally have female connectors, although back in the old days, I did find the odd device which violated this standard.

These days, we rarely use MODEMs.  Most of the time we deal with RS-232, we are trying to connect two DTE devices together, like connecting a terminal to a computer.  You can't just use a normal cable for that since both the terminal and the computer are DTEs and will try to transmit on Tx (pin 2).  So to get two DTEs to talk, you need a "null modem".  This is usually a specially-wired cable with a female connector on both ends.  For example, one side's Tx pin is wired to the other side's Rx pin, and vice versa.  See for details.

Modern laptops often don't have serial lines any more, so you'll probably need a USB serial adapter.  I have a Belkin which connects to a Windows PC and presents as COM3 (required driver).  I use putty, which lets you select "serial" and specify which COM port.  I also have a TRENDnet which connects to my Mac and presents as /dev/tty.usbserial (also required driver).  I use the normal Mac "terminal" application and enter:

    screen /dev/tty.usbserial 9600

Both of my USB devices are male DTE.  To use either one as a serial console, I need to use a female-to-female null modem.