Monday, April 28, 2014

Cute "sed" Trick: rotate file

I wanted a script that would invoke a process, passing in a port number from a circular pool of ports.  When the process exits and restarts, I want the script to pass in the *next* port from the pool.

    PORT=`head -1 port_file.txt`
    sed -i -e '1h;1d;$G' port_file.txt

In the above "sed" command:

  • The "-i" causes the file "port_file.txt" to be edited in-place.
  • The "1h" sed command yanks the first line into the "hold" space.
  • The "1d" sed command deletes the first line (prevents it from being output).
  • The "$G" command appends the hold space after the last line of the file.

Thus, given that "port_file.txt" contains:
    12000
    12001
    12002
    12003
the above two commands will leave the file with:
    12001
    12002
    12003
    12000
and "PORT" set to 12000.
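
For comparison, here is a minimal C sketch of the same rotation done on an in-memory array instead of a file.  The helper name "next_port()" is hypothetical, and unlike the file-based script, the state does not survive a restart of the program that holds the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first port and move it to the end of the
 * pool, mirroring what "head -1" plus the sed rotation do to the file. */
int next_port(int *pool, size_t n)
{
    assert(pool != NULL && n > 0);
    int first = pool[0];
    /* Shift the remaining n-1 entries up one slot (like "1d" dropping line 1). */
    memmove(pool, pool + 1, (n - 1) * sizeof pool[0]);
    /* Put the saved entry after the last one (like "1h" followed by "$G"). */
    pool[n - 1] = first;
    return first;
}
```

Calling "next_port()" on {12000, 12001, 12002, 12003} returns 12000 and leaves the array as {12001, 12002, 12003, 12000}.  One difference: on a one-line file the sed command as written leaves the file empty, because "1d" ends the cycle before "$G" can run, whereas this sketch leaves a single-entry pool unchanged.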

Monday, April 21, 2014

Old C Coding Habits Die Hard

Old habits die hard.

In their 1978 book "The C Programming Language", Brian Kernighan and Dennis Ritchie described a version of C which has since become known as "K&R" C.  For those of you who aren't older than dirt, K&R C differed in many ways from modern C.  For example:

    foo(c, v)
    int c;
    char **v;
    {

Whoa!  What's that?  Original K&R C didn't let you declare the formal parameter types in-line with the function definition; you had to declare them between the ")" and the "{".  Also, functions defaulted to being of type "int".

Lots of fossilized programmers like me were forced into habits in the old days which have outlived their need.  Now, a lot of younger programmers getting out of school have as their first real-world experience the job of maintaining that old code, learning those same obsolete habits through osmosis.


LOCAL VARIABLES

In K&R C, local variables *had* to be declared at the top of the function.  With C89, locals could be declared at the start of any compound statement (i.e. after any "{").  Finally, as of C99 locals could be declared basically anywhere.

In my opinion, it makes sense for a variable to be declared and initialized immediately before (or very close to) its first usage.  Doing this conveys valuable information for a future code maintainer: this variable isn't used prior to this line.  I can't tell you how often I've spent time doing reverse searches for a variable to see where else it is being used.

Here's some "old habits" code:

    foo_find(...)
    {
        int found = 0;
        ...tens of lines...
        while (! found) {

With this, you might go straight to the top of "foo_find()" to see how "found" is declared and initialized, but then you still have to search the tens of intervening lines to see how "found" might have changed.

I prefer this:

    foo_find(...)
    {
        ...tens of lines...
        int found = 0;
        while (! found) {

Sure, there might be a 20-year-old compiler which can't handle it, but anybody using a 20-year-old compiler has bigger problems than this.
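
Here is a minimal, compilable sketch of the preferred style.  The parameters and the search loop are made up for illustration; only the placement of "found" reflects the point above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: return the index of "key" in "vals", or -1. */
int foo_find(const int *vals, size_t n, int key)
{
    assert(vals != NULL || n == 0);

    /* ...imagine tens of lines of unrelated setup here... */

    /* Declared where they are first needed (legal since C99), so a reader
     * knows that nothing above this point touches them. */
    int found = 0;
    size_t i = 0;
    while (!found && i < n) {
        if (vals[i] == key)
            found = 1;
        else
            i++;
    }
    return found ? (int)i : -1;
}
```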


GLOBAL VARIABLES

Another habit which I think might be a K&Rism: declaration of global variables at the top of the file. I might be wrong, but I suspect that K&R C did not allow global variables to be declared after functions had been defined.  As with local variables, this restriction is no longer in force.  So, if you have a group of functions which implement an abstraction that uses globals, but the abstraction is not significant enough to justify putting them in their own file, I think it makes sense to put the globals associated with the abstraction in front of the first implementation function.  For example:

    #define FOO_MAX_NUM 67
    typedef struct {...} foo_t;
    foo_t foo_storage[FOO_MAX_NUM];
    static int foo_num_stored = 0;

    foo_t *foo_create(...)
    {

This code, including the #define, the typedef, and the global variables, may well be positioned after other functions are already defined.  Once again, it helps a code reader know that the definitions are not used in the preceding code.
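
Filled in so that it compiles, the fragment might look like the sketch below.  The struct field, the "id" parameter, and the "table full" behavior are assumptions made for illustration, and "foo_storage" is made "static" here on the assumption that nothing outside the file needs it.

```c
#include <assert.h>
#include <stddef.h>

/* ...other, unrelated functions of the file would appear above here... */

/* Everything the "foo" abstraction needs, declared right before its code. */
#define FOO_MAX_NUM 67
typedef struct { int id; } foo_t;       /* field is a placeholder */
static foo_t foo_storage[FOO_MAX_NUM];
static int foo_num_stored = 0;

/* Hand out the next free slot, or NULL when the table is full. */
foo_t *foo_create(int id)
{
    assert(foo_num_stored >= 0);
    if (foo_num_stored >= FOO_MAX_NUM)
        return NULL;
    foo_t *f = &foo_storage[foo_num_stored++];
    f->id = id;
    return f;
}
```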


INCLUDE FILES

In the above example, "FOO_MAX_NUM" and "foo_t" are defined close to the code, instead of at the top of the file.  But many programmers wouldn't even put them at the top of the file; they would put them in an include file "foo.h".

I don't think this is a K&Rism.  It's just a habit to put all typedefs and #defines in the include file.  But again, I advocate for keeping things as localized as possible.  Definitions and declarations which are internal implementation details to an abstraction should be hidden from general view.  C++ may do a better job of managing the external and internal details of an abstraction, but the guidelines should be followed in C as well.
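
One common way to do this in C is an opaque type: the include file announces that the type exists, and only the implementation file knows its layout.  The sketch below shows both halves in a single listing; the function names and the struct field are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what would go in foo.h: only what callers need ---- */
typedef struct foo foo_t;               /* opaque: layout is not exposed */
foo_t *foo_create(int id);
int foo_id(const foo_t *f);
void foo_delete(foo_t *f);

/* ---- what would stay in foo.c: implementation details ---- */
struct foo { int id; };                 /* placeholder field */

foo_t *foo_create(int id)
{
    foo_t *f = malloc(sizeof *f);
    if (f != NULL)
        f->id = id;
    return f;
}

int foo_id(const foo_t *f)
{
    assert(f != NULL);
    return f->id;
}

void foo_delete(foo_t *f)
{
    free(f);
}
```

Callers that include only the header portion can hold and pass a "foo_t *" but cannot reach into the struct, so its fields can change without touching them.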

Wednesday, April 16, 2014

strtol preferred over atoi

We've all done it: parsed command-line parameters and converted numeric strings to integers using "atoi()".  And (hopefully) we've all felt guilty about it, because "atoi()" sucks.

Let's say I have a program, "blunjo", which takes a "-d" option with numeric debug level (0=no debug, 1=a little debug, 2=a lot of debug).  So I might use it like this:

    blunjo -d 1 input_file1 input_file2 ...

Inside the code I probably call "getopt()" and include the code fragment:

    case 'd':
        debug_opt = atoi(optarg);
        break;

So, what happens if the user forgets exactly how to use it and enters this:

    blunjo -d input_file1 input_file2 ...

In this case, "optarg" points at "input_file1", which "atoi()" happily converts to zero.  The program turns off debug and silently skips processing "input_file1".  It would be nicer if it told the user that "input_file1" is an invalid integer and printed the usage string.
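
The behavior is easy to reproduce in isolation.  In this small sketch the wrapper name "debug_level_from()" is hypothetical; it only stands in for the "case 'd'" handler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the option handler: what atoi() makes of optarg. */
int debug_level_from(const char *optarg_value)
{
    assert(optarg_value != NULL);
    /* atoi() gives the caller no way to tell "0" from "not a number". */
    return atoi(optarg_value);
}
```

Both "0" and "input_file1" come back as 0, and a string like "2x" comes back as 2 with the trailing garbage silently ignored.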

Enter "strtol()".  It's a little more complicated to use (also more flexible):

    long int strtol(const char *nptr, char **endptr, int base);

The Linux man page contains two interesting bits:

If endptr is not NULL, strtol() stores the address of the first invalid character in *endptr. If there were no digits at all, strtol() stores the original value of nptr in *endptr (and returns 0). In particular, if *nptr is not '\0' but **endptr is '\0' on return, the entire string is valid.

... the calling program should set errno to 0 before the call, and then determine if an error occurred by checking whether errno has a non-zero value after the call.

So this is better code:

    case 'd': {  /* braces needed: a declaration can't directly follow a label */
        char *p = NULL;
        errno = 0;
        debug_opt = strtol(optarg, &p, 10);
        if (errno != 0 || p == optarg || p == NULL || *p != '\0') {
            usage("Invalid numeric value for -d option");
        }
        break;
    }

Note that you must make sure that "optarg" is non-null before calling strtol.  If you use getopt and specify "d:" then getopt will guarantee a non-null "optarg".  But if you are parsing the command-line string yourself, beware of the user entering "-d" with nothing at all following it - the next "argv[]" will be null.  Also note that the "p==NULL" check is technically not necessary; so long as "optarg" is non-null, "p" will never be left at null.  However, given that I'm not responsible for the code that sets "p", it just seems like good practice to include the sanity check before dereferencing it.
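
The same checks can be packaged as a function that reports failure instead of calling "usage()" directly.  This is a minimal sketch; the name "parse_int()" and its return convention are assumptions, and the extra range check is there because "strtol()" returns a long while options like the debug level are usually stored in an int.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole string as a base-10 int.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_int(const char *s, int *out)
{
    assert(out != NULL);
    if (s == NULL)
        return -1;

    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);

    if (errno != 0)                     /* e.g. ERANGE on overflow */
        return -1;
    if (end == s || *end != '\0')       /* no digits, or trailing garbage */
        return -1;
    if (v < INT_MIN || v > INT_MAX)     /* fits in a long but not in an int */
        return -1;

    *out = (int)v;
    return 0;
}
```

With this, "1" and "-3" parse successfully, while "input_file1", "12abc", the empty string, and a number too large for a long are all rejected.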

Here's a macro that makes it all even easier, handles 0x-prefixed hexadecimal, and even prints a programmer-friendly error specifying the file:line of the call to it:

    /* Needs <stdio.h>, <stdlib.h>, and <errno.h>. */
    #define SAFE_ATOL(a,l) do { \
      char *in_a = (a); char *temp = NULL; long result; errno = 0; \
      if (*in_a == '0' && *(in_a+1) == 'x') \
        /* base 16 accepts the 0x prefix itself; a bare 0x fails the check below */ \
        result = strtol(in_a, &temp, 16); \
      else \
        result = strtol(in_a, &temp, 10); \
      if (errno!=0 || temp==in_a || temp==NULL || *temp!='\0') { \
        fprintf(stderr, "%s:%d, Error, invalid numeric value for %s: '%s'\n", \
           __FILE__, __LINE__, #l, in_a); \
        exit(1); \
      } \
      (l) = result; /* "return" value of macro */ \
    } while (0)

Here's a usage of the macro:

    case 'd':
        SAFE_ATOL(optarg, debug_opt);
        break;

Note that on errors, it abruptly exits the program.

Finally, there is also "strtoll()" which returns a long long int, and has the same error-checking.  The functions "strtoul()" and "strtoull()" are similar but for unsigned.
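
One difference worth knowing about the unsigned variants: "strtoul()" accepts a leading minus sign and returns the negated value converted to unsigned, so "-1" comes back as ULONG_MAX with errno left at zero.  The sketch below adds a check for that; the name "parse_ulong()" and its return convention are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole string as a base-10 unsigned long,
 * additionally rejecting a leading '-' that strtoul() would accept.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_ulong(const char *s, unsigned long *out)
{
    assert(s != NULL && out != NULL);

    /* Skip the same leading whitespace strtoul() skips, then look for '-'. */
    const char *p = s;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')
        return -1;

    char *end = NULL;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;

    *out = v;
    return 0;
}
```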

EDIT: I'm pleased to discover that the function "inet_pton()" does a good job of error checking a dotted-decimal IP address.  For example, adding garbage to the end of a valid IP address is flagged as an error.

EDIT2: I've enhanced the above macro in a few ways and put it on my GitHub. See: https://github.com/fordsfords/safe_atoi

Monday, April 14, 2014

Password strength

I disagree with a lot of "password strength" measures.  Most measures want you to include upper and lower case, digits, special characters, etc.  I don't feel they are necessary, and don't give you as much "security" as you might think (like substituting zero for the letter "o").

Then along came Randall Munroe with an XKCD cartoon which does a much better job of explaining it than I ever could:

[XKCD cartoon: "Password Strength"]


Most of the password strength "meters" that you see on sites are based on the idea that digits, special characters, and mixed-case are the magic elixir for strong passwords.  I was quite dismayed to discover that most of them consider "P@ssw0rd" to be very secure, which is absurd.  Then I found zxcvbn:


Finally, a password strength meter which knows that "P@ssw0rd" is low security (score=0 of 4, crack time 0 seconds)!  Whereas "correcthorsebatterystaple" is very secure (score=4 of 4, crack time 65 years).

Another password method that I've heard hyped which I disagree with is the haystack approach.  According to this author, the password "D0g....................." (21 dots) is very strong.  This is ONLY true for brute-force password cracking, which is NOT how serious crackers work.  They do dictionary and repeated character analysis.  According to zxcvbn, "D0g....................." is weak (score=0, crack time 84 seconds).  YES!

One fly in all this ointment: many systems limit your password length, sometimes to as few as 8 characters.  This makes it very hard to use 4 random words, meaning that you probably need to go the random route.  For the 8 random character password "0ZhyUQ63", zxcvbn rates it 4 of 4, with centuries required to crack it.  Whereas "saytroll" is weak, with 22-second crack time.  (Note that "S@yTr0ll" is still weak, with a 7-minute crack time - so much for magic elixirs.)

BTW, my wiki has a somewhat longer article on password strength.