Wednesday, April 16, 2014

strtol preferred over atoi

We've all done it: parsed command-line parameters and converting numeric strings to integers using "atoi()".  And (hopefully) we've all felt guilty about it, because "atoi()" sucks.

Let's say I have a program, "blunjo", which takes a "-d" option with numeric debug level (0=no debug, 1=a little debug, 2=a lot of debug).  So I might use it like this:

    blunjo -d 1 input_file1 input_file2 ...

Inside the code I probably call "getopt()" and include the code fragment:

    case 'd':
        debug_opt = atoi(optarg);
        break;

So, what happens if the user forgets exactly how to use it and enters this:

    blunjo -d input_file1 input_file2 ...

In this case, "optarg" points at "input_file1", which "atoi()" happily converts to zero, turns off debug, and silently skips processing "input_file1".  Might be nice if it actually told the user that "input_file1" is an invalid integer and printed the usage string.

Enter "strtol()".  It's a little more complicated to use (also more flexible):

    long int strtol(const char *nptr, char **endptr, int base);

The Linux man page contains two interesting bits:

If endptr is not NULL, strtol() stores the address of the first invalid character in *endptr. If there were no digits at all, strtol() stores the original value of nptr in *endptr (and returns 0). In particular, if *nptr is not '\0' but **endptr is '\0' on return, the entire string is valid.

... the calling  program  should  set  errno to 0 before the call, and then determine if an error occurred by checking whether errno has a non-zero value after the call.

So this is better code:

    case 'd':
        char *p = NULL;  errno = 0;
        debug_opt = strtol(optarg, &p, 10);
        if (errno != 0 || p == optarg || p == NULL || *p != '\0') {
            usage("Invalid numeric value for -d option"); }
Note that you must make sure that "optarg" is non-null before calling strtol.  If you use getopt and specify "d:" then getopt will guarantee a non-null "optarg".  But if you are parsing the command-line string yourself, beware of the user entering "-d" with nothing at all following it - the next "argv[]" will be null.  Also note that the "p==NULL" check is technically not necessary; so long as "optarg" is non-null, "p" will never be left at null.  However, given that I'm not responsible for the code that sets "p", it just seems like good practice to include the sanity check before dereferencing it.

Here's a macro to make it all even easier, handle 0x-prefixed hexidecimal, and even prints a programmer-friendly error specifying the file:line of the call to it:

    #define SAFE_ATOL(a,l) do { \
      char *in_a = a; char *temp = NULL; long result; errno = 0; \
      if (*in_a == '0' && *(in_a+1) == 'x') \
        result = strtol(in_a+2, &temp, 16); \
      else \
        result = strtol(in_a, &temp, 10); \
      if (errno!=0 || temp==in_a || temp==NULL || *temp!='\0') { \
        fprintf(stderr, "%s:%d, Error, invalid numeric value for %s: '%s'\n", \
           __FILE__, __LINE__, #l, in_a); \
        exit(1); \
      } \
      l = result; /* "return" value of macro */ \
    } while (0)
Here's a usage of the macro:

    case 'd':

        SAFE_ATOL(optarg, debug_opt);
Note that on errors, it abruptly exits the program.

Finally, there is also "strtoll()" which returns a long long int, and has the same error-checking.  The functions "strtoul()" and "strtoull()" are similar but for unsigned.

EDIT: I'm pleased to discover that the function "inet_pton()" does a good job of error checking a dotted-decimal IP address.  For example, adding garbage to the end of a valid IP address is flagged as an error.

EDIT2: I've enhanced the above macro in a few ways and put it on my github. See: https://github.com/fordsfords/safe_atoi

No comments: