The Meditative Coder: September 2018

This post is a little late in coming as I made the change earlier this year.

I used to use a hosting service, suso.com, for my main website and email. It ran a message board I built using Perl, and it also ran an older version of MediaWiki for a personal wiki. But the service cost a fair amount, and neither the message board nor the wiki was being used much. So to save money, I cancelled it and moved the content to GitHub.

The content is still available at geeky-boy.com, but I dumped the wiki pages to a flat directory. If I want to edit them, I need to edit the raw HTML.

I somewhat mourn the loss of my wiki. I like wikis for certain kinds of content. It is especially powerful for collaborative efforts, particularly for geographically-separated teams. But even for single-user personal-use, it presents such a low barrier to use. If I want to update a page, it's just a few button clicks away. And the update process is easy (assuming minimal use of fancy wiki markup). And it's easy to see change history, roll back changes, etc.

Contrast this with web pages on GitHub, where you have to edit them locally in HTML, check in the changes, and sync with the "gh-pages" branch to make them live. It takes longer and requires specialized software. E.g. I can't easily do it from a phone or tablet.

There are "free" wikis out there, but I don't like the ads, and most of them use non-MediaWiki software; the few I've tried I haven't liked.

Maybe someday I will try some kind of third-party content management system. Or maybe I'll write my own wiki software as a fun personal project (maybe do the rendering of the markup in the browser in JavaScript). Or maybe I just don't really need a wiki. Long ago, I imagined that the blog and the wiki would complement each other, with content in each referring to content in the other. Blog for "news", wiki for "content". But it hasn't worked out that way.

So rest-in-peace wiki.geeky-boy.com. You were fun while you lasted.

The scanf() family of functions (man page) is used to parse ASCII input into fields and convert those fields into desired data types. I have very rarely used them, due to the nature of the kind of software I write (system-level, not application), and I have always viewed these functions with suspicion.

Actually, scanf() is not safe. Neither is fscanf(). But this is only because when an input conversion fails, it is not well-defined where the file pointer is left. Fortunately, this is pretty easy to deal with - just use fgets() to read a line and then use sscanf().
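As a rough illustration of the fgets()-then-sscanf() pattern, here is a minimal sketch. The helper name read_uint_line(), the 128-byte buffer, and the return codes are assumptions made for this example only. It uses the "%n" check that is explained later in this post.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp and parse a single
 * unsigned integer from it.  Returns 1 on success, 0 on a parse
 * error, and -1 on end-of-file or read error. */
int read_uint_line(FILE *fp, unsigned int *value)
{
    char iline[128];  /* assumed maximum line length for this sketch */
    int null_ofs = 0;

    if (fgets(iline, sizeof(iline), fp) == NULL) {
        return -1;  /* EOF or read error */
    }

    /* fgets() has already consumed the line from the stream, so a
     * failed conversion here cannot leave the file position in an
     * unknown spot.  (A line longer than the buffer is split across
     * calls and will show up as a parse error.) */
    (void)sscanf(iline, "%9u\n%n", value, &null_ofs);
    if (null_ofs == 0 || iline[null_ofs] != '\0') {
        return 0;  /* conversion failed or trailing garbage */
    }
    return 1;
}
```

A caller would typically loop on this, re-prompting when it returns 0 and stopping when it returns -1.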

You can use sscanf() safely, but you need to follow some rules. For one thing, you should include field widths for pretty much all of the fields, not just strings. More on the rules later.

Input Validity Checking with sscanf()

I am a great believer in doing a good job of checking validity of inputs (see strtol preferred over atoi). It's not enough to process your input "safely" (i.e. preventing buffer overflows, etc.), you should also detect data errors and report them. For example, if I want to input the value "21" and I accidentally type "2`", I want the program to complain and re-prompt, not simply take the value "2" and ignore the "`".
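To make the "2`" example concrete, here is a sketch of strict single-integer validation with strtol(). This is not the code from the strtol post; the helper name parse_int_strict() and its return convention are assumptions for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert ALL of s to an int.
 * Returns 1 on success; 0 if s has no digits, has trailing garbage,
 * or is out of range for int. */
int parse_int_strict(const char *s, int *out)
{
    char *end = NULL;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return 0;  /* no digits, or garbage such as the "`" in "2`" */
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return 0;  /* value does not fit in an int */
    }
    *out = (int)v;
    return 1;
}
```

With this, "21" is accepted, while "2`" is rejected instead of silently becoming 2.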

One problem with doing a good job of input validation is that it often leads to lots of checking code. In strtol preferred over atoi, my preferred code consumes 4 lines, and even that is cheating (combining two statements on one line, putting close brace at end of line). And that's for a single integer! What if you have a line with multiple pieces of information on it?

Let's say I have input lines with 3 pipe-separated fields:

id_num|name|age

where "id_num" and "age" are 32-bit unsigned integers and "name" is alphabetic with a few non-alphas. We'll further assume that a name must not contain a pipe character.

Bottom line: here is my preferred code:

#define NAME_MAX_LEN 60
. . .
unsigned int id;
char name[NAME_MAX_LEN+1]; /* allow for null */
unsigned int age;
int null_ofs = 0;

(void)sscanf(iline,
    "%9u|"                                 /* id */
    "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]|"  /* name */
    "%3u\n"                                /* age */
    "%n",                                  /* null_ofs */
    &id, name, &age, &null_ofs);

if (null_ofs == 0 || iline[null_ofs] != '\0') {
    printf("PARSE ERROR!\n");
}

Error Checking

My long-time readers (all one of you!) will take issue with me throwing away the sscanf() return value. However, it turns out that sscanf()'s return value really isn't as useful as it should be. It tells you how many successful conversions were done, but doesn't tell you if there was extra garbage at the end of the line. A better way to detect success is to add the "%n" at the end to get the string offset where conversion stopped, and make sure it is at the end of the string.

If any of the conversions fail, the "%n" will not be assigned, leaving it at 0, which is interpreted as an error. Or, if all conversions succeed, but there is garbage at the end, "iline[null_ofs]" will not be pointing at the string's null, which is also interpreted as an error.
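To see the difference between the return value and the "%n" check, here is a small sketch that wraps the same sscanf() call in a function. The wrapper name parse_record() and the num_conv output parameter are assumptions added for this example; the format string is the one from this post.

```c
#include <stdio.h>

#define NAME_MAX_LEN 60
#define STR2(_s) #_s
#define STR(_s) STR2(_s)

/* Hypothetical wrapper around the sscanf() call from this post.
 * Stores sscanf()'s return value in *num_conv so the two checks can
 * be compared.  Returns 1 if the whole line parsed cleanly, else 0. */
int parse_record(const char *iline, unsigned int *id,
                 char name[NAME_MAX_LEN + 1], unsigned int *age,
                 int *num_conv)
{
    int null_ofs = 0;

    *num_conv = sscanf(iline,
        "%9u|"                                 /* id */
        "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]|"  /* name */
        "%3u\n"                                /* age */
        "%n",                                  /* null_ofs */
        id, name, age, &null_ofs);

    /* "%n" is only stored if every earlier directive matched, and the
     * offset must land exactly on the string's terminating null. */
    return (null_ofs != 0 && iline[null_ofs] == '\0');
}
```

For the line "123|Smith, John|42xyz\n", sscanf() still reports 3 conversions, but the "%n" offset points at the "x" rather than the null, so the function returns 0.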

STR Macro

More interesting is the "STR(NAME_MAX_LEN)" construct. This is a specialized macro I learned about on StackOverflow:

#define STR2(_s) #_s
#define STR(_s) STR2(_s)

Yes, both macros are needed. So the construct:
"%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]"
gets compiled to the string:
"%60[-A-Za-z,. ]"

So why use that macro? If I had hard-coded the format string as "%60[-A-Za-z,. ]", it would risk getting out of sync with the actual size of "name" if we find somebody with a 65 character name. (I would like it even more if I could use "sizeof(name)", but macro expansion and "sizeof()" are handled by different compiler phases.)
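Here is a short sketch of why the second macro level matters. The two array names are made up for this example. Note that it only works if NAME_MAX_LEN is defined as a bare number, not as "(60)" or an expression.

```c
#define NAME_MAX_LEN 60
#define STR2(_s) #_s
#define STR(_s) STR2(_s)

/* Two levels: NAME_MAX_LEN is expanded to 60 before it is stringized,
 * so this is the string "%60[-A-Za-z,. ]". */
static const char fmt_two_level[] = "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]";

/* One level: the # operator stringizes the macro's NAME, not its
 * value, so this is the useless string "%NAME_MAX_LEN[-A-Za-z,. ]". */
static const char fmt_one_level[] = "%" STR2(NAME_MAX_LEN) "[-A-Za-z,. ]";
```

The adjacent string literals are concatenated by the compiler, so each array holds a single format string.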

Newline Specifier

This one is a little subtle. The third conversion specifier is "%3u\n". This tells sscanf() to match a newline at the end of the line. But guess what! If I don't include a newline in the string, the sscanf() call still succeeds, including setting null_ofs to the offset of the input string's null. In my opinion, this is a slight imperfection in sscanf(): I said I needed a newline, but sscanf() accepts my input without the newline. The reason is that sscanf() treats any whitespace character in the format string, newline included, as "skip zero or more whitespace characters in the input".

If I *really* want to require the newline, I can do:

if (null_ofs == 0 || iline[null_ofs - 1] != '\n') {
    printf("PARSE ERROR!\n");
}
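Put together for a single field, the newline requirement might look like the following sketch. The helper name parse_age_line() is an assumption for this example.

```c
#include <stdio.h>

/* Hypothetical helper: parse an age field that MUST end in a newline.
 * Returns 1 on success, 0 otherwise. */
int parse_age_line(const char *iline, unsigned int *age)
{
    int null_ofs = 0;

    (void)sscanf(iline, "%3u\n%n", age, &null_ofs);
    if (null_ofs == 0 || iline[null_ofs] != '\0') {
        return 0;  /* conversion failed or trailing garbage */
    }
    /* The "\n" in the format matches zero or more whitespace
     * characters of any kind, so check the last character directly.
     * null_ofs is known to be at least 1 here. */
    if (iline[null_ofs - 1] != '\n') {
        return 0;  /* missing newline (or ended in other whitespace) */
    }
    return 1;
}
```

This accepts "42\n" but rejects both "42" and "42 " (trailing space).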

Integer Field Widths

Finally, I add field widths to the unsigned integer conversions. Why? If you leave them out, sscanf() will do a fine job of converting numbers from 0 through 4294967295. But try the next number, 4294967296, and what will you get? Arithmetic overflow, with a result of 0 (although the C standards do not guarantee that value). And sscanf() will not return any error. So I restrict the number of digits to prevent overflow. The disadvantage is that with "%9u" you cannot input the full range of an unsigned integer. An alternative is to use "%10llu" and supply an unsigned long long variable, then range check it in your code. Or you could just use "%8x" and have the user input hexadecimal. Heck, you could even input it as a 10-character string and use strtoul()!

I guess "%9u" was good enough for my needs. I just wish you could just tell sscanf() to stop converting on arithmetic overflow!
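For what it's worth, the wider-type alternative can be sketched like this. The helper name parse_u32() is an assumption, and it assumes unsigned long long is at least 64 bits (which the C standard requires).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse a full-range 32-bit unsigned value by
 * converting into a wider type and range checking afterwards.
 * Returns 1 on success, 0 otherwise. */
int parse_u32(const char *s, uint32_t *out)
{
    unsigned long long wide = 0;
    int null_ofs = 0;

    /* At most 10 digits are read, so the largest possible value
     * (9999999999) cannot overflow the 64-bit variable. */
    (void)sscanf(s, "%10llu%n", &wide, &null_ofs);
    if (null_ofs == 0 || s[null_ofs] != '\0') {
        return 0;  /* conversion failed or trailing garbage */
    }
    if (wide > UINT32_MAX) {
        return 0;  /* parsed fine, but too big for 32 bits */
    }
    *out = (uint32_t)wide;
    return 1;
}
```

This accepts 4294967295 and rejects 4294967296, at the cost of a second variable and an explicit range check.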

Not Perfect

I already mentioned arithmetic overflow. Any other problems with the above code? Well, yeah, I guess. Even though I specify "id" and "age" to be unsigned integers, I notice that sscanf() will allow negative numbers to be input. Sometimes we engineers consider this a feature, not a bug; we like to use 0xFFFFFFFF as a flag value, and it's easier to type "-1". But I admit it still bothers me. If I input an age of -1, we end up with the 4-billion-year-old man (apologies to Mel Brooks).
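One possible way to refuse the minus sign, sketched here as an assumption rather than something the code above does, is to read the digits with a scanset and convert them afterwards. The helper name parse_unsigned_digits() is made up for this example, and it assumes a 32-bit unsigned int as in the rest of this post.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: accept only 1 to 9 decimal digits, with no
 * sign and no leading whitespace.  Returns 1 on success, 0 otherwise. */
int parse_unsigned_digits(const char *s, unsigned int *out)
{
    char digits[9 + 1];  /* 9 digits plus null */
    int null_ofs = 0;

    /* Unlike "%u", the "%[" conversion does not skip whitespace and
     * does not accept "+" or "-", so "-1" fails to match. */
    (void)sscanf(s, "%9[0-9]%n", digits, &null_ofs);
    if (null_ofs == 0 || s[null_ofs] != '\0') {
        return 0;  /* not all digits, or too many of them */
    }
    /* 9 digits is at most 999999999, which fits in 32 bits. */
    *out = (unsigned int)strtoul(digits, NULL, 10);
    return 1;
}
```

The trade-off is an extra buffer and a second conversion step for every numeric field.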

Wednesday, September 12, 2018

Goodby my Wiki

Tuesday, September 11, 2018

Safe sscanf() Usage

Tuesday, September 4, 2018

Safe C?
