The Meditative Coder

Tuesday, June 24, 2014

My work email inbox is empty

Wow. I think the last time my work in-box was empty was when I joined 29West, back in 2005. It stayed empty for probably an hour.

I dunno ... it's actually a little scary looking.

Monday, June 16, 2014

I am very happy with the Maglus stylus for my iPad. I wrote an Amazon review of why I like it so much (there are specific reasons).

The reason for this post is that you can get a 10% discount if you use this link. It's part of an "affiliate" program that also kicks me a bit of money, which I have re-directed to the American Red Cross.

That said, I did notice that on Amazon, I could find it for even less than the 10% off list, so if you're looking to save money, shop around. I would rather see you save as much money as you can, and independently donate a decent amount to the Red Cross because you want to support their good work.

Anyway, I never feel 100% comfortable promoting a commercial product lest I come across as a corporate shill. Then I saw a post on their corporate blog on James Joyce and Ulysses. Somehow that makes me feel less guilty being a booster. Which is really silly since I've never read Ulysses!

Hmm ... what tags should I set on this post? I don't see one for "shameless commerce". I guess "tips" since I really do like the stylus a lot.

Thursday, June 12, 2014

Order in Structure Assignment Statements

I had what I thought was a pretty simple piece of code, transferring a structure from one thread to another. Without describing it blow-by-blow, I'll present it thus: a writer thread is measuring temperature and pressure of a steam turbine. A reader periodically samples the measurement. There is no need for the reader to get every measurement, it only needs the most-recent at the point that it samples.

DESIGN!

A global data structure:

struct measurement_s {

    volatile uint64_t post_seq;  /* sequence number of measurement */

    volatile  int64_t temperature;

    volatile  int64_t pressure;

    volatile uint64_t pre_seq;   /* sequence number of measurement */

} live;

The algorithm is for the writer to first increment pre_seq, then update the temperature and pressure, then increment the post_seq. The reader reads the post_seq first, then temperature and pressure, then pre_seq. If the reader sees pre_seq != post_seq, it means that a collision has happened, and the reader should re-try. Here's the reader code:

        struct measurement_s sample;

        do {

            sample = live;

        } while (sample.pre_seq != sample.post_seq);

        /* now have valid sample */

BROKEN!

Pretty simple. Easy to understand. And it didn't work right. The test code was written to make it obvious when inconsistent values are read, and the code above generated inconsistent reads. A LOT of inconsistent reads. How can this be??? I used volatile to prevent the optimizer from messing with the order of reads and writes, and it's an x86-64, which has a very predictable memory model, removing the need for most hardware memory barriers. The algorithm is correct! Why is it failing?

DIAGNOSE!

So I did what I've been doing a surprising amount recently: I generated assem language (this time withOUT optimization turned on). Here's the structure assignment:

    leaq    -64(%rbp), %rax

    movq    _live+24(%rip), %rcx

    movq    %rcx, 24(%rax)

    movq    _live+16(%rip), %rcx

    movq    %rcx, 16(%rax)

    movq    _live+8(%rip), %rcx

    movq    %rcx, 8(%rax)

    movq    _live(%rip), %rcx

    movq    %rcx, (%rax)

HEY! It copied the structure in the REVERSE ORDER than I expected! So it copies the pre_seq first, then the temperature and pressure, and post_seq last! No wonder it detected collisions, that order is NOT thread-safe!

HAPPILY EVER AFTER!

So I replaced the structure assign with explicit copies of the fields:

        struct measurement_s sample;

        do {

            /* Don't do "sample = live;", order of field copies is not defined. */

            sample.post_seq = live.post_seq;

            sample.temperature = live.temperature;

            sample.pressure = live.pressure;

            sample.pre_seq = live.pre_seq;

        } while (sample.pre_seq != sample.post_seq);

YAY! It works. (Whew!)

It would also be possible to define a sub-structure containing temperature, pressure, and any additional interesting data. The fields in the sub-structure could be copied as struct assign, allowing the compiler to do it in any order. But the pre_seq and post_seq must be individually copied in the right order.

DIGRESSIONS!

I do love those exclamation points!

So, the above code works. How *good* is the design? For physical process measurement, probably just fine. It would be rare to measure physical characteristics more often than a few times per second, so the chances of the reader having to re-try the sample are very low. (On the other hand, why not use mutexes if you are going that slow?)

But suppose this algorithm is applied to market data, which might have many thousands of ticks per second? The algorithm shown suffers from a fundamental flaw: starvation. If a writer and reader get pathologically synchronized, there is no bound to the number of times the reader will cycle, trying to get a clean sample.

In practice, it might be fine. But whenever possible, I prefer a solution that works by design, not by circumstance.

I think a better approach might be to use a non-blocking queue, like this one. The writer presumably requires non-zero work to produce a new record. The consumer should read the queue dry in a tight loop before processing the last record read. Now the queue need only be long enough to hold the maximum number of produced records between two reads. A slow reader will simply clean out the queue each time it reads.

Some might question the wisdom of doing ANY kind of programming where the order of field copies in a struct assign is important. In the old days, this program would have been done with a critical section, protected by a mutex or semaphore. That works fine, and doesn't depend on order. But high-performance applications of today need much lower latencies. So lock-free algorithms are important. With lock-free, you need to concern yourself with order; the compiler can change the order of the assem language operations from the source code, and the hardware can change the order of physical memory operations from the assem code. Compile optimizer barriers address the former; hardware memory barriers address the latter.

Finally, some will question my choice of making all the fields of the structure volatile. I admit it is a heavy-handed approach that I used for expediency -- this was a prototype after all. For production code I would use a more-precisely targeted VOL macro (mentioned in an earlier post), often in cooperation with a general compile optimizer barrier, like "asm volatile ("" ::: "memory");". Note that these techniques are NOT used to squeeze every nanosecond out of the code. They are used to insure code correctness.

Tuesday, June 10, 2014

Error handling: the enemy of readability?

I've been a programmer since the mid-1970s, call it 40 years. During that time I've seen the rise into popularity: structured programming and object-oriented programming. (More-recently, I've seen increased interest in functional programming, and I've started learning a bit about it myself, but I'm not conversant-enough in it to write intelligently about it. So I'll omit it from this discussion.)

After all these years, I find it very interesting that I haven't seen a common approach to internal error handling become widely-accepted, the way structure and object-orientation have.

I'm speaking here of internal errors, not user errors. If the user enters bad input, you owe it to the user to detect the problem and report it in a user-friendly way, allowing the user to correct his mistake. User error detection and handling code is fully part of the program.

Internal errors are different. These are found by inserting sanity checks to see if something that "should never happen" might have actually happened. Very often, failed sanity checks represent a software bug that needs to be tracked down and fixed.

ROBUST CODE vs. READABLE CODE?

Most approaches to internal error handling I've seen are poor. Maybe it is partly due to a lack of programmer discipline - we all get lazy sometimes and forget to check a return status - but I don't think that is the primary reason. I believe the main culprit is the need for readable code.

The greatest expense in any significant programming effort is not the original writing of code, it's the subsequent maintenance. The original programmers eventually leave the project, and less-experienced programmers often take over the less-glamorous job of fixing bugs and adding enhancements. It is critically important that the code be readable. External documentation is typically out of date moments after it is written, and even internal documentation (comments) are suspect. The code itself is the only reality that matters, and the harder it is to understand that code, the harder (and more expensive) it will be to maintain and improve it.

And unfortunately, doing a thorough job of checking and handling internal errors often makes the code harder to read.

C

Which is easier to read and maintain? This:

example 1:

  foo_t *foo_create()

{

      foo_t *new_foo = malloc(sizeof(foo_t));

      new_foo->state_filep = fopen("foo.state", "rw");

      new_foo->data_filep = fopen("foo.data", "rw");

      new_foo->reader = reader_create();

      new_foo->writer = writer_create();

      return new_foo;

  }  /* foo_create */

or this:

example 2:

  foo_t *foo_create()

{

      foo_t *new_foo = malloc(sizeof(foo_t));

      if (new_foo != NULL) {  /* malloc successful? */

          new_foo->state_filep = fopen("foo.state", "rw");

          if (new_foo->state_filep != NULL) {  /* fopen successful? */

              new_foo->data_filep = fopen("foo.data", "rw");

              if (new_foo->data_filep != NULL) {  /* fopen successful? */

                  New_foo->reader = reader_create();

                  if (new_foo->reader != NULL) {  /* reader successful? */

                      new_foo->writer = writer_create();

                      if (new_foo->writer != NULL) {  /* writer successful? */

                          return new_foo;

}

                      else {

                          log_error("Could not create writer");

                          reader_delete(new_foo->reader);

                          fclose(new_foo->data_filep);

                          fclose(new_foo->state_filep);

                          free(new_foo);

                          return NULL;

}

}

                  else {

                      log_error("Could not create reader");

                      fclose(new_foo->data_filep);

                      fclose(new_foo->state_filep);

                      free(new_foo);

                      return NULL;

}

}

              else {

                  log_error("Could not open data file");

                  fclose(new_foo->state_filep);

                  free(new_foo);

                  return NULL;

}

}

          else {

              log_error("Could not open state file");

              free(new_foo);

              return NULL;

}

}

      else {

          log_error("Could not malloc foo");

          return NULL;

}

  }  /* foo_create */

Most projects I've worked on use a hybrid of the two approaches, but most lean heavily towards one or the other. The first is easy to read, but brittle; if anything goes wrong, the problem often doesn't generate any visible symptoms till much later, making it a nightmare to diagnose. The second detects errors early and does proper reporting and cleanup, but is a nightmare to read and maintain. (All that duplicated code in the else clauses is sure to get out of sync at some point, resulting in resource leaks or uninitialized accesses, neither of which will be detected unless some error actually happens and the error handling code finally gets exercised.)

Wouldn't it be nice to have both good error detection and readability?

example 3:

  foo_t *foo_create()

{

      foo_t *new_foo = NULL;

      new_foo = malloc(sizeof(foo_t));  ASSRT(new_foo!=NULL);

      memset(new_foo, 0, sizeof(new_foo));

      new_foo->state_filep = fopen("foo.state", "rw");  ASSRT(new_foo->state_filep!=NULL);

      new_foo->data_filep = fopen("foo.data", "rw");  ASSRT(new_foo->data_filep!=NULL);

      new_foo->reader = reader_create();  ASSRT(new_foo->reader!=NULL);

      new_foo->writer = writer_create();  ASSRT(new_foo->writer!=NULL);

      return new_foo;

  ASSRT_FAIL:

      if (new_foo != NULL) {

          if (new_foo->writer != NULL) reader_delete(new_foo0->writer);

          if (new_foo->reader != NULL) reader_delete(new_foo0->reader);

          if (new_foo->data_filep != NULL) fclose(new_foo0->data_filep);

          if (new_foo->state_filep != NULL) fclose(new_foo0->state_filep);

          memset(new_foo, 0, sizeof(new_foo));  /* poison the freed structure */

          asm volatile ("" ::: "memory");       /* See note [1] */

          free(new_foo);

      }

      return NULL;

  }  /* foo_create */

Note [1]: See http://blog.geeky-boy.com/2014/06/clangllvm-optimize-o3-understands-free.html

My philosophy is to catch errors as early as possible by including sanity checks, but doing it unobtrusively. Make it easy to ignore if you're looking for the code's basic algorithm, but easy to understand if you are analyzing the error handling. I think example 3 to be approximately as readable as the first example, and as robust as the second. It is important to keep the error handling code visually small so that it can be appended to the ends of lines. This:

      foo->state_filep = fopen("foo.state", "rw");  ASSRT(foo->state_filep!=NULL);

is much cleaner than this:

      foo->state_filep = fopen("foo.state", "rw");

      if (foo->state_filep) {

          blah blah blah

      }

As you've probably guessed, the ASSRT macro does nothing if the expression is true, or does a "goto ASSRT_FAIL;" if the expression is false. A variation is to use "abort()" instead of goto. In many ways this is FAR superior for a troubleshooter; a core file can provide a full stack trace and access to program state. Calling "abort()" also simplifies the code further, removing explicit clean up. But it also means that the code has given up on any hope of recovery. Differentiating between an un-recoverable error and a recoverable error is often not possible at the point where the error is detected; it is usually considered good form to report the error and return a bad status.

(FYI - the "memset()" before the "free()" is commented as "poison the freed structure". The purpose of this is to make sure that any dangling references to the foo object will seg fault the instant they try to de-reference one of the internal pointers. Again, my philosophy is to find bugs as soon as possible.)

ASSRT()

What is this magical "ASSRT()" macro?

  #define ASSRT(cond_expr) do { \

      if (!(cond_expr)) { \

          char errmsg[256]; \

          snprintf(errmsg, sizeof(errmsg), "%s:%d, failed assert: (%s)", \

              __FILE__, __LINE__, #cond_expr); \

          log_error(errmsg); \

          goto ASSRT_FAIL; \

      }  \

  } while (0)

For those unfamiliar with C macro magic, there are two bits here that might benefit from explanation. First the "do ... while (0)". This is explained pretty well here. It's the standard way to avoid tricky bugs when macros are used as part of if statements. The second is the "#cond_expr" at the end of the sprintf. The C preprocessor converts "#macro input parameter" into a C string.

So, the macro expansion of "new_foo = malloc(sizeof(foo_t)); ASSRT(new_foo!=NULL);" is:

      new_foo = malloc(sizeof(foo_t));  do {

          if (!(new_foo!=NULL)) {

              char errmsg[256];

              snprintf(errmsg, sizeof(errmsg), "%s:%d, condition false: '%s'",

                  "foo.c", 50, "new_foo!=NULL");

              log_error(errmsg);

              goto ASSRT_FAIL;

}

      } while (0);

An important reason for making this a macro instead of a function is so that the __FILE__ and __LINE__ compiler built-ins apply to the source line which uses ASSRT. I've also sometimes enhanced the ASSRT macro, making it specific to a program or even a function by including prints of interesting state information. I've also sometimes included errno. These improvements help the troubleshooter diagnose the internal problem.

One big drawback to this approach? The error messages logged can only be loved by a programmer. What would an end user do with this?

  Error logger: foo.c:50, failed assert: (new_foo!=NULL)

This is wonderful for a programmer; I don't know how many times I've seen an error log containing, "could not open file", only to find that the string appears many times in the source code. This pin-points the source file and line. However, this message is virtually useless to an end user.

On the other hand, is it really any worse than this?

  Error logger: could not malloc foo

This looks less like geek-speak, but doesn't actually help an end user any more. In both cases, the user has little choice but to complain to the developer.

My point is that including an error-specific, highly-descriptive message makes the error handling code longer and more-intrusive. One goal of mine is to keep the ASSRT on the same line as the line it's checking; hard to do if including a descriptive message. But as soon as it overflows onto subsequent lines, it interferes with the logical flow of the code, making it harder to read. You figure, since internal errors are usually the result of a software bug, it's not so bad to log errors that only a programmer could love - only programmers need to deal with them, users only need to accurately report them. (Which reminds me of a war story.)

I have an evolved version of this approach, which I plan to clean up and present in the not-too-distant future.

JAVA

Java adds exception handling, which some would say contradicts my claim that error handling isn't standardized. However, Java exceptions merely standardize the low-level language tool. It doesn't really address the approach that a program takes. When used poorly (e.g. try-catch at every call), it reduces to the unreadability of example 2. When done well, it approximates the good readability of example 3. The key is the sacrifice some user-friendliness of error message in favor of keeping the program structure clear. That sacrifice is worth it since internal errors should rarely happen.

C++

I am a bit embarrassed to admit that I'm not very familiar with C++'s exception handling facility. I've never worked on a project that used it. I've heard many programmers state their strongly-held opinion that C++ exceptions are to be avoided at all costs, but I don't have enough personal experience to have my own opinion. One concrete data point: a computer engineering senior at U of I (a respected institution) tells me that their C++ course does not teach exceptions.

But I figure that even if C++ exceptions are not up to the task, my C method should work fine. I'm not married to any specific language tool, so long as error handling can be added without obscuring the code.

P.S.

I risk being haunted by the ghost of Dijkstra by using a dreaded goto. Maybe Knuth will protect me (alternate). :-)

P.P.S.

I have a slightly different version in my wiki. I change the goto to abort(), and I also print the human-readable form of errno. While fine for most Unix tools, those changes will not be appropriate for all kinds of programs, so the form above is more general.

Saturday, June 7, 2014

Grace Hopper documentary (crowd funding)

I suffer from a little bit of hero worship for Grace Hopper (1906 - 1992). She is often credited with writing the first compiler in 1952, and was an important contributor to the development of COBOL.

There is a crowd-funded effort to produce a documentary on her:

http://www.indiegogo.com/projects/born-with-curiosity-the-grace-hopper-documentary

I would like to see this documentary succeed because I really would like to know more about her than appears in Wikipedia.

Update: Grace Hopper on Letterman:
https://www.youtube.com/watch?v=1-vcErOPofQ
She really was charming.

Thursday, June 5, 2014

clang/llvm optimize -O3 understands free()

Wow, I just found an amazing bug, although I'm not "bug" is the right word for it.

I have an object delete function:

void e_del(e_t *e)
{
if (e == NULL || e->signature != E_SIGNATURE) abort();
e->signature = 0;
free(e);
}

I have int signature at the end of the e_t structure, set at object creation to a known value (E_SIGNATURE). Then, I can check periodically to make sure that a valid object is being used. This detects a variety of bugs, like writing past the end of a buffer in the e_t structure, or just an invalid pointer. The setting of signature to zero in e_del() is to detect a double call to e_del() with the same object.

In my unit tests, I coded a double-delete to make sure it worked, but abort didn't get called! Instead, it double-freed!

Silly me, I must have a stupid error in my logic. Re-build without optimization to make it easier to set breakpoints and single step. Works. Works fine. Without optimization, my code works. With optimize level 3, it doesn't. I generated assembly language for optimize and non-optimize cases, and guess what. At optimize 3, the zero of signature is gone. Completely gone.

I had a theory, which I tested by changing free(e) to printf("%p", e) and guess what. At optimize 3, the zero of signature reappeared. The code worked as expected. Change printf back to free, and the zero of signature disappeared again.

Apparently, clang/llvm knows what free() is for. It knows that only a buggy program would try to access a heap segment after it is freed. So for any non-buggy (and non-multi-threaded) program, it is safe to throw away the "e->signature = 0" line. My problem is that I DON'T want to assume a bug-free program. I want to do a good job of detecting bugs.

How to fix? A variation on Linux's ACCESS_ONCE macro which I named VOL:

#define VOL(_x, _type) (*(volatile _type *)&(_x))
. . .
VOL(e->signature, int) = 0;
free(e);

This "volatilizes" access to e->signature, forcing the optimizer to leave the operation alone. (FYI - my reason for making my own macro is that ACCESS_ONCE uses the gcc-specific built-in "typeof()" to remove the need to pass in the data type. But I didn't want my code to be gcc-specific.)

I've done a lot of multi-threaded programming in my day, most-recently experimenting with lock-free, non-blocking designs, so I've had to worry about the optimizer from time to time. But this is the first time I've ever had a simple, single-threaded, straight-line code work wrong due to the optimizer. Think about what this means. The optimizer knows what free() is for, knows that the freed segment shouldn't be accessed after the free, and makes code generation decisions on that basis. I wonder what else the optimizer thinks it knows...

UPDATE: in its first posting, mistakenly attributed this issue to gcc. I forgot that my Mac now uses the clang/llvm compiler. I tried this with true gcc, and it did NOT omit the clear of signature.

UPDATE2: I changed the assignment of zero to signature to a memset of the whole structure:

void e_del(e_t *e)
{
if (e == NULL || e->signature != E_SIGNATURE) abort();
memset(e, 0, sizeof(*e));
free(e);
}

The compiler did NOT optimize out the memset. I'm guessing that it thought the memset function might have side effects, which is interesting since memset is a compiler builtin. In fact, if signature is the only field in e_t, the memset compiles to the exact same instructions as the zero assignment.

UPDATE3: Wow, optimizers are surprising. I found a case where the memset() IS optimized away. I tried telling the compiler that e points at volatile memory, and it STILL optimized it away. I'm back to using VOL() on the individual fields. Again, this is all clang/llvm on Mac. I've never seen gcc optimize away code based on knowing what free() does.

Tuesday, June 3, 2014

Easy Familiarity

I was at a software conference a couple weeks ago where I attended some sessions on functional programming. I, like many imperative programmers, am finding functional a bit foreign and hard to internalize, and I enjoyed listening to some experienced practitioners talk about it.

In one session, the presenter said, "simple is not the same thing as easy." "Simple" is a characteristic of a thing. It is simple or it is complex. A "hello world" program is simple; spaghetti code is complex. "Easy" relates to an individual. I find pointer-intensive software to be easy. Garry Casparov finds chess easy. The presenter's point was that we should strive to make simple software (complexity is the single greatest enemy to quality). In his opinion, functional languages can make many programs more simple than in imperative languages. This, in spite of the fact that many imperative programmers would find the functional program difficult to understand, due to lack of familiarity.

In another session, the presenter said, "readable != familiar". Same idea, slightly different spin. When I first learned C, I found conditional expressions to have low readability:
int max_xy = (x > y) ? x : y;
Almost thirty years later, I don't give it a second thought. Did it become magically more readable? No, the construct was always readable. I became more familiar. On the other hand, I've had maybe 20 years experience with regular expressions, and they are still virtually a "write-only" language. Regular expressions fundamentally have low readability.

simple != easy
readable != familiar

Both encourage us to not fear change, to not assume that if X was good enough for my grandfather, then it's good enough for me. I find it strange that programmers would resist new ways of thinking, new approaches, new language features. Our profession barely existed 50 years ago, it changes every year. And yet, in my early years, I knew programmers who resisted C because you couldn't do things that you could do in assembly language. A few years later, people were doubting that C++ could ever be usable in embedded systems. (Now Java is being embedded, for goodness sake!) Personal computer operating systems have progressed from simple program loaders (e.g. MSDOS) to full Unix with virtual memory and symmetric multiprocessor capability.

And still programmers resist the new, the different. (I'm no better! I'm still don't consider myself a good OO programmer. I've done very little C++.)

One guy at the conference went so far as to say (approximately), "If you don't learn functional programming, in 5 years there will be a name for your job: 'maintenance.'" I'm not sure that's true -- there are still a lot of new non-OO C code being written -- but it gives me pause ... and motivation to learn some more Haskell.

Monday, May 26, 2014

Memorial Day

I'll admit it: I'm inconsistent and possibly even hypocritical about the military. As a young pup, I toyed with pacifism, but couldn't figure out how to be consistent. My later teenage years held some fear of being drafted into the Vietnam war. I've always been uncomfortable with nationalism and jingoism, both of which give patriotism a bad name.

And yet...

Soldiers throughout history have gone to war with the goal of preventing harm to the ones they love and the societies they live in. I believe that very few soldiers fight out of a sense of pure hatred. These men and women really have offered their lives up in the service of others. And while some soldiers have performed terrible acts, most perform their duties to the best of their abilities.

So each memorial day, I wrestle with these conflicting feelings. But one thing I never feel conflicted about is a deep sense of respect I feel to the men and women who have put their lives in danger, and sorrow at those who have died trying to do the right thing.

I feel this way for soldiers on all sides of every conflict. As a Jew, I have no respect for Nazism. But I do feel respect for German soldiers who were fed lies about Jews and made to fight, and I feel sorrow for German soldiers killed. I'll never forgive the Nazi leaders who knew they were spreading lies; I completely forgive the foot soldiers who thought they were defending their families and society.

So please pause for a moment today and pay your respects to the soldiers of every country throughout history who have given their lives trying to do the right thing.

Monday, May 19, 2014

Fast, lock-free, non-blocking queue

In my copious spare time (insert laugh-track here), I have developed a fairly simple queue that features:

Fixed size (size specified at queue creation time).
Non-blocking (enqueuing to a full queue returns immediately with status; dequeuing from an empty queue returns immediately with status).
Single Producer, Single Consumer (SPSC).
No dynamic memory allocates or frees during enqueue and dequeue operations. Messages are void pointers (null pointers not allowed).
High performance. On my Macbook Air, streaming data through a queue averages 11 ns per message (queue size=32), while ping-pong latency is 69 ns (one-way).
Tested on Mac OSX 10.9 (Mavericks) and Linux 2.6 and 3.5 kernels. At present, I only recommend the 64-bit x86 processor family, due to the fact that I take advantage of its programmer-friendly memory model. In the future I hope to generalize it to be efficient on other processor types.

I'm making it available as public domain (or more-accurately, as CC0), which means you can use it for any purpose without restriction.

You can get the module and its documentation here: https://github.com/fordsfords/q/tree/gh-pages

In addition to the queue itself, I produced semi-literate internals documentation I'm rather proud of my semi-literate documentation system.

Sunday, May 18, 2014

Cache Line Size and Alignment

I've been doing some fiddling around with a high-performance queue, and have noticed something. There is a wide-spread belief that CPU caches have a line size of 64 bytes.

Alas, all the world is not X86.

Modern PowerPC cache line size is 128 bytes. Same with Itanium-2. IBM's system Z CPUs have a line size of 256 bytes.

I saw the results of this personally with our Itanium Linux machine. My queue performed VERY poorly. I increased the padding between critical fields to assume a 128-byte cache line, and magically the performance improved dramatically.

While I'm on the subject, I heard from a smart person that cache lines do not necessarily align themselves on the corresponding memory boundaries. I.e. assuming a 64-byte line size and a cache miss, a memory access to address 0 would load a cache line with 0-63. However, if that access were instead to address 32, it would load a cache line with 32-95.

However, I have not been able to verify this experimentally. I've tried on X86, SparcV9, and Itanium, and they all appear to align the caches to a multiple of the cache line size. I.e. an access to address 32 would load 0-63.

If anybody has clarification here, I would appreciate a note. Thanks.

UPDATE: another colleague of mine has stated categorically that order of access will NOT affect the mapping of cache lines to memory addresses. That matches my experimental evidence.

Thursday, May 8, 2014

Blank is beautiful

We all know that we should comment our code. But it's hard. We know that obvious comments are pointless, and can even reduce readability by clutter:

i++; /* increment i */

You are reasonable to assume that the reader knows the programming language, so don't include those kinds of obvious comments. But while writing code, you can be a poor judge of what is obvious; it's ALL obvious to you!

But there is one kind of comment that the coder *is* the best judge for: blank lines.

Writing code is a bit like writing prose. You split your writing into paragraphs, where each paragraph represents a single coherent unit of thought. Consider this fragment of code adapted from a heap sort algorithm:

while (end > 0) {
temp = a[end];
a[end] = a[0];
a[0] = temp;
end--;
sift_down(a, 0, end);
}

I see four "units" of thought here. The first is that we are going to loop until the size of the heap becomes zero. The second is that we are swapping the head with last leaf of the heap. The third is that we are removing the last leaf from the heap, and the fourth is sifting down the new top of heap to its proper location. Four units of thought should be four paragraphs. But this code doesn't contain any paragraph breaks. Maybe this example isn't severe, but I've seen screen after screen of code packed together. It's hard to read because it's hard to *find* the units.

I think this is a big improvement (I didn't include a blank line after the "while" because the indention itself provides a visual break.):

while (end > 0) {
temp = a[end];
a[end] = a[0];
a[0] = temp;

end--;

sift_down(a, 0, end);
}

The author may not be a good judge of obvious v.s. non-obvious, but at least he knows which groups of lines represent individual units of thought.

So, maybe we're agreed on blank lines. What about textual comments? I've seen many people who usually add a comment to each paragraph. For example:

/* loop until the size of the heap becomes zero */
while (end > 0) {
/* swap head of heap with last leaf */
temp = a[end];
a[end] = a[0];
a[0] = temp;

end--; /* Remove that last leaf (the old head) */

/* re-balance the heap */
sift_down(a, 0, end);
}

Better? Or are the comments unnecessary because the code itself is obvious? I think "temp=a; a=b; b=temp;" is a common-enough idiom that calling it swap may be unnecessary. But is it obvious that "a[0]" and "a[end]" represent the head and last leaf nodes of the heap? Different people may have different opinions. And what about the third comment? I think everybody would agree that "end--; /* decrement end */" would be absurd. But saying that it removes the last leaf from the heap might be helpful to those unfamiliar with the standard array implementation of an ordered heap. And the fourth comment? Would the average programmer know what the "sift_down()" operation is for an ordered heap? (You DO all know what an ordered heap is, right?)

I have mixed feelings. As a maintainer, I usually view comments with suspicion since they are often not updated as the code is modified. But as a coder I try (and often fail) to add comments when the "what" is obvious, but the "why" may not be. We know what "end--;" does: it decrements "end". But why are we decrementing it? Because we are removing the last leaf.

But the "why" question can be asked recursively. Why are we removing the last leaf? When do we stop asking "why"? Here's where judgment comes in, and as I said, the coder is often the worst person to make that judgment call. Which is why 99% of code is poorly-commented!

I think the value of these particular textual comments is debatable. The value of the blank lines is not. In my opinion, the blank lines are the most important comments in the code fragment.

One more point about code paragraphs and their conceptual units of thought: what about making the source one line shorter by moving the "end--" higher-up?

/* loop until the size of the heap becomes zero */
while (end > 0) {
/* swap head of heap with last leaf, and remove it */
temp = a[end];
a[end--] = a[0];
a[0] = temp;

/* Re-balance the heap */
sift_down(a, 0, end);
}

I feel this is bad. Yes, the code is the same - the post-decrement modifies the right array element. But this is mixing two conceptual units of thought into one paragraph. Swapping head and last leaf, followed by removing the last leaf is NOT a single unit of thought; it's two. If your command has the word "and" in it, please consider breaking the paragraph up. But even worse is that the order of operatioons described in the comment is not embodied by the code. The "remove it" is second step, but the code for it is right smack in the middle of the first step. (Don't worry, modern compilers won't generate less-efficient code if you make end--; a separate line.)

If you really want to compress this code vertically, I would combine the swap onto one source line:

temp = a[end]; a[end] = a[0]; a[0] = temp;

While I'm at it, PLEASE comment the end brace of functions with the function's name!

int heap_sort(...)
{
...
} /* heap_sort */

And please add TWO blank lines between the end of one function the start of the next.

Monday, April 28, 2014

Cute "sed" Trick: rotate file

I wanted a script that would invoke a process, passing in a port number from a circular pool of ports. When the process exits and restarts, I want the script to pass in the *next* port from the pool.

PORT=`head -1 port_file.txt`

sed -i -e '1h;1d;$G' port_file.txt

In the above "sed" command:

The "-i" causes the file "port_file.txt" to be edited in-place.
The "1h" sed command yanks the first line into the "hold" space.
The "1d" sed command deletes the first line (prevents it from being output).
The "$G" command appends the hold space after the last line of the file.

Thus, given that "port_file.txt" contains:

12000

12001

12002

12003

the above two commands will leave the file with:

12001

12002

12003

12000

and "PORT" set to 12000.

Monday, April 21, 2014

Old C Coding Habits Die Hard

Old habits die hard.

In their 1978 book "The C Programming Language", Brian Kernighan and Dennis Ritchie described a version of C which has since become known as "K&R" C. For those of you who aren't older than dirt, K&R C differed in many ways from modern C. For example:

foo(c, v)
int c;
char **v;
{

Whoa! What's that? Original K&R C didn't let you declare the formal parameter types in-line with the function definition, you had to declare them between the ")" and the "{". Also, functions defaulted to being of type "int".

Lots of fossilized programmers like me were forced into habits in the old days which have outlived their need. Now, a lot of younger programmers getting out of school have as their first real-world experience the job of maintaining that old code, learning those same obsolete habits through osmosis.

LOCAL VARIABLES

In K&R C, local variables *had* to be declared at the top of the function. With C89, locals could be declared at the start of any compound statement (i.e. after any "{"). Finally, as of C99 locals could be declared basically anywhere.

In my opinion, it makes sense for a variable to be declared and initialized immediately before (or very close to) its first usage. Doing this conveys valuable information for a future code maintainer: this variable isn't used prior to this line. I can't tell you how often I spent time doing reverse searches for a variable to see where else it is being used.

Here's some "old habits" code:

  foo_find(...)
  {
    int found = 0;
      ...tens of lines...
    while (! found) {

With this, you might go straight to the top of "foo()" to see how "found" is declared and initialized, but then you still have to search the tens of intervening lines to see how "found" might changed.

I prefer this:

  foo_find(...)
  {
      ...tens of lines...
    int found = 0;
    while (! found) {

Sure, there might be a 20-year-old compiler which can't handle it, but anybody using a 20-year-old compiler has bigger problems than this.

GLOBAL VARIABLES

Another habit which I think might be a K&Rism: declaration of global variables at the top of the file. I might be wrong, but I suspect that K&R C disallows global variables to be declared after some functions are defined. As with local variables, this restriction is no longer in force. So, if you have a group of functions which implement an abstraction that uses globals, but the abstraction is not significant enough to justify putting them in their own file, I think it makes sense to put the globals associated with the abstraction in front of the first implementation function. For example:

  #define FOO_MAX_NUM 67
  typedef foo_t struct {...};
  foo_t foo_storage[FOO_MAX_NUM];
  static int foo_num_stored = 0;

  foo_t *foo_create(...)
  {

This code, including the #define, the typedef, and the global variables, may well be positioned after other functions are already defined. Once again, it helps a code reader know that the definitions are not used in the preceding code.

INCLUDE FILES

In the above example, "FOO_MAX_NUM" and "foo_t" are defined close to the code, instead of at the top of the file. But many programmers wouldn't even put them at the top of the file, they would put them in an include file "foo.h".

I don't think this is a K&Rism. It's just a habit to put all typedefs and #defines in the include file. But again, I advocate for keeping things as localized as possible. Definitions and declarations which are internal implementation details to an abstraction should be hidden from general view. C++ may do a better job of managing the external and internal details of an abstraction, but the guidelines should be followed in C as well.

Wednesday, April 16, 2014

strtol preferred over atoi

We've all done it: parsed command-line parameters and converting numeric strings to integers using "atoi()". And (hopefully) we've all felt guilty about it, because "atoi()" sucks.

Let's say I have a program, "blunjo", which takes a "-d" option with numeric debug level (0=no debug, 1=a little debug, 2=a lot of debug). So I might use it like this:

blunjo -d 1 input_file1 input_file2 ...

Inside the code I probably call "getopt()" and include the code fragment:

case 'd':

debug_opt = atoi(optarg);

break;

So, what happens if the user forgets exactly how to use it and enters this:

blunjo -d input_file1 input_file2 ...

In this case, "optarg" points at "input_file1", which "atoi()" happily converts to zero, turns off debug, and silently skips processing "input_file1". Might be nice if it actually told the user that "input_file1" is an invalid integer and printed the usage string.

Enter "strtol()". It's a little more complicated to use (also more flexible):

long int strtol(const char *nptr, char **endptr, int base);

The Linux man page contains two interesting bits:

If endptr is not NULL, strtol() stores the address of the first invalid character in *endptr. If there were no digits at all, strtol() stores the original value of nptr in *endptr (and returns 0). In particular, if *nptr is not '\0' but **endptr is '\0' on return, the entire string is valid.

... the calling program should set errno to 0 before the call, and then determine if an error occurred by checking whether errno has a non-zero value after the call.

So this is better code:

case 'd':

char *p = NULL;

errno = 0;

debug_opt = strtol(optarg, &p, 10);

if (errno != 0 || p == optarg || p == NULL || *p != '\0') {

usage("Invalid numeric value for -d option"); }

Note that you must make sure that "optarg" is non-null before calling strtol. If you use getopt and specify "d:" then getopt will guarantee a non-null "optarg". But if you are parsing the command-line string yourself, beware of the user entering "-d" with nothing at all following it - the next "argv[]" will be null. Also note that the "p==NULL" check is technically not necessary; so long as "optarg" is non-null, "p" will never be left at null. However, given that I'm not responsible for the code that sets "p", it just seems like good practice to include the sanity check before dereferencing it.

Here's a macro to make it all even easier, handle 0x-prefixed hexidecimal, and even prints a programmer-friendly error specifying the file:line of the call to it:

#define SAFE_ATOL(a,l) do { \

char *in_a = a; char *temp = NULL; long result; errno = 0; \

if (*in_a == '0' && *(in_a+1) == 'x') \

result = strtol(in_a+2, &temp, 16); \

else \

result = strtol(in_a, &temp, 10); \

if (errno!=0 || temp==in_a || temp==NULL || *temp!='\0') { \

fprintf(stderr, "%s:%d, Error, invalid numeric value for %s: '%s'\n", \

__FILE__, __LINE__, #l, in_a); \

exit(1); \

} \

l = result; /* "return" value of macro */ \

} while (0)

Here's a usage of the macro:

case 'd':

SAFE_ATOL(optarg, debug_opt);

Note that on errors, it abruptly exits the program.

Finally, there is also "strtoll()" which returns a long long int, and has the same error-checking. The functions "strtoul()" and "strtoull()" are similar but for unsigned.

EDIT: I'm pleased to discover that the function "inet_pton()" does a good job of error checking a dotted-decimal IP address. For example, adding garbage to the end of a valid IP address is flagged as an error.

EDIT2: I've enhanced the above macro in a few ways and put it on my github. See: https://github.com/fordsfords/safe_atoi

Monday, April 14, 2014

Password strength

I disagree with a lot of "password strength" measures. Most measures want you to include upper and lower case, digits, special characters, etc. I don't feel they are necessary, and don't give you as much "security" as you might think (like substituting zero for the letter "o").

Then along came Randall Munroe with an XKCD cartoon which does a much better job of explaining it than I ever could:

Most of the password strength "meters" that you see on sites are based on the idea that digits, special characters, and mixed-case are the magic elixir for strong passwords. I was quite dismayed to discover that most of them consider "P@ssw0rd" to be very secure, which is absurd. Then I found zxcvbn:

https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.html

Finally, a password strength meter which knows that "P@ssw0rd" is low security (score=0 of 4, crack time 0 seconds)! Whereas "correcthorsebatterystaple" is very secure (score=4 of 4, crack time 65 years).

Another password method that I've heard hyped which I disagree with is the haystack approach. According to this author, the password "D0g....................." (21 dots) is very strong. This is ONLY true for brute-force password cracking, which is NOT how serious crackers work. They do dictionary and repeated character analysis. According to zxcvbn, "D0g....................." is weak (score=0, crack time 84 seconds). YES!

One fly in all this ointment: many systems limit your password length, sometimes to as few as 8 characters. This makes it very hard to use 4 random words, meaning that you probably need to go the random route. For the 8 random character password "0ZhyUQ63", zxcvbn rates it 4 of 4, with centuries required to crack it. Whereas "saytroll" is weak, with 22-second crack time. (Note that "S@yTr0ll" is still weak, with a 7-minute crack time - so much for magic elixirs.)

BTW, my wiki has a somewhat longer article on password strength.

The Meditative Coder

Tuesday, June 24, 2014

My work email inbox is empty

Monday, June 16, 2014

Maglus Stylus

Thursday, June 12, 2014

Order in Structure Assignment Statements

Tuesday, June 10, 2014

Error handling: the enemy of readability?

Saturday, June 7, 2014

Grace Hopper documentary (crowd funding)

Thursday, June 5, 2014

clang/llvm optimize -O3 understands free()

Tuesday, June 3, 2014

Easy Familiarity

Monday, May 26, 2014

Memorial Day

Monday, May 19, 2014

Fast, lock-free, non-blocking queue

Sunday, May 18, 2014

Cache Line Size and Alignment

Thursday, May 8, 2014

Blank is beautiful

Monday, April 28, 2014

Cute "sed" Trick: rotate file

Monday, April 21, 2014

Old C Coding Habits Die Hard

Wednesday, April 16, 2014

strtol preferred over atoi

Monday, April 14, 2014

Password strength

About Me

Tags (see here)

Blog Archive