Tuesday, June 24, 2014

My work email inbox is empty

Wow.  I think the last time my work in-box was empty was when I joined 29West, back in 2005.  It stayed empty for probably an hour.

I dunno ... it's actually a little scary looking.

Monday, June 16, 2014

Maglus Stylus

I am very happy with the Maglus stylus for my iPad.  I wrote an Amazon review of why I like it so much (there are specific reasons).

The reason for this post is that you can get a 10% discount if you use this link.  It's part of an "affiliate" program that also kicks me a bit of money, which I have re-directed to the American Red Cross.

That said, I did notice that on Amazon, I could find it for even less than the 10% off list, so if you're looking to save money, shop around.  I would rather see you save as much money as you can, and independently donate a decent amount to the Red Cross because you want to support their good work.

Anyway, I never feel 100% comfortable promoting a commercial product lest I come across as a corporate shill.  Then I saw a post on their corporate blog on James Joyce and Ulysses.  Somehow that makes me feel less guilty being a booster.  Which is really silly since I've never read Ulysses!

Hmm ... what tags should I set on this post?  I don't see one for "shameless commerce".  I guess "tips" since I really do like the stylus a lot.

Thursday, June 12, 2014

Order in Structure Assignment Statements

I had what I thought was a pretty simple piece of code, transferring a structure from one thread to another.  Without describing it blow-by-blow, I'll present it thus: a writer thread is measuring temperature and pressure of a steam turbine.  A reader periodically samples the measurement.  There is no need for the reader to get every measurement, it only needs the most-recent at the point that it samples.


DESIGN!

A global data structure:

struct measurement_s {
    volatile uint64_t post_seq;  /* sequence number of measurement */
    volatile  int64_t temperature;
    volatile  int64_t pressure;
    volatile uint64_t pre_seq;   /* sequence number of measurement */
} live;

The algorithm is for the writer to first increment pre_seq, then update the temperature and pressure, then increment the post_seq.  The reader reads the post_seq first, then temperature and pressure, then pre_seq.  If the reader sees pre_seq != post_seq, it means that a collision has happened, and the reader should re-try.  Here's the reader code:

        struct measurement_s sample;
        do {
            sample = live;
        } while (sample.pre_seq != sample.post_seq);
        /* now have valid sample */


BROKEN!

Pretty simple.  Easy to understand.  And it didn't work right.  The test code was written to make it obvious when inconsistent values are read, and the code above generated inconsistent reads.  A LOT of inconsistent reads.  How can this be???  I used volatile to prevent the optimizer from messing with the order of reads and writes, and it's an x86-64, which has a very predictable memory model, removing the need for most hardware memory barriers.  The algorithm is correct!  Why is it failing?


DIAGNOSE!

So I did what I've been doing a surprising amount recently: I generated assem language (this time withOUT optimization turned on).  Here's the structure assignment:

    leaq    -64(%rbp), %rax
    movq    _live+24(%rip), %rcx
    movq    %rcx, 24(%rax)
    movq    _live+16(%rip), %rcx
    movq    %rcx, 16(%rax)
    movq    _live+8(%rip), %rcx
    movq    %rcx, 8(%rax)
    movq    _live(%rip), %rcx
    movq    %rcx, (%rax)

HEY!  It copied the structure in the REVERSE ORDER than I expected!  So it copies the pre_seq first, then the temperature and pressure, and post_seq last!  No wonder it detected collisions, that order is NOT thread-safe!


HAPPILY EVER AFTER!

So I replaced the structure assign with explicit copies of the fields:

        struct measurement_s sample;
        do {
            /* Don't do "sample = live;", order of field copies is not defined. */
            sample.post_seq = live.post_seq;
            sample.temperature = live.temperature;
            sample.pressure = live.pressure;
            sample.pre_seq = live.pre_seq;
        } while (sample.pre_seq != sample.post_seq);

YAY!  It works.  (Whew!)

It would also be possible to define a sub-structure containing temperature, pressure, and any additional interesting data.  The fields in the sub-structure could be copied as struct assign, allowing the compiler to do it in any order.  But the pre_seq and post_seq must be individually copied in the right order.


DIGRESSIONS!

I do love those exclamation points!

So, the above code works.  How *good* is the design?  For physical process measurement, probably just fine.  It would be rare to measure physical characteristics more often than a few times per second, so the chances of the reader having to re-try the sample are very low.  (On the other hand, why not use mutexes if you are going that slow?)

But suppose this algorithm is applied to market data, which might have many thousands of ticks per second?  The algorithm shown suffers from a fundamental flaw: starvation.  If a writer and reader get pathologically synchronized, there is no bound to the number of times the reader will cycle, trying to get a clean sample.

In practice, it might be fine.  But whenever possible, I prefer a solution that works by design, not by circumstance.

I think a better approach might be to use a non-blocking queue, like this one.  The writer presumably requires non-zero work to produce a new record.  The consumer should read the queue dry in a tight loop before processing the last record read.  Now the queue need only be long enough to hold the maximum number of produced records between two reads.  A slow reader will simply clean out the queue each time it reads.


Some might question the wisdom of doing ANY kind of programming where the order of field copies in a struct assign is important.  In the old days, this program would have been done with a critical section, protected by a mutex or semaphore.  That works fine, and doesn't depend on order.  But high-performance applications of today need much lower latencies.  So lock-free algorithms are important.  With lock-free, you need to concern yourself with order; the compiler can change the order of the assem language operations from the source code, and the hardware can change the order of physical memory operations from the assem code.  Compile optimizer barriers address the former; hardware memory barriers address the latter.


Finally, some will question my choice of making all the fields of the structure volatile.  I admit it is a heavy-handed approach that I used for expediency -- this was a prototype after all.  For production code I would use a more-precisely targeted VOL macro (mentioned in an earlier post), often in cooperation with a general compile optimizer barrier, like "asm volatile ("" ::: "memory");".  Note that these techniques are NOT used to squeeze every nanosecond out of the code.  They are used to insure code correctness.

Tuesday, June 10, 2014

Error handling: the enemy of readability?

I've been a programmer since the mid-1970s, call it 40 years.  During that time I've seen the rise into popularity: structured programming and object-oriented programming.  (More-recently, I've seen increased interest in functional programming, and I've started learning a bit about it myself, but I'm not conversant-enough in it to write intelligently about it.  So I'll omit it from this discussion.)

After all these years, I find it very interesting that I haven't seen a common approach to internal error handling become widely-accepted, the way structure and object-orientation have.

I'm speaking here of internal errors, not user errors.  If the user enters bad input, you owe it to the user to detect the problem and report it in a user-friendly way, allowing the user to correct his mistake.  User error detection and handling code is fully part of the program.

Internal errors are different.  These are found by inserting sanity checks to see if something that "should never happen" might have actually happened.  Very often, failed sanity checks represent a software bug that needs to be tracked down and fixed.


ROBUST CODE vs. READABLE CODE?

Most approaches to internal error handling I've seen are poor.  Maybe it is partly due to a lack of programmer discipline - we all get lazy sometimes and forget to check a return status - but I don't think that is the primary reason.  I believe the main culprit is the need for readable code.

The greatest expense in any significant programming effort is not the original writing of code, it's the subsequent maintenance.  The original programmers eventually leave the project, and less-experienced programmers often take over the less-glamorous job of fixing bugs and adding enhancements.  It is critically important that the code be readable.  External documentation is typically out of date moments after it is written, and even internal documentation (comments) are suspect.  The code itself is the only reality that matters, and the harder it is to understand that code, the harder (and more expensive) it will be to maintain and improve it.

And unfortunately, doing a thorough job of checking and handling internal errors often makes the code harder to read.

C

Which is easier to read and maintain?  This:

example 1:
  foo_t *foo_create()
  {
      foo_t *new_foo = malloc(sizeof(foo_t));

      new_foo->state_filep = fopen("foo.state", "rw");
      new_foo->data_filep = fopen("foo.data", "rw");

      new_foo->reader = reader_create();
      new_foo->writer = writer_create();

      return new_foo;
  }  /* foo_create */

or this:

example 2:
  foo_t *foo_create()
  {
      foo_t *new_foo = malloc(sizeof(foo_t));
      if (new_foo != NULL) {  /* malloc successful? */

          new_foo->state_filep = fopen("foo.state", "rw");
          if (new_foo->state_filep != NULL) {  /* fopen successful? */
              new_foo->data_filep = fopen("foo.data", "rw");
              if (new_foo->data_filep != NULL) {  /* fopen successful? */

                  New_foo->reader = reader_create();
                  if (new_foo->reader != NULL) {  /* reader successful? */
                      new_foo->writer = writer_create();
                      if (new_foo->writer != NULL) {  /* writer successful? */

                          return new_foo;
                      }
                      else {
                          log_error("Could not create writer");
                          reader_delete(new_foo->reader);
                          fclose(new_foo->data_filep);
                          fclose(new_foo->state_filep);
                          free(new_foo);
                          return NULL;
                      }
                  }
                  else {
                      log_error("Could not create reader");
                      fclose(new_foo->data_filep);
                      fclose(new_foo->state_filep);
                      free(new_foo);
                      return NULL;
                  }
              }
              else {
                  log_error("Could not open data file");
                  fclose(new_foo->state_filep);
                  free(new_foo);
                  return NULL;
              }
          }
          else {
              log_error("Could not open state file");
              free(new_foo);
              return NULL;
          }
      }
      else {
          log_error("Could not malloc foo");
          return NULL;
      }
  }  /* foo_create */

Most projects I've worked on use a hybrid of the two approaches, but most lean heavily towards one or the other.  The first is easy to read, but brittle; if anything goes wrong, the problem often doesn't generate any visible symptoms till much later, making it a nightmare to diagnose.  The second detects errors early and does proper reporting and cleanup, but is a nightmare to read and maintain.  (All that duplicated code in the else clauses is sure to get out of sync at some point, resulting in resource leaks or uninitialized accesses, neither of which will be detected unless some error actually happens and the error handling code finally gets exercised.)

Wouldn't it be nice to have both good error detection and readability?

example 3:
  foo_t *foo_create()
  {
      foo_t *new_foo = NULL;
      new_foo = malloc(sizeof(foo_t));  ASSRT(new_foo!=NULL);
      memset(new_foo, 0, sizeof(new_foo));

      new_foo->state_filep = fopen("foo.state", "rw");  ASSRT(new_foo->state_filep!=NULL);
      new_foo->data_filep = fopen("foo.data", "rw");  ASSRT(new_foo->data_filep!=NULL);

      new_foo->reader = reader_create();  ASSRT(new_foo->reader!=NULL);
      new_foo->writer = writer_create();  ASSRT(new_foo->writer!=NULL);

      return new_foo;

  ASSRT_FAIL:
      if (new_foo != NULL) {
          if (new_foo->writer != NULL) reader_delete(new_foo0->writer);
          if (new_foo->reader != NULL) reader_delete(new_foo0->reader);
          if (new_foo->data_filep != NULL) fclose(new_foo0->data_filep);
          if (new_foo->state_filep != NULL) fclose(new_foo0->state_filep);
          memset(new_foo, 0, sizeof(new_foo));  /* poison the freed structure */
          asm volatile ("" ::: "memory");       /* See note [1] */
          free(new_foo);
      }

      return NULL;
  }  /* foo_create */

Note [1]: See http://blog.geeky-boy.com/2014/06/clangllvm-optimize-o3-understands-free.html

My philosophy is to catch errors as early as possible by including sanity checks, but doing it unobtrusively.  Make it easy to ignore if you're looking for the code's basic algorithm, but easy to understand if you are analyzing the error handling.  I think example 3 to be approximately as readable as the first example, and as robust as the second.  It is important to keep the error handling code visually small so that it can be appended to the ends of lines.  This:

      foo->state_filep = fopen("foo.state", "rw");  ASSRT(foo->state_filep!=NULL);

is much cleaner than this:

      foo->state_filep = fopen("foo.state", "rw");
      if (foo->state_filep) {
          blah blah blah
      }

As you've probably guessed, the ASSRT macro does nothing if the expression is true, or does a "goto ASSRT_FAIL;" if the expression is false.  A variation is to use "abort()" instead of goto.  In many ways this is FAR superior for a troubleshooter; a core file can provide a full stack trace and access to program state.  Calling "abort()" also simplifies the code further, removing explicit clean up.  But it also means that the code has given up on any hope of recovery.  Differentiating between an un-recoverable error and a recoverable error is often not possible at the point where the error is detected; it is usually considered good form to report the error and return a bad status.

(FYI - the "memset()" before the "free()" is commented as "poison the freed structure".  The purpose of this is to make sure that any dangling references to the foo object will seg fault the instant they try to de-reference one of the internal pointers.  Again, my philosophy is to find bugs as soon as possible.)


ASSRT()

What is this magical "ASSRT()" macro?

  #define ASSRT(cond_expr) do { \
      if (!(cond_expr)) { \
          char errmsg[256]; \
          snprintf(errmsg, sizeof(errmsg), "%s:%d, failed assert: (%s)", \
              __FILE__, __LINE__, #cond_expr); \
          log_error(errmsg); \
          goto ASSRT_FAIL; \
      }  \
  } while (0)

For those unfamiliar with C macro magic, there are two bits here that might benefit from explanation.  First the "do ... while (0)".  This is explained pretty well here.  It's the standard way to avoid tricky bugs when macros are used as part of if statements.  The second is the "#cond_expr" at the end of the sprintf.  The C preprocessor converts "#macro input parameter" into a C string.

So, the macro expansion of "new_foo = malloc(sizeof(foo_t));  ASSRT(new_foo!=NULL);" is:

      new_foo = malloc(sizeof(foo_t));  do {
          if (!(new_foo!=NULL)) {
              char errmsg[256];
              snprintf(errmsg, sizeof(errmsg), "%s:%d, condition false: '%s'",
                  "foo.c", 50, "new_foo!=NULL");
              log_error(errmsg);
              goto ASSRT_FAIL;
          }
      } while (0);

An important reason for making this a macro instead of a function is so that the __FILE__ and __LINE__ compiler built-ins apply to the source line which uses ASSRT.  I've also sometimes enhanced the ASSRT macro, making it specific to a program or even a function by including prints of interesting state information.  I've also sometimes included errno.  These improvements help the troubleshooter diagnose the internal problem.

One big drawback to this approach?  The error messages logged can only be loved by a programmer. What would an end user do with this?

  Error logger: foo.c:50, failed assert: (new_foo!=NULL)

This is wonderful for a programmer; I don't know how many times I've seen an error log containing, "could not open file", only to find that the string appears many times in the source code.  This pin-points the source file and line.  However, this message is virtually useless to an end user.

On the other hand, is it really any worse than this?

  Error logger: could not malloc foo

This looks less like geek-speak, but doesn't actually help an end user any more.  In both cases, the user has little choice but to complain to the developer.

My point is that including an error-specific, highly-descriptive message makes the error handling code longer and more-intrusive.  One goal of mine is to keep the ASSRT on the same line as the line it's checking; hard to do if including a descriptive message.  But as soon as it overflows onto subsequent lines, it interferes with the logical flow of the code, making it harder to read.  You figure, since internal errors are usually the result of a software bug, it's not so bad to log errors that only a programmer could love - only programmers need to deal with them, users only need to accurately report them.  (Which reminds me of a war story.)

I have an evolved version of this approach, which I plan to clean up and present in the not-too-distant future.


JAVA

Java adds exception handling, which some would say contradicts my claim that error handling isn't standardized.  However, Java exceptions merely standardize the low-level language tool.  It doesn't really address the approach that a program takes.  When used poorly (e.g. try-catch at every call), it reduces to the unreadability of example 2.  When done well, it approximates the good readability of example 3.  The key is the sacrifice some user-friendliness of error message in favor of keeping the program structure clear.  That sacrifice is worth it since internal errors should rarely happen.


C++

I am a bit embarrassed to admit that I'm not very familiar with C++'s exception handling facility.  I've never worked on a project that used it.  I've heard many programmers state their strongly-held opinion that C++ exceptions are to be avoided at all costs, but I don't have enough personal experience to have my own opinion.  One concrete data point: a computer engineering senior at U of I (a respected institution) tells me that their C++ course does not teach exceptions.

But I figure that even if C++ exceptions are not up to the task, my C method should work fine.  I'm not married to any specific language tool, so long as error handling can be added without obscuring the code.


P.S.

I risk being haunted by the ghost of Dijkstra by using a dreaded goto.  Maybe Knuth will protect me (alternate).  :-)


P.P.S.

I have a slightly different version in my wiki.  I change the goto to abort(), and I also print the human-readable form of errno.  While fine for most Unix tools, those changes will not be appropriate for all kinds of programs, so the form above is more general.

Saturday, June 7, 2014

Grace Hopper documentary (crowd funding)

I suffer from a little bit of hero worship for Grace Hopper (1906 - 1992).  She is often credited with writing the first compiler in 1952, and was an important contributor to the development of COBOL.

There is a crowd-funded effort to produce a documentary on her:

http://www.indiegogo.com/projects/born-with-curiosity-the-grace-hopper-documentary

I would like to see this documentary succeed because I really would like to know more about her than appears in Wikipedia.

Update: Grace Hopper on Letterman:
    https://www.youtube.com/watch?v=1-vcErOPofQ
She really was charming.

Thursday, June 5, 2014

clang/llvm optimize -O3 understands free()


Wow, I just found an amazing bug, although I'm not "bug" is the right word for it.

I have an object delete function:

void e_del(e_t *e)
{
    if (e == NULL || e->signature != E_SIGNATURE) abort();
    e->signature = 0;
    free(e);
}

I have int signature at the end of the e_t structure, set at object creation to a known value (E_SIGNATURE).  Then, I can check periodically to make sure that a valid object is being used.  This detects a variety of bugs, like writing past the end of a buffer in the e_t structure, or just an invalid pointer.  The setting of signature to zero in e_del() is to detect a double call to e_del() with the same object.

In my unit tests, I coded a double-delete to make sure it worked, but abort didn't get called!  Instead, it double-freed!

Silly me, I must have a stupid error in my logic.  Re-build without optimization to make it easier to set breakpoints and single step.  Works.  Works fine.  Without optimization, my code works.  With optimize level 3, it doesn't.  I generated assembly language for optimize and non-optimize cases, and guess what.  At optimize 3, the zero of signature is gone.  Completely gone.

I had a theory, which I tested by changing free(e) to printf("%p", e) and guess what.  At optimize 3, the zero of signature reappeared.  The code worked as expected.  Change printf back to free, and the zero of signature disappeared again.

Apparently, clang/llvm knows what free() is for.  It knows that only a buggy program would try to access a heap segment after it is freed.  So for any non-buggy (and non-multi-threaded) program, it is safe to throw away the "e->signature = 0" line.   My problem is that I DON'T want to assume a bug-free program.  I want to do a good job of detecting bugs.

How to fix?  A variation on Linux's ACCESS_ONCE macro which I named VOL:

#define VOL(_x, _type) (*(volatile _type *)&(_x))
. . .
VOL(e->signature, int) = 0;
free(e);

This "volatilizes" access to e->signature, forcing the optimizer to leave the operation alone.  (FYI - my reason for making my own macro is that ACCESS_ONCE uses the gcc-specific built-in "typeof()" to remove the need to pass in the data type.  But I didn't want my code to be gcc-specific.)

I've done a lot of multi-threaded programming in my day, most-recently experimenting with lock-free, non-blocking designs, so I've had to worry about the optimizer from time to time.  But this is the first time I've ever had a simple, single-threaded, straight-line code work wrong due to the optimizer.  Think about what this means.  The optimizer knows what free() is for, knows that the freed segment shouldn't be accessed after the free, and makes code generation decisions on that basis.  I wonder what else the optimizer thinks it knows...

UPDATE: in its first posting, mistakenly attributed this issue to gcc.  I forgot that my Mac now uses the clang/llvm compiler.  I tried this with true gcc, and it did NOT omit the clear of signature.

UPDATE2: I changed the assignment of zero to signature to a memset of the whole structure:

void e_del(e_t *e)
{
    if (e == NULL || e->signature != E_SIGNATURE) abort();
    memset(e, 0, sizeof(*e));
    free(e);
}

The compiler did NOT optimize out the memset.  I'm guessing that it thought the memset function might have side effects, which is interesting since memset is a compiler builtin.  In fact, if signature is the only field in e_t, the memset compiles to the exact same instructions as the zero assignment.

UPDATE3: Wow, optimizers are surprising.  I found a case where the memset() IS optimized away.  I tried telling the compiler that e points at volatile memory, and it STILL optimized it away.   I'm back to using VOL() on the individual fields.  Again, this is all clang/llvm on Mac.  I've never seen gcc optimize away code based on knowing what free() does.

Tuesday, June 3, 2014

Easy Familiarity

I was at a software conference a couple weeks ago where I attended some sessions on functional programming.  I, like many imperative programmers, am finding functional a bit foreign and hard to internalize, and I enjoyed listening to some experienced practitioners talk about it.

In one session, the presenter said, "simple is not the same thing as easy."  "Simple" is a characteristic of a thing.  It is simple or it is complex.  A "hello world" program is simple; spaghetti code is complex.  "Easy" relates to an individual.  I find pointer-intensive software to be easy.  Garry Casparov finds chess easy.  The presenter's point was that we should strive to make simple software (complexity is the single greatest enemy to quality).  In his opinion, functional languages can make many programs more simple than in imperative languages.  This, in spite of the fact that many imperative programmers would find the functional program difficult to understand, due to lack of familiarity.

In another session, the presenter said, "readable != familiar".  Same idea, slightly different spin.  When I first learned C, I found conditional expressions to have low readability:
    int max_xy = (x > y) ? x : y;
Almost thirty years later, I don't give it a second thought.  Did it become magically more readable?  No, the construct was always readable.  I became more familiar.  On the other hand, I've had maybe 20 years experience with regular expressions, and they are still virtually a "write-only" language.  Regular expressions fundamentally have low readability.

simple != easy
readable != familiar

Both encourage us to not fear change, to not assume that if X was good enough for my grandfather, then it's good enough for me.  I find it strange that programmers would resist new ways of thinking, new approaches, new language features.  Our profession barely existed 50 years ago, it changes every year.  And yet, in my early years, I knew programmers who resisted C because you couldn't do things that you could do in assembly language.  A few years later, people were doubting that C++ could ever be usable in embedded systems.  (Now Java is being embedded, for goodness sake!)  Personal computer operating systems have progressed from simple program loaders (e.g. MSDOS) to full Unix with virtual memory and symmetric multiprocessor capability.

And still programmers resist the new, the different.  (I'm no better!  I'm still don't consider myself a good OO programmer.  I've done very little C++.)

One guy at the conference went so far as to say (approximately), "If you don't learn functional programming, in 5 years there will be a name for your job: 'maintenance.'"  I'm not sure that's true -- there are still a lot of new non-OO C code being written -- but it gives me pause ... and motivation to learn some more Haskell.