Monday, May 26, 2014

Memorial Day

I'll admit it: I'm inconsistent and possibly even hypocritical about the military.  As a young pup, I toyed with pacifism, but couldn't figure out how to be consistent.  My later teenage years held some fear of being drafted into the Vietnam war.  I've always been uncomfortable with nationalism and jingoism, both of which give patriotism a bad name.

And yet...

Soldiers throughout history have gone to war with the goal of preventing harm to the ones they love and the societies they live in.  I believe that very few soldiers fight out of a sense of pure hatred.  These men and women really have offered their lives up in the service of others.  And while some soldiers have performed terrible acts, most perform their duties to the best of their abilities.

So each memorial day, I wrestle with these conflicting feelings.  But one thing I never feel conflicted about is a deep sense of respect I feel to the men and women who have put their lives in danger, and sorrow at those who have died trying to do the right thing.

I feel this way for soldiers on all sides of every conflict.  As a Jew, I have no respect for Nazism.  But I do feel respect for German soldiers who were fed lies about Jews and made to fight, and I feel sorrow for German soldiers killed.  I'll never forgive the Nazi leaders who knew they were spreading lies; I completely forgive the foot soldiers who thought they were defending their families and society.

So please pause for a moment today and pay your respects to the soldiers of every country throughout history who have given their lives trying to do the right thing.

Monday, May 19, 2014

Fast, lock-free, non-blocking queue

In my copious spare time (insert laugh-track here), I have developed a fairly simple queue that features:
  • Fixed size (size specified at queue creation time).
  • Non-blocking (enqueuing to a full queue returns immediately with status; dequeuing from an empty queue returns immediately with status).
  • Single Producer, Single Consumer (SPSC).
  • No dynamic memory allocates or frees during enqueue and dequeue operations.  Messages are void pointers (null pointers not allowed).
  • High performance.  On my Macbook Air, streaming data through a queue averages 11 ns per message (queue size=32), while ping-pong latency is 69 ns (one-way).
  • Tested on Mac OSX 10.9 (Mavericks) and Linux 2.6 and 3.5 kernels.  At present, I only recommend the 64-bit x86 processor family, due to the fact that I take advantage of its programmer-friendly memory model.  In the future I hope to generalize it to be efficient on other processor types.
I'm making it available as public domain (or more-accurately, as CC0), which means you can use it for any purpose without restriction.

You can get the module and its documentation here: https://github.com/fordsfords/q/tree/gh-pages

In addition to the queue itself, I produced semi-literate internals documentation  I'm rather proud of my semi-literate documentation system.

Sunday, May 18, 2014

Cache Line Size and Alignment

I've been doing some fiddling around with a high-performance queue, and have noticed something.  There is a wide-spread belief that CPU caches have a line size of 64 bytes.

Alas, all the world is not X86.

Modern PowerPC cache line size is 128 bytes.  Same with Itanium-2.  IBM's system Z CPUs have a line size of 256 bytes.

I saw the results of this personally with our Itanium Linux machine.  My queue performed VERY poorly.  I increased the padding between critical fields to assume a 128-byte cache line, and magically the performance improved dramatically.

While I'm on the subject, I heard from a smart person that cache lines do not necessarily align themselves on the corresponding memory boundaries.  I.e. assuming a 64-byte line size and a cache miss, a memory access to address 0 would load a cache line with 0-63.  However, if that access were instead to address 32, it would load a cache line with 32-95.

However, I have not been able to verify this experimentally.  I've tried on X86, SparcV9, and Itanium, and they all appear to align the caches to a multiple of the cache line size.  I.e. an access to address 32 would load 0-63.

If anybody has clarification here, I would appreciate a note.  Thanks.


UPDATE: another colleague of mine has stated categorically that order of access will NOT affect the mapping of cache lines to memory addresses.  That matches my experimental evidence.

Thursday, May 8, 2014

Blank is beautiful

We all  know that we should comment our code.  But it's hard.  We know that obvious comments are pointless, and can even reduce readability by clutter:

    i++;  /* increment i */

You are reasonable to assume that the reader knows the programming language, so don't include those kinds of obvious comments.  But while writing code, you can be a poor judge of what is obvious; it's ALL obvious to you!

But there is one kind of comment that the coder *is* the best judge for: blank lines.

Writing code is a bit like writing prose.  You split your writing into paragraphs, where each paragraph represents a single coherent unit of thought.  Consider this fragment of code adapted from a heap sort algorithm:

    while (end > 0) {
        temp = a[end];
        a[end] = a[0];
        a[0] = temp;
        end--;
        sift_down(a, 0, end);
    }

I see four "units" of thought here.  The first is that we are going to loop until the size of the heap becomes zero.  The second is that we are swapping the head with last leaf of the heap.  The third is that we are removing the last leaf from the heap, and the fourth is sifting down the new top of heap to its proper location.  Four units of thought should be four paragraphs.  But this code doesn't contain any paragraph breaks.  Maybe this example isn't severe, but I've seen screen after screen of code packed together.  It's hard to read because it's hard to *find* the units.

I think this is a big improvement (I didn't include a blank line after the "while" because the indention itself provides a visual break.):

    while (end > 0) {
        temp = a[end];
        a[end] = a[0];
        a[0] = temp;

        end--;

        sift_down(a, 0, end);
    }

The author may not be a good judge of obvious v.s. non-obvious, but at least he knows which groups of lines represent individual units of thought.

So, maybe we're agreed on blank lines.  What about textual comments?  I've seen many people who usually add a comment to each paragraph.  For example:

    /* loop until the size of the heap becomes zero */
    while (end > 0) {
        /* swap head of heap with last leaf */
        temp = a[end];
        a[end] = a[0];
        a[0] = temp;

        end--;  /* Remove that last leaf (the old head) */

        /* re-balance the heap */
        sift_down(a, 0, end);
    }

Better?  Or are the comments unnecessary because the code itself is obvious?  I think "temp=a; a=b; b=temp;" is a common-enough idiom that calling it swap may be unnecessary.  But is it obvious that "a[0]" and "a[end]" represent the head and last leaf nodes of the heap?  Different people may have different opinions.  And what about the third comment?  I think everybody would agree that "end--;  /* decrement end */" would be absurd.  But saying that it removes the last leaf from the heap might be helpful to those unfamiliar with the standard array implementation of an ordered heap.  And the fourth comment?  Would the average programmer know what the "sift_down()" operation is for an ordered heap?  (You DO all know what an ordered heap is, right?)

I have mixed feelings.  As a maintainer, I usually view comments with suspicion since they are often not updated as the code is modified.  But as a coder I try (and often fail) to add comments when the "what" is obvious, but the "why" may not be.  We know what "end--;" does: it decrements "end".  But why are we decrementing it?  Because we are removing the last leaf.

But the "why" question can be asked recursively.  Why are we removing the last leaf?  When do we stop asking "why"?  Here's where judgment comes in, and as I said, the coder is often the worst person to make that judgment call.  Which is why 99% of code is poorly-commented!

I think the value of these particular textual comments is debatable.  The value of the blank lines is not.  In my opinion, the blank lines are the most important comments in the code fragment.


One more point about code paragraphs and their conceptual units of thought: what about making the source one line shorter by moving the "end--" higher-up?

    /* loop until the size of the heap becomes zero */
    while (end > 0) {
        /* swap head of heap with last leaf, and remove it */
        temp = a[end];
        a[end--] = a[0];
        a[0] = temp;

        /* Re-balance the heap */
        sift_down(a, 0, end);
    }

I feel this is bad.  Yes, the code is the same - the post-decrement modifies the right array element.  But this is mixing two conceptual units of thought into one paragraph.  Swapping head and last leaf, followed by removing the last leaf is NOT a single unit of thought; it's two.  If your command has the word "and" in it, please consider breaking the paragraph up.  But even worse is that the order of operatioons described in the comment is not embodied by the code.  The "remove it" is second step, but the code for it is right smack in the middle of the first step.  (Don't worry, modern compilers won't generate less-efficient code if you make end--; a separate line.)

If you really want to compress this code vertically, I would combine the swap onto one source line:

        temp = a[end];  a[end] = a[0];  a[0] = temp;


While I'm at it, PLEASE comment the end brace of functions with the function's name!

    int heap_sort(...)
    {
        ...
    }  /* heap_sort */


And please add TWO blank lines between the end of one function the start of the next.