Thursday, May 8, 2014

Blank is beautiful

We all  know that we should comment our code.  But it's hard.  We know that obvious comments are pointless, and can even reduce readability by clutter:

    i++;  /* increment i */

You are reasonable to assume that the reader knows the programming language, so don't include those kinds of obvious comments.  But while writing code, you can be a poor judge of what is obvious; it's ALL obvious to you!

But there is one kind of comment that the coder *is* the best judge for: blank lines.

Writing code is a bit like writing prose.  You split your writing into paragraphs, where each paragraph represents a single coherent unit of thought.  Consider this fragment of code adapted from a heap sort algorithm:

    while (end > 0) {
        temp = a[end];
        a[end] = a[0];
        a[0] = temp;
        end--;
        sift_down(a, 0, end);
    }

I see four "units" of thought here.  The first is that we are going to loop until the size of the heap becomes zero.  The second is that we are swapping the head with last leaf of the heap.  The third is that we are removing the last leaf from the heap, and the fourth is sifting down the new top of heap to its proper location.  Four units of thought should be four paragraphs.  But this code doesn't contain any paragraph breaks.  Maybe this example isn't severe, but I've seen screen after screen of code packed together.  It's hard to read because it's hard to *find* the units.

I think this is a big improvement (I didn't include a blank line after the "while" because the indention itself provides a visual break.):

    while (end > 0) {
        temp = a[end];
        a[end] = a[0];
        a[0] = temp;

        end--;

        sift_down(a, 0, end);
    }

The author may not be a good judge of obvious v.s. non-obvious, but at least he knows which groups of lines represent individual units of thought.

So, maybe we're agreed on blank lines.  What about textual comments?  I've seen many people who usually add a comment to each paragraph.  For example:

    /* loop until the size of the heap becomes zero */
    while (end > 0) {
        /* swap head of heap with last leaf */
        temp = a[end];
        a[end] = a[0];
        a[0] = temp;

        end--;  /* Remove that last leaf (the old head) */

        /* re-balance the heap */
        sift_down(a, 0, end);
    }

Better?  Or are the comments unnecessary because the code itself is obvious?  I think "temp=a; a=b; b=temp;" is a common-enough idiom that calling it swap may be unnecessary.  But is it obvious that "a[0]" and "a[end]" represent the head and last leaf nodes of the heap?  Different people may have different opinions.  And what about the third comment?  I think everybody would agree that "end--;  /* decrement end */" would be absurd.  But saying that it removes the last leaf from the heap might be helpful to those unfamiliar with the standard array implementation of an ordered heap.  And the fourth comment?  Would the average programmer know what the "sift_down()" operation is for an ordered heap?  (You DO all know what an ordered heap is, right?)

I have mixed feelings.  As a maintainer, I usually view comments with suspicion since they are often not updated as the code is modified.  But as a coder I try (and often fail) to add comments when the "what" is obvious, but the "why" may not be.  We know what "end--;" does: it decrements "end".  But why are we decrementing it?  Because we are removing the last leaf.

But the "why" question can be asked recursively.  Why are we removing the last leaf?  When do we stop asking "why"?  Here's where judgment comes in, and as I said, the coder is often the worst person to make that judgment call.  Which is why 99% of code is poorly-commented!

I think the value of these particular textual comments is debatable.  The value of the blank lines is not.  In my opinion, the blank lines are the most important comments in the code fragment.


One more point about code paragraphs and their conceptual units of thought: what about making the source one line shorter by moving the "end--" higher-up?

    /* loop until the size of the heap becomes zero */
    while (end > 0) {
        /* swap head of heap with last leaf, and remove it */
        temp = a[end];
        a[end--] = a[0];
        a[0] = temp;

        /* Re-balance the heap */
        sift_down(a, 0, end);
    }

I feel this is bad.  Yes, the code is the same - the post-decrement modifies the right array element.  But this is mixing two conceptual units of thought into one paragraph.  Swapping head and last leaf, followed by removing the last leaf is NOT a single unit of thought; it's two.  If your command has the word "and" in it, please consider breaking the paragraph up.  But even worse is that the order of operatioons described in the comment is not embodied by the code.  The "remove it" is second step, but the code for it is right smack in the middle of the first step.  (Don't worry, modern compilers won't generate less-efficient code if you make end--; a separate line.)

If you really want to compress this code vertically, I would combine the swap onto one source line:

        temp = a[end];  a[end] = a[0];  a[0] = temp;


While I'm at it, PLEASE comment the end brace of functions with the function's name!

    int heap_sort(...)
    {
        ...
    }  /* heap_sort */


And please add TWO blank lines between the end of one function the start of the next.

No comments: