Saturday, December 13, 2014

Rapid Cycle Testing

I'm still experimenting with TDD, and the jury is still out.  As with so many religious wars, I suspect the best advice is to strive for balance.

But one thing that I'm completely sold on is Rapid Cycle Testing.  (Not to be confused with Short Time Horizon Designing, which I'm *not* sold on.)  To the degree that you *do* write automated tests, they should be executed frequently.  How frequently?  Well...
  • How about once per release cycle, right before code freeze?
  • How about once a week?
  • How about once a day, as part of the "nightly" build?
  • How about several times a day, as part of each developer build?
or
  • How about every time you save a source file from your editor?
Working with Clojure and Midje's autotest has sold me on that last one.  It is only practical for simple tests which execute very quickly, but so far that is what most of my unit tests have turned out to be.


Ruby

Being somebody who enjoys learning new things, I decided to do some Ruby learning.  When I learn a new thing, I like to experiment and take notes.  I have files named "learning_lisp.txt" and more recently "learning_clojure.txt".  These files have small examples of source code with minimal explanation; they aren't intended to be useful for somebody else, but only to trigger my memory.  In particular, they show the results of my experiments.

The other day I saw this post, in which the author describes writing unit tests as he learns Ruby.   This made complete sense to me, and I wish I had seen that post a year ago!  Now that I'm learning Ruby, my notes file is "learning_ruby.rb" ... ".rb" instead of ".txt".  My experiments are entered in there instead of just typed interactively in a REPL.  I've only just started, and I'm not sure if I'll stick with it, but I really like the idea of having my notes file be executable; if nothing else, it catches my typos.

Unfortunately, Ruby doesn't have a Midje-like autotest that I know of, so I wrote my own.  I'm sure this will need to evolve significantly over time, especially as I work on larger projects, but here's my "start small" approach.  Note that this is Unix-centric (I use it on my Mac):

    fswatch -i "\.rb$" -e ".*" . | xargs -n1 ./test1.sh

where "test1.sh" contains little more than:

    echo "========================================="
    ruby $*
    date

Note, I don't think fswatch comes with the Mac out of the box.  I installed it with Homebrew:

    brew install fswatch

I love this setup!  I have my "learning_ruby.rb" file in my editor and my Ruby book open.  I see a new function, and I enter a couple of experiments in my ".rb" file and save it.  The "fswatch" window immediately runs it and shows me the results.  Talk about instant gratification!

Thursday, December 11, 2014

Wanna know who *really* landed a man on the moon?


A woman named Margaret.

https://medium.com/@3fingeredfox/margaret-hamilton-lead-software-engineer-project-apollo-158754170da8

More info on Wikipedia.  Excerpt:
Hamilton's work prevented an abort of the Apollo 11 moon landing: Three minutes before the Lunar lander reached the Moon's surface, several computer alarms were triggered. The computer was overloaded with incoming data, because the rendezvous radar system (not necessary for landing) updated an involuntary counter in the computer, which stole cycles from the computer. Due to its robust architecture, the computer was able to keep running; the Apollo onboard flight software was developed using an asynchronous executive so that higher priority jobs (important for landing) could interrupt lower priority jobs. 
Smart engineer.

Sunday, December 7, 2014

Short Time Horizon Designing

In my previous blog post, I talked about different data representations for recording a game of 10-pin bowling.  I started with an obvious representation, and moved toward a more clever one.

I want to admit that it took me quite a while to write the bowling code.  Part of the reason is that I am very simply *not* a super fast coder (hence my blog title).  Never have been.
<digression selfserving="1">Fortunately, my lack of blinding coding speed has not been an impediment to my career.  In every job I've had, the actual time spent coding is small compared to the time spent doing all the other things that a programmer does.  So if I'm slower at 15% of my job, and just as fast (or faster) at the other parts, my overall productivity is on par with the coding cowboys.  If you need a code jumper, I'm probably not your guy.  If you need code that will stand the test of time, I probably am.</digression>
Anyway, when I started writing the bowling code, I tried using a TDD approach.  One common practice with TDD is to not try to design an ideal system up front, but to only code the minimum required to satisfy the current test set.  I call this the "short time horizon" approach to design.  I've heard Agilists also argue in favor of a short time horizon -- only design your current iteration's software, and if that design turns out to be inadequate for future stories, refactor as needed.

The idea is that by not over-thinking the task at hand, you can do it ... (insert ominous music here) ... faster.  Get something working now.  Worry about the future when it arrives.

So, when I started writing the bowling code, I only looked ahead to the currently-written failing test.  Being an old C programmer, I started with an array of 10 "frame" structures, each with 3 "rolls".  It wasn't until late in the cycle that I included a test case with two strikes in a row, causing me to realize the difficulty in looking forward 2 rolls in a frame-based structure.  I changed the data representation in a fundamental-enough way that the vast majority of the code had to be modified / re-written.

No problem, right?  TDD and Agile are all about refactoring, right?

The danger is that time pressures will result in less refactoring, and more hacking and patching, leading to brittle software that takes a long time to maintain and enhance.  It's always cheaper to hack in this one little feature *without* the refactoring, so the refactor rarely gets done and you end up with a mess.  Ultimately, customers suffer from buggy software releases due to unmaintainable code.

Mind you, I don't blame "management" for this!  (At least, I don't *only* blame management.)  Any time I've gone to my manager -- be it project, product, or administrative -- and said, "It's time already!  We *need* to refactor this code," they've always made time in the schedule.  Sometimes, we've had to debate the definition of "need", but when I've pushed hard enough, they've always agreed.  It's the programmers themselves who push back just as often, sometimes out of an assumption that management wouldn't agree, other times because they're just sick and tired of the code, and want to move on.  With this bowling project, I was programming for fun, so I didn't mind re-writing it.  But it still would have saved me a lot of time had I spent 15 minutes or so thinking and prototyping before deciding.

In the bad old pure-waterfall days, we tried to get the approach right during one marathon design phase, before line 1 of code was written.  It resulted in low overall productivity, and still didn't eliminate the need to refactor.  I firmly believe that the rapid cycle encouraged by Agile and TDD is a good idea, but yesterday I let the pendulum swing too far the other way.  I should have been more meditative up front.  :-)  And I guess that's my point here: balance.  Don't design to death, but also don't let TDD and Agile be the death of design.

---

Update: Who knew there was a whole movement around slow programming?  (Thanks John!)

Saturday, December 6, 2014

Bowling for Aardvarks

Why Aardvarks?  Because in the 80s I knew somebody who made up the fictitious game show as a quintessentially absurdist entertainment.  He is the same guy who made up the word "blunjo" simply because it made him laugh to say it.  I enjoyed his company, and I'll be damned if I can remember his name!  Maybe someday he will google "blunjo" or "bowling for aardvarks" and get in touch with me.

But that's not what I wanted to talk about.  I want to meditate a bit about a data representation conundrum: obvious vs. clever.  I normally avoid cleverness in favor of obviousness.  But sometimes it's the wrong choice.

I recently wrote some scoring code for bowling (10 pin), and it got me thinking about representing data.  To refresh your memory, here's a typical score card (courtesy of google images):


I won't explain all the rules (Wikipedia does that), but Ms. Lane here rolled twice in frame 1, knocking down 9 pins and then 1 for a spare.  In the second frame, she only rolled once, knocking down all 10 for a strike.  Etc.


The Obvious Representation

If we want to represent a game's scoring in software, how might one do it?  An old C programmer like me would immediately think about an array of 10 "frame" structures:
    struct frame_s {
        int roll[2];
        int c_score;    /* cumulative */
    };
    struct game_s {
        char name[80];
        struct frame_s frame[10];
    };
(Full disclosure: I actually did all this in Clojure, with nested vectors rather than structures, but the issues I talk about apply equally to both languages.  I'll stick with C for this post.)

But wait!  The 10th frame can have up to three rolls.  How to represent that?  Given the structure of the paper scoring card, this seems the most obvious and direct representation:
    struct frame_s {    /* frames 1-9 */
        int roll[2];
        int c_score;    /* cumulative */
    };
    struct frame_10_s { /* frame 10 is special */
        int roll[3];
        int c_score;    /* cumulative */
    };
    struct game_s {
        char name[80];
        struct frame_s frame[9];
        struct frame_10_s frame_10;
    };
Is this the right approach?  Everybody who bowls knows what a score card looks like, and treating frame 10 as special is arguably the most obvious and least astonishing way to represent a score card ...  to a bowler.

But who is going to be looking at those data structures?  Programmers.  And I bet that even programmers who bowl would turn a bit pale at the "struct frame_10_s" approach; it would lead to complex code, with lots of "if (frame_num == 10)" statements and *almost* duplicated code.  Hard to get right, hard to maintain.

So let's instead go with 10 copies of "struct frame_s" containing "int roll[3]" on the assumption it will simplify the code (it does).


Eliminate Cumulative Score

Does anything else stand out?  What about "int c_score", the cumulative score?  It seemed natural given the structure of a paper score sheet, but it's just duplicate data.  You can derive the cumulative score for any given frame by starting at frame 1 and working your way forward.  Sometimes, old C programmers tend to shy away from that approach for efficiency reasons.  "OMG, that makes scoring a game O(n**2)!"

But duplicate data also introduces lots of opportunities for bugs.  I would rather start with the simpler, safer approach, and only optimize if subsequent profiling indicates that it is needed.

Let's eliminate "c_score".  Here's our structure:
    struct frame_s {
        int roll[3];
    };
    struct game_s {
        char name[80];
        struct frame_s frame[10];
    };


A Very Different Representation

So, are we done?  Let's think about the scoring algorithm.  When you get a strike, you add the next two rolls.  Look at Ms. Lane's game.  In frame 2 she got a strike, so she took the 10 for those pins and added the two rolls in frame 3 (7 and 2).  No problem.  But what if she had gotten a strike in frame 3?  I.e. two strikes in a row.  In that case, frame 2's score would have to access both frame 3 *and* frame 4.  For a strike, the number of forward frames to access isn't constant, which requires some messy if-then-else code.

Let's try a different representation entirely:
    struct game_s {
        char name[80];
        int roll[21];
    };
We don't divide the rolls up by frame, we just store the rolls sequentially.  The maximum number of rolls in a game is 21 (no strikes in frames 1-9, and a strike or spare in frame 10).  The minimum number of rolls is 11 (strikes in 1-9, and neither strike nor spare in frame 10).

This makes it much easier to calculate a strike's score; you just look at the next two rolls.  In fact, just to make the algorithm even easier, you could use "int roll[23]" with the extra two rolls initialized to zero.  That way, a 21-roll game that ends with a strike can use the same algorithm, which adds the next two rolls.  It doesn't need special case code for frame 10.
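
To make that walk concrete, here's a quick sketch in C of a scorer over the "int roll[23]" variant (the function and its name are mine, just for illustration; my real code was Clojure):

    /* Score a complete game; g->roll[] holds up to 21 rolls, with the
     * two extra entries initialized to zero to simplify strikes. */
    int score_game(const struct game_s *g)
    {
        int score = 0;
        int r = 0;      /* index of the current frame's first roll */
        int frame;
        for (frame = 0; frame < 10; frame++) {
            if (g->roll[r] == 10) {                        /* strike */
                score += 10 + g->roll[r+1] + g->roll[r+2];
                r += 1;               /* strike frame consumes 1 roll */
            } else if (g->roll[r] + g->roll[r+1] == 10) {  /* spare */
                score += 10 + g->roll[r+2];
                r += 2;
            } else {                                       /* open frame */
                score += g->roll[r] + g->roll[r+1];
                r += 2;
            }
        }
        return score;
    }

Note how the strike and spare cases just peek ahead in the flat array; no cross-frame arithmetic needed.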

So, is this the right choice?  It makes it easier to do the look-ahead calculations for strikes.  But it also makes it non-trivial to find the frame boundaries when you need to, like during display.  And it moves the data representation *very* far from the application's natural visualization of the data (the paper score card).

I've concluded that this approach is better.  I've noodled around with the code, and it's pretty straight-forward to write simple methods to walk the array and identify the frames.  The overall code was easy to write, easy to test, and I think it is easy to read.  I've achieved software simplicity by deviating from the obvious data representation in favor of a more clever one.  I must admit to being a little uneasy arriving here; "clever" is almost a dirty word in my lexicon.  :-)

Friday, October 24, 2014

Event logger

A long time ago, I mentioned an event logger that I promised I would create a downloadable package for.  I obviously never did.

I've had a couple of opportunities to recommend it recently, so I decided to accept the fact that I may never get it into a nice neat package, and I should just describe it.

The idea is that in multi-threading and/or distributed systems, it is often not practical to debug problems by adding debug logging.  Sometimes logging changes the timing of the code too much (either masking the problem you're tracking, or even causing new problems due to slowness).  Another problem with logging debug output is that often it takes millions of loops before a problem reproduces, and nobody wants to crawl through gigabytes of log files.  (I can't tell you how many times we've filled disks up with debug logs trying to catch hard-to-reproduce problems.)

So my event logger is intended to be a VERY low-impact debugging aid.  It is a bit of a pain to use, so it is often more of a last resort, but sometimes it's the only way to track a tricky race condition.

Here's a minimal version of it:

/* globals */
#define EVLOG_SIZE 65536                      /* power of 2 */
volatile unsigned long evlog[EVLOG_SIZE];
volatile unsigned long evlog_num_logs = 0;

/* The ORing constant 0x80000000 is there to give a hint that the entry contains a line number. */
#define EVLOG0 do { evlog[ (evlog_num_logs++) & (EVLOG_SIZE-1) ] = 0x80000000 | __LINE__; } while (0)
/* EVLOG1 should only be used immediately after EVLOG0 (like on the same line) */
#define EVLOG1(val) do { evlog[ (evlog_num_logs++) & (EVLOG_SIZE-1) ] = val; } while (0)

Now inside your code, you can sprinkle calls to EVLOG0 and EVLOG1:

    EVLOG0;
    s = socket(...);
    EVLOG0;  EVLOG1(s);
...

    EVLOG0;
    close(s);
    EVLOG0;  EVLOG1(s);

So, the above will create a nice log of events in the global evlog array.  What do you do with it?  One thing: by making evlog and evlog_num_logs globals, you can find them with gdb if your program crashes and dumps core.  Also, if your code contains sanity tests and calls some kind of fatal error function, that function can write the contents of those globals to a debug dump file.  Finally, you can even add a user interface to trigger dumping the globals to a file.
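
For example, here's a sketch of what such a dump routine might look like (evlog_dump and its output format are my inventions here, not part of the macros above):

    #include <stdio.h>

    void evlog_dump(const char *path)
    {
        FILE *fp = fopen(path, "w");
        if (fp == NULL) return;

        unsigned long num = evlog_num_logs;
        /* If the circular buffer has wrapped, the oldest surviving
         * entry is num - EVLOG_SIZE. */
        unsigned long i = (num > EVLOG_SIZE) ? num - EVLOG_SIZE : 0;
        for (; i < num; i++) {
            unsigned long entry = evlog[i & (EVLOG_SIZE-1)];
            if (entry & 0x80000000)   /* hint: probably a line number */
                fprintf(fp, "%lu: line %lu\n", i, entry & 0x7FFFFFFF);
            else
                fprintf(fp, "%lu: value 0x%lx\n", i, entry);
        }
        fclose(fp);
    }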

Note that although I use this code for multithreaded debugging all the time, it is technically not thread-safe.  The evlog_num_logs++ is not atomic, for one thing, which means that if two CPUs execute it simultaneously, the variable may be incremented only once.  However, given that the primary goal of this event logger is to minimize impact, it is usually better to live with the risk of a mildly-corrupted event log.

I've used this technique in the past, and there is lots of room for improvement.  For example, expand what you save with each event: thread ID, high-resolution time stamp, source file name (to go with line number), etc.  But be careful - add too much and you deviate from the "minimal impact" goal.
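
For instance, a richer entry might look something like this (a hypothetical sketch; the field choices are mine, and each one adds per-event cost):

    #include <pthread.h>
    #include <time.h>

    struct evlog_entry_s {
        unsigned long value;   /* line number or logged value */
        pthread_t tid;         /* which thread logged it */
        struct timespec ts;    /* high-resolution timestamp */
        const char *file;      /* source file name, to go with __LINE__ */
    };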

One final thought - many products have a "debug mode" in which lots of debug output is spewed to a debug file.  I've already mentioned some problems with this -- changing timing can mask or introduce problems, and the output volume is overwhelming -- but there's a much bigger problem: debug mode is never there when you need it.  Customer calls you up and says, "a bad thing happened."  You say, "turn on debug mode."  Customer does, and is not able to reproduce the problem.  With this kind of low-impact event logging, you can leave it on all the time.  Have it auto-dump on crashes, and have a customer-triggerable dump also.  You may not be logging the right events, but at least you'll have *some* evidence to work with ... better than "a bad thing happened."

Wednesday, October 8, 2014

If This Then That

I just ran across the site: ifttt.com (stands for "if this then that").  It appears to be a primitive autonomous intelligent personal agent, capable of monitoring a variety of information sources and triggering various actions.  I just set up a rule to send me an email when I post to my blog.

So part of this post is just a test of IFTTT.  :-)

Although the idea interests me, I'm not sure how heavily I will use it (my email inbox is clogged enough without messages telling me that it might rain tomorrow).  But who knows?  If I find a compelling use for it, I'll post it here.  Which will generate an email to myself.  Which will cause me to remove that rule since, hey, I think I know when I post to my own blog.

Tuesday, September 23, 2014

Tail Call Recursion in Clojure


In this article, I explain tail call recursion, show an example in both Clojure and common lisp, opine that Clojure's approach to tail call recursion optimization is better than lisp's, and finally look at Clojure's "lazy-seq" which seems to mysteriously provide the benefits of tail call recursion optimization for algorithms which are *not* tail call recursive.

Note: in this article, I refer to "tail call recursion", but I am actually dealing with a subset of that domain: tail *self* call (a.k.a. "direct") recursion. The theories presented also apply to the general case, including mutual (a.k.a. "indirect") recursion. However, optimization of tail *mutual* call recursion is much more rare in compilers than optimization of tail *self* call recursion. So I'm only considering self call.


What Is Tail Call Recursion?

Tail call recursion is a special case of recursion. With recursion, a function calls itself. *Tail* call recursion is where the function calls itself, and then immediately returns, either returning nothing, or returning the value of the recursive call. I.e. it doesn't call itself and then do something with the recursive call's value before returning. The reason tail call recursion is an interesting special case is that it can be optimized in a way that doesn't require a stack frame for each recursive invocation (i.e. it converts it into a simple loop). This makes it faster and conserves stack space.

Take the most-obvious way to program factorial recursively in clojure:

(defn fact1 [x]
  (if (<= x 1)
    1N
    (* x (fact1 (dec x)))))

(fact1 5) ;= 120N
(fact1 9000) ;= StackOverflowError

This is not tail call recursive. If you call (fact1 5), that initial invocation needs to wait for the recursive invocation to return the value of 4! so that it can multiply it by 5 and return the result. It needs to maintain the state of x across the recursive invocation, and it uses the stack for that. Clojure's stack seems awfully small to me; less than 9000 levels deep.

It's easy to modify fact1 to be tail recursive. It requires a modified version in which you pass a "result so far" as an additional input parameter:

(defn fact2-tail [x result-so-far]
  (if (<= x 1)
    result-so-far
    (fact2-tail (dec x) (* x result-so-far))))

(fact2-tail 5 1N) ;= 120N
(fact2-tail 9000 1N) ;= StackOverflowError

(If you don't like having to pass the "1N", you can simply define a wrapper function.) This is tail call recursive because all the work, including the multiplication, is done before the recursive call.  So theoretically no state needs to be stacked up as the recursion goes deeper.  But as you can see by the stack overflow, Clojure doesn't realize this.

Be aware that some lisps will do automatic tail call optimization (or at least tail self-call optimization). I tried the equivalent thing with clisp and it worked fine with big values:

(compile
 (defun fact2-tail-lisp (x result-so-far)
   (if (<= x 1)
     result-so-far
     (fact2-tail-lisp (1- x) (* x result-so-far)))))

(fact2-tail-lisp 59999 1) ;= 260,630-digit number starting with 2606

So what's a Clojure programmer to do? Use the loop/recur construct. Let's convert fact1, which is NOT tail call recursive, to loop/recur:

(defn fact1-loop [x]
  (loop [i x]
    (if (<= i 1)
      1N
      (* i (recur (dec i))))))
CompilerException java.lang.UnsupportedOperationException: Can only recur from tail position

Interesting. Clojure is NOT smart enough to recognize tail call recursion in fact2-tail, but it is smart enough to recognize NON-tail call recursion in fact1-loop. So the Clojure programmer must both recode the algorithm to be tail call recursive and code it to use loop/recur:

(defn fact2-loop [x]
  (loop [i x, result-so-far 1N]
    (if (<= i 1)
      result-so-far
      (recur (dec i) (* i result-so-far)))))

(fact2-loop 5) ;= 120N
(fact2-loop 59999) ;= 260,630-digit number starting with 2606

There! One advantage to using loop/recur is you don't need a separate function which takes the extra result-so-far parameter. It's kind of built in to the loop construct.


Which Is Better?

So, which is better: Clisp, which automatically did the optimization, or Clojure, where the programmer basically has to *tell* the compiler to optimize it with loop/recur?

I have to side with Clojure, and I'll tell you why. I started my experimentations knowing that lisp did the optimization automatically. I coded it up, and it overflowed the stack. Huh? I had to do a bunch of googling to discover this page which mentions in footnote 1 that you have to compile the code to get it to do the optimization. Had I not tried it with a large number, or if I had mis-coded the algorithm to not really be tail call recursive, then I would have put in some effort to achieve an intended benefit, would not have received that benefit, and I might not have known it.

Most of the time, you need to carefully code your algorithm to be tail call recursive. I prefer Clojure's approach where you tell the compiler that you intend it to perform the optimization, and it gives you an error if it can't, as opposed to lisp which silently leaves it un-optimized if you make a mistake.  It also gives a hint to a future maintainer that your potentially awkward algorithm is coded that way for a reason (admit it: fact2-tail and fact2-loop are more awkward algorithms than fact1).


What About lazy-seq?

Now I'm experimenting with lazy-seq, which can also be used to avoid stack consumption. It's a different animal -- the factorial function only returns a single value, whereas lazy-seq is intended to assemble a list (or more-accurately, a lazy sequence).

Let's start with re-writing fact1 to return a list of all factorials, from 1 to the requested value:

(defn fact3 [x i result-so-far]
  (if (> i x)
    '()
    (cons
     (* i result-so-far)
     (fact3 x (inc i) (* i result-so-far)))))

(fact3 5 1N 1N) ;= (1N 2N 6N 24N 120N)
(last (fact3 5 1N 1N)) ;= 120N
(last (fact3 9000 1N 1N)) ;= StackOverflowError

This function is recursive but doesn't use loop/recur. No wonder it blew up the stack. To convert this to loop/recur requires a change in the output.  The final return value has to be the entire list, so the "cons"ing needs to be done on-the-fly each time through the loop. Thus, larger factorial values are "cons"ed onto a list-so-far of smaller factorial values, resulting in a reversed list:

(defn fact3-loop [x]
  (loop [i 2N, result-so-far '(1N)]
    (if (> i x)
      result-so-far
      (recur (inc i) (cons (* i (first result-so-far)) result-so-far)))))

(fact3-loop 5) ;= (120N 24N 6N 2N 1N)
(first (fact3-loop 5)) ;= 120N
(first (fact3-loop 9000)) ;= 31,682-digit number starting with 8099
(first (fact3-loop 59999)) ;= OutOfMemoryError  Java heap space

We solved the stack problem, but ran out of heap assembling the massive list (note that most of the elements of the list hold VERY large numbers, which consumes significant space for each list element).

lazy-seq to the rescue! Here it is in action:

(defn fact3-lazy [x i result-so-far]
  (if (> i x)
    '()
    (cons
     (* i result-so-far)
     (lazy-seq (fact3-lazy x (inc i) (* i result-so-far))))))

(fact3-lazy 5 1N 1N) ;= (1N 2N 6N 24N 120N)
(last (fact3-lazy 5 1N 1N)) ;= 120N
(last (fact3-lazy 9000 1N 1N)) ;= 31,682-digit number starting with 8099
(last (fact3-lazy 59999 1N 1N)) ;= 260,630-digit number starting with 2606

Look at the algorithm; it's almost identical to fact3. I.e. it is *not* tail call recursive! Each recursive call must return its list, onto which (* i result-so-far) is then "cons"ed.

The lazy-seq function allows Clojure to conserve stack, even though the algorithm is not tail call recursive. It is also able to reclaim unneeded early list items (I'm only interested in the last one), so it conserves heap.

Did Clojure figure out a way to not have to maintain state across all 59999 invocations? Or is it just doing it with fact3-lazy in a more-space-efficient way than fact3? I'm guessing that it is maintaining state on the heap instead of the stack, but I'll be honest, I'm not exactly sure what's going on.

Monday, September 22, 2014

The Meditative Coder


I changed the name of my blog from "Sford's Blog" to "The Meditative Coder".

A bit pretentious or pompous? I hope not. I do think about code a lot.

I'm not talking about design. Lots of people could claim to be meditative designers, striving to make their structures and algorithms elegant, efficient, and robust. I cut my professional teeth in a culture where coding was considered a low art. Design was where the prestige lay. And architecture? That gets you a corner office and groupies. I'm not kidding! I've seen some pretty mediocre programmers put on a pedestal when they achieved architect status.

I don't want to minimize the importance of a good architectural framework and a good design. However, I think coding sometimes gets short shrift. I think that well laid-out and organized code is beautiful. Given the choice, I would rather maintain a program whose design is so-so and whose code is excellent, than vice versa.

I will admit to spending significant time after I've gotten a program working, just to make the code more readable -- changing variable and function names to be more-precisely descriptive, adding or subtracting whitespace to make the code clauses more logical, even re-factoring the code a bit to have cleaner, more-intuitive abstractions, and even adding comments (judiciously).

So yeah, I think I'm a pretty meditative coder, perhaps more-so than average.

Finally, while "Sford's Blog" appeals to my minimalist aesthetic, it was both correct and useless. Besides, "meditative coder" got absolutely NO hits on Google; how can I resist? :-)

Friday, September 12, 2014

Clisp Performance

In my previous post, I did some speed measurements of Clojure.  John commented with a test in common lisp, which performed MUCH better than Clojure.

So I did some more experimentation.  Based on John's "fact2" results, his laptop appears to be about 3x faster than mine; not surprising since I have a 2nd gen Mac Air, and he probably has a real computer :-).  But when I ran John's fact4 with clisp, I got:

(time (fact4 1000))
Real time: 0.003765 sec.
Run time: 0.003765 sec.

Whoa!  That's over 200 times slower!  I tried it again.  And again.  And again.  Same results.

Then I tried (fact4 10000) and got "APPLY: too many arguments given to +".  Oh yeah, now I remember that about clisp.  :-)

Then I tried John's trick of timing a loop of them:

(time (dotimes (i 1000 (fact4 1000))))
Real time: 0.005146 sec.
Run time: 0.005117 sec.

Each run of (fact4 1000) now took 5 microseconds. I could imagine some overhead which needs to be amortized across multiple loopings, but that seems extreme.

Look at this:

(progn
  (time (dotimes (i 1 (fact4 1000))))
  (time (dotimes (i 10 (fact4 1000))))
  (time (dotimes (i 100 (fact4 1000))))
  (time (dotimes (i 1000 (fact4 1000)))))
Real time: 0.003783 sec.
Run time: 0.003783 sec.
Space: 8760 Bytes
Real time: 0.002432 sec.
Run time: 0.002394 sec.
Space: 8760 Bytes
Real time: 0.002528 sec.
Run time: 0.002528 sec.
Space: 8760 Bytes
Real time: 0.003901 sec.
Run time: 0.003832 sec.
Space: 8760 Bytes

They ALL took approximately the same time!  Is clisp somehow not doing the loop?  Maybe it sees that there are no side-effects, and just optimizes out all but one run?  Maybe it implicitly does Clojure's "memoize" function of caching results?  I kept going:

(progn
  (time (dotimes (i 1000 (fact4 1000))))
  (time (dotimes (i 10000 (fact4 1000))))
  (time (dotimes (i 100000 (fact4 1000))))
  (time (dotimes (i 1000000 (fact4 1000)))))
Real time: 0.004611 sec.
Run time: 0.00461 sec.
Space: 8760 Bytes
Real time: 0.017613 sec.
Run time: 0.01703 sec.
Space: 8760 Bytes
Real time: 0.126134 sec.
Run time: 0.125795 sec.
Space: 8760 Bytes
Real time: 1.231531 sec.
Run time: 1.230281 sec.
Space: 8760 Bytes

FINALLY!  The times start increasing by factors of 10 after about 10,000 runs, settling down to about 1.2 microseconds per "(fact4 1000)".  John - I suspect that you would see even better performance if you did more loops.

Anyway, I went back to the Clojure programs and did the same thing with timing multiple loopings.

(do
  (time (dotimes [i 1] (fact1 1000)))
  (time (dotimes [i 10] (fact1 1000)))
  (time (dotimes [i 100] (fact1 1000)))
  (time (dotimes [i 1000] (fact1 1000))))
"Elapsed time: 0.396 msecs"
"Elapsed time: 2.413 msecs"
"Elapsed time: 39.325 msecs"
"Elapsed time: 250.707 msecs"

The timings are basically consistent with a 10-fold increase in time with each 10-fold increase in loops.  Yes, there is variability (presumably due to GC), but multiple runs have tended to prove the rule.

I get the same trends with fact2 and fact3.  So my originally-posted timings were about right.

Conclusion?  I've spent WAY too much time on this issue this morning and I'm going to be late to work.  I think I'll just skip my morning exercise and only be a little late.  :-)

(Update: my original C-language timings were wrong.  I've corrected this post based on correct C timings.  I no longer mistrust the "clisp" results, and have removed claims to that effect.)

Update 2: now that I trust the timing, I did a bit more number crunching. I'll assume that clisp can run a single call to (fact4 1000) in 1.2 microseconds. There is obviously some kind of overhead associated with the first call. (Maybe clisp is compiling "fact4" just in time?  I.e. after "time" took the start timestamp?) Let's say that it takes 3500 microseconds for that overhead. So, with varying number of loops, I would expect to see the following:

3500 + 1.2 * 1 = 3501.2
3500 + 1.2 * 10 = 3512
3500 + 1.2 * 100 = 3620
3500 + 1.2 * 1000 = 4700
3500 + 1.2 * 10000 = 15500
3500 + 1.2 * 100000 = 123500
3500 + 1.2 * 1000000 = 1203500

Given the wide variability of individual timings, this matches reasonably well the results above. I don't think clisp is doing anything too fishy.

Thursday, September 11, 2014

Clojure performance

There is a cute site www.4clojure.com which poses coding challenges to Clojure programmers.  I don't have a lot of time each day, so I'm only up to #42, but I'm having fun.  The actual problems have been trivial so far, targeted to giving a beginner practice with Clojure features.  One nice thing about it is that you can "follow" somebody else to see what their solutions are (but you can only see them *after* you've solved it yourself).  I've found a pretty experienced Clojure programmer to follow, so it's been very educational to compare my sometimes brute-force solutions with his more elegant solutions.

For example, problem #42 is to write a factorial function.  For the purposes of this post, I'm changing the factorial algorithm to do addition rather than multiplication.  This is because I wanted to test it with large numbers without arithmetic overflow to get timing information.

As a long-time C programmer, I would just write:

long long fact (long long n)
{
  long long r = 1;
  long long i;
  for (i = 2; i <= n; i++)
    r += i;    /* not really factorial */
  return r;
}

I timed that (on my 2nd gen Mac Air) for n=1,000 and it took 6 nanoseconds.

A beginning Clojure functional programmer might do it recursively like this:

(defn fact1 [n]
  (if (= n 1)
      1
      (+ n (fact1 (dec n)))))

For n=1,000, this takes about 330 microseconds - about 55,000 times longer than C.  :-(

The more-experienced guy solved it this way:

(defn fact2 [n]
  (apply + (range 1 (inc n))))

This is very elegant and uses some of Clojure's interesting features.  As soon as I saw it, I thought, "wow, that probably took him 10 seconds to write."  It executes in almost exactly the same time as the recursive version - about 330 micros.

Here is how I wrote it:

(defn fact3 [n]
  (loop [sum-so-far 1, i n]
    (if (<= i 1)
      sum-so-far
      (recur (+ sum-so-far i) (dec i)))))

Note that this is arguably the most-complex of all the solutions.  Part of the reason I wanted to write it this way was that I wanted to practice a method called "tail recursion".  In this method, the algorithm is technically recursive (hence the "recur" function), but the compiler can trivially convert it into a simple loop.  I.e. this method does not actually create 1000 stack frames.  The reason for doing this is to make it more efficient.

My version takes about 200 microseconds, only 33,000 times longer than C.  :-)

What happens if we go with n=10,000?  Surprise!  The purely-recursive version gets a stack overflow.  (That seems like an awfully small stack to me...)  The other versions each take about 10 times longer, as you would expect.

The moral of the story?  Use Clojure when you want short development cycles, not high performance.  And by that measure, the *real* winner is that more-experienced Clojure programmer who wrote "(apply + (range 1 (inc n)))".  No "if" statements to get wrong, no partial-sum to keep track of, no stacks to overflow.  By a HUGE margin, the fastest to get to a working program.

(Update: my first posting mistakenly used C timing without optimizing.  I've corrected this article with timings using "-O3" optimization.  The optimizer gave a 600x speed improvement, from 4 microseconds to 6 nanoseconds.  Wow!)

Monday, September 1, 2014

Clojure tools: bumps on the road

I've been enjoying getting my tools sharpened and ready for use.  John Jacobsen describes a nice set of workflow habits and tools that he uses in his blog.  It includes Emacs, Leiningen, and Cider.

27-Sep-2014: I've made in-place updates to this post.


lein print and println: no longer broken

Using the Clojure REPL in Emacs via Cider, it mostly worked, but the "print" and "println" functions were broken at one point:

    user> (print 1)
    1NoSuchMethodError clojure.tools.nrepl.StdOutBuffer.length()I ...

Good old Google.  A search for the above reveals a Leiningen issue which indicates the "stable" version 2.4.3 has a problem.  When this post was first written, the fix was not yet available. Fortunately, there was a workaround: downgrade to version 2.4.2:

    lein upgrade 2.4.2

This is no longer necessary: lein 2.5 is fixed:

    user> (print 1)
    1
    nil


Emacs packages: chicken-and-egg problem?

To get a productive Emacs-based Clojure environment, you need Emacs packages.  I had some initial trouble with it not finding the Marmalade packages I asked for.  I read some advice telling me to refresh the packages, but it wasn't exactly clear how to do it all in one go.  To complicate matters, I found myself trying out the approaches of several bloggers, leading me to need to clear my config and start over more than once.

(I tried "prelude" Emacs package, but it was rather wonky and VERY slow to start up, so gave up and went with a bunch of stuff in John's config.)

So I created "refresh_pkgs.el" containing the following:

    (require 'package)
    (add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))
    (package-initialize)
    (package-refresh-contents)

When I want to clear everything and start over, I do this:

    mv .emacs.d .emacs.d.sav
    mv .emacs .emacs.sav
    emacs -nw -q -l emacs_init/refresh_pkgs.el

When the welcome screen is displayed, exit Emacs with C-x C-c.  A new ".emacs.d" directory now exists with a refreshed Marmalade.  Now create the desired "init.el" (in the ".emacs.d" dir) and enter Emacs again.  This invocation will take a lot longer because it downloads and builds the desired packages.  Once again, when the welcome screen is finally displayed, exit Emacs.  You should be good to go.  (Note: I suspect that the "emacs -nw -q -l emacs_init/refresh_pkgs.el" command will be useful without starting over if there are new packages I want to try out.)

Here's the package loading section of my "init.el":

    (require 'package)
    (add-to-list 'package-archives
                 '("marmalade" . "http://marmalade-repo.org/packages/"))
    (package-initialize)
    (defvar my-packages '(better-defaults
                          auto-complete
                          undo-tree
                          paredit
                          auto-indent-mode
                          projectile
                          clojure-mode
                          cider))
    (dolist (p my-packages)
      (when (not (package-installed-p p))
        (package-install p)))

I won't show the rest because I'm sure I'll be making a lot of changes over the weeks to come.


emacs: melpa vs. marmalade

Ugh.  For various reasons, I combined marmalade with melpa.  The net result was a nightmare of things not working quite right.  Apparently when both are enabled, the above package list will fetch incompatible versions of different packages.  I tried going with melpa-only, and several packages were not available.  I could have messed with it for longer, but I just wanted things to work again.  So I started from scratch again with the above settings.  With the latest lein, I'm back to everything working.

Saturday, August 23, 2014

Now starting on Clojure

I think I've gotten enough Lisp under my belt for now.  Not that I'm proficient at Lisp programming, but I think I understand enough to take my next step: Clojure.  I'm not very far along, but I have to say that I think I like it.

Clojure (pronounced "closure") is a young language (2007) with the following awesome properties:
  1. It is a functional programming language, though not as purist as Haskell.
  2. It is a variant of Lisp.
  3. It is built on top of the Java Virtual Machine, meaning that a Clojure program can access the entire Java standard library.
  4. It has built-in features to support concurrency and parallelism.
I've finished the first chapter of Emerick, Carper, and Grand's "Clojure Programming" book, and I've got to say that the information density so far is very high.  If it continues like this, it will be slow going.  I guess I like the book so far, but it's nowhere near as fun as "Land of Lisp".  I could gripe a bit about the index being very incomplete, but I got both the paper and the PDF versions of the book, so I can simply search the PDF.

Anyway, as with Lisp, I'm nowhere near at the level where I can talk intelligently about Clojure, so I won't try.

Thursday, August 14, 2014

Learning Lisp

For reasons I don't completely understand, I sometimes forget an important fact about myself: I love learning.  Oh sure, we all learn little things all the time, satisfying our fleeting curiosities with wikipedia.  But I also like to sit down to a full-course meal of learning.

So I recently decided to learn lisp.

Most professional programmers will ask me: why lisp?  It is barely used outside of academia, and isn't even used within academia that much.  It won't get me a job.  It's not very practical.

Fortunately, at my age, I don't have to care about the practicality of my interests.  :-)

Part of my desire to learn lisp comes from guilt associated with this blog.  Lisp is one of the pioneering computer languages.  Writing about programming without being familiar with lisp feels a bit like writing about relativity without being familiar with Newtonian physics.  Or writing about race relations without being familiar with slavery.  How can I claim to be a well-rounded developer with only knowledge of one of the pillars of the field?

In an earlier post, I mentioned finally being able to "get" lisp programming after spending 30 minutes with an on-line tutorial.  But I wanted more.  So I bought Land of Lisp / Learn to Program in Lisp, One Game at a Time!, by Conrad Barski (cheaper on Amazon, but I wanted the DRM-free PDF).  I'm about half-way through it, and am having a ball.  Partly for the obvious reason that I really love learning things.  But also because this is a good book.

Not perfect -- I suspect that neither the editors nor the early reviewers were people actually learning lisp.  Since I am actively trying to become proficient in lisp, I make sure I fully understand every line of code presented.  There are some programs where a construct is used but not explained.  For example, a type of "vector" is used in one of the programs, but is not mentioned in the text.  It required external research to discover that a lisp vector is just a simple (one-dimensional) array.

I'm afraid I also have to complain a bit about the coding style.  Meaningful variable names are often not used.  Or a term is used to mean subtly different things at different times (like "edges").  The code could benefit from more-careful naming.

However, in spite of that, I absolutely LOVE the premise - teach lisp with real-life programs.  I hate examples which just demonstrate syntax and semantics (I'm not sure I can stomach another "towers of hanoi" example).  Land of Lisp successfully helps to show *why* you would want to use various constructs, using programs that actually accomplish something meaningful.  And I really enjoy the style of Dr. Barski's prose.  His sense of humor, and his decision to use games as the basis for his programs, makes the book a joy to read.

I'm not proficient enough to say anything very intelligent about lisp as a language, so I won't just yet. However, I am learning some new ways to look at algorithms.  Having multiple points of view will always help you with your designs.  It probably won't be worth the effort in any practical or economic sense, but at least it's better than watching NCIS marathons on cable!

Tuesday, July 22, 2014

Adventure! ... in BASIC???

Over the weekend, I was feeling sorry for myself for not having as much time to program as I would like, so I sneaked a few hours (about 6) to write an adventure game.  I thought a bit about what language to use.  Part of me wanted to do it in one of the functional languages, but I knew I wouldn't have enough time to both learn the language sufficiently AND write enough to be recognizably adventurous.

And then a wave of nostalgia swept over me.

BASIC.

And none of this new-fangled BASIC with block-structured coding constructs.  Basic BASIC.  With 1 or 2 character variable names, and lots of GOTOs.

I gotta say, I haven't had this much fun programming in years.  I actually chuckled at times, remembering the old days of:

3165 if w <= 9 then 3170
3167   print "Too many words!  Talk more simply."
3168   v$ = "" :  n$ = "" : return
3170 rem

OK, the pedantic among you will say that putting multiple statements on a line is not basic BASIC.  Oh well, I make the rules, I can break them.  But I did restrict myself to variable names of one letter, or one letter and one digit, with all variables gloriously global.  Man, kids today don't know how good they have it.

>load "a.txt"
>run
You are in a brightly-lit room.  Birds are singing, puppies and kitties are playing in the sunlight streaming in through cheerily open windows.  There is a hole in the north wall that looks just big enough to crawl through.
You see:
  a grey bag of scratchy cloth
? north
You are in a dimly-lit room.  Shadowey figures seem to slither towards you.  A low moan can faintly be heard.  What little light penetrates the ominous darkness comes from the south.
You see:
  a fabulous 100-carat diamond!
? take diamond
got it.
You are in dim room
? n
Can't go that way
You are in dim room
? s
You are in a brightly-lit room.  Birds are singing, puppies and kitties are playing in the sunlight streaming in through cheerily open windows.  There is a hole in the north wall that looks just big enough to crawl through.
You see:
  a grey bag of scratchy cloth
? look at the grey bag
a grey bag of scratchy cloth
which contains:
  an iron key, slightly rusted
You are in bright room
? take bag
 NEXT without FOR  in line 9075
>

Whoopsie!  Looks like I need a bit more debugging.  :-)

Or not.  The exercise accomplished its purpose, which honestly was more to feel young again than hone my coding skills.  I.e. I don't think I'll be putting this on my resume any time soon.  ;-)

Update: I thought I might put it up on github, not because it is a useful or notable piece of work, but because I wanted to be the first person to put BASIC on github.  No such luck.  There is a ton of BASIC already on github.  Oh well...

Tango.net / Tango.me SMS Text Spam

A colleague of mine downloaded the Tango app, which went into her contact list and spammed text messages.  The text consisted of, "Check out my photo on Tango" followed by a URL which was obviously not specific to a particular photo, or even a particular user.  I went ahead and followed the link and it was just an advertisement for the Tango app.

My colleague was mortified by the fact that she unknowingly spammed her contact list, especially since not all phone users are on unlimited texting plans, and assured me that she gave no authorization.  I suspect that she actually did authorize it, but under sufficiently-ambiguous wording that it wasn't obvious -- probably something like, "Do you wish to share your photos with your friends?"

Looking at Tango's terms of use, I see:

11. Third Party Fees. For particular Devices, Tango may ask your permission to use your native SMS application to deliver messages or invitations to people who are not registered users of the Services and with whom you choose to communicate. Some of these services may charge additional fees.

Yeah, I bet.

I am not happy.  I've never liked spam, and text message spam is especially bad since it is more disruptive and costly than email spam.

Please don't use Tango.net or Tango.me until they stop this deplorable practice.

Tuesday, June 24, 2014

My work email inbox is empty

Wow.  I think the last time my work in-box was empty was when I joined 29West, back in 2005.  It stayed empty for probably an hour.

I dunno ... it's actually a little scary looking.

Monday, June 16, 2014

Maglus Stylus

I am very happy with the Maglus stylus for my iPad.  I wrote an Amazon review of why I like it so much (there are specific reasons).

The reason for this post is that you can get a 10% discount if you use this link.  It's part of an "affiliate" program that also kicks me a bit of money, which I have re-directed to the American Red Cross.

That said, I did notice that on Amazon, I could find it for even less than the 10% off list, so if you're looking to save money, shop around.  I would rather see you save as much money as you can, and independently donate a decent amount to the Red Cross because you want to support their good work.

Anyway, I never feel 100% comfortable promoting a commercial product lest I come across as a corporate shill.  Then I saw a post on their corporate blog on James Joyce and Ulysses.  Somehow that makes me feel less guilty being a booster.  Which is really silly since I've never read Ulysses!

Hmm ... what tags should I set on this post?  I don't see one for "shameless commerce".  I guess "tips" since I really do like the stylus a lot.

Thursday, June 12, 2014

Order in Structure Assignment Statements

I had what I thought was a pretty simple piece of code, transferring a structure from one thread to another.  Without describing it blow-by-blow, I'll present it thus: a writer thread is measuring temperature and pressure of a steam turbine.  A reader periodically samples the measurement.  There is no need for the reader to get every measurement, it only needs the most-recent at the point that it samples.


DESIGN!

A global data structure:

struct measurement_s {
    volatile uint64_t post_seq;  /* sequence number of measurement */
    volatile  int64_t temperature;
    volatile  int64_t pressure;
    volatile uint64_t pre_seq;   /* sequence number of measurement */
} live;

The algorithm is for the writer to first increment pre_seq, then update the temperature and pressure, then increment the post_seq.  The reader reads the post_seq first, then temperature and pressure, then pre_seq.  If the reader sees pre_seq != post_seq, it means that a collision has happened, and the reader should re-try.  Here's the reader code:

        struct measurement_s sample;
        do {
            sample = live;
        } while (sample.pre_seq != sample.post_seq);
        /* now have valid sample */
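
The writer side is simple enough that I didn't show it; a sketch might look like this (publish_measurement is just a name I made up):

        void publish_measurement(int64_t temperature, int64_t pressure)
        {
            live.pre_seq++;      /* mark update in progress */
            live.temperature = temperature;
            live.pressure = pressure;
            live.post_seq++;     /* now matches pre_seq: update complete */
        }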


BROKEN!

Pretty simple.  Easy to understand.  And it didn't work right.  The test code was written to make it obvious when inconsistent values are read, and the code above generated inconsistent reads.  A LOT of inconsistent reads.  How can this be???  I used volatile to prevent the optimizer from messing with the order of reads and writes, and it's an x86-64, which has a very predictable memory model, removing the need for most hardware memory barriers.  The algorithm is correct!  Why is it failing?


DIAGNOSE!

So I did what I've been doing a surprising amount recently: I generated assembly language (this time withOUT optimization turned on).  Here's the structure assignment:

    leaq    -64(%rbp), %rax
    movq    _live+24(%rip), %rcx
    movq    %rcx, 24(%rax)
    movq    _live+16(%rip), %rcx
    movq    %rcx, 16(%rax)
    movq    _live+8(%rip), %rcx
    movq    %rcx, 8(%rax)
    movq    _live(%rip), %rcx
    movq    %rcx, (%rax)

HEY!  It copied the structure in the REVERSE ORDER than I expected!  So it copies the pre_seq first, then the temperature and pressure, and post_seq last!  No wonder it detected collisions, that order is NOT thread-safe!


HAPPILY EVER AFTER!

So I replaced the structure assign with explicit copies of the fields:

        struct measurement_s sample;
        do {
            /* Don't do "sample = live;", order of field copies is not defined. */
            sample.post_seq = live.post_seq;
            sample.temperature = live.temperature;
            sample.pressure = live.pressure;
            sample.pre_seq = live.pre_seq;
        } while (sample.pre_seq != sample.post_seq);

YAY!  It works.  (Whew!)

It would also be possible to define a sub-structure containing temperature, pressure, and any additional interesting data.  The fields in the sub-structure could be copied as struct assign, allowing the compiler to do it in any order.  But the pre_seq and post_seq must be individually copied in the right order.
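
Here's a sketch of that variation (the sub-structure and its name are hypothetical):

struct measurement_data_s {  /* compiler may copy these in any order */
    volatile int64_t temperature;
    volatile int64_t pressure;
};
struct measurement_s {
    volatile uint64_t post_seq;
    struct measurement_data_s data;
    volatile uint64_t pre_seq;
} live;

        /* reader */
        sample.post_seq = live.post_seq;  /* must be copied first */
        sample.data = live.data;          /* struct assign; internal order doesn't matter */
        sample.pre_seq = live.pre_seq;    /* must be copied last */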


DIGRESSIONS!

I do love those exclamation points!

So, the above code works.  How *good* is the design?  For physical process measurement, probably just fine.  It would be rare to measure physical characteristics more often than a few times per second, so the chances of the reader having to re-try the sample are very low.  (On the other hand, why not use mutexes if you are going that slow?)

But suppose this algorithm is applied to market data, which might have many thousands of ticks per second?  The algorithm shown suffers from a fundamental flaw: starvation.  If a writer and reader get pathologically synchronized, there is no bound to the number of times the reader will cycle, trying to get a clean sample.

In practice, it might be fine.  But whenever possible, I prefer a solution that works by design, not by circumstance.

I think a better approach might be to use a non-blocking queue, like this one.  The writer presumably requires non-zero work to produce a new record.  The consumer should read the queue dry in a tight loop before processing the last record read.  Now the queue need only be long enough to hold the maximum number of produced records between two reads.  A slow reader will simply clean out the queue each time it reads.
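
In sketch form, the consumer's drain loop might look like this (q_try_read and process_measurement are assumed APIs for illustration, not from any particular queue library):

        struct measurement_s rec, latest;
        int have_latest = 0;

        /* Read the queue dry; keep only the newest record. */
        while (q_try_read(q, &rec) == 0) {
            latest = rec;
            have_latest = 1;
        }
        if (have_latest)
            process_measurement(&latest);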


Some might question the wisdom of doing ANY kind of programming where the order of field copies in a struct assign is important.  In the old days, this program would have been done with a critical section, protected by a mutex or semaphore.  That works fine, and doesn't depend on order.  But high-performance applications of today need much lower latencies.  So lock-free algorithms are important.  With lock-free, you need to concern yourself with order; the compiler can change the order of the assembly language operations from the source code, and the hardware can change the order of physical memory operations from the assembly code.  Compile optimizer barriers address the former; hardware memory barriers address the latter.


Finally, some will question my choice of making all the fields of the structure volatile.  I admit it is a heavy-handed approach that I used for expediency -- this was a prototype after all.  For production code I would use a more-precisely targeted VOL macro (mentioned in an earlier post), often in cooperation with a general compile optimizer barrier, like "asm volatile ("" ::: "memory");".  Note that these techniques are NOT used to squeeze every nanosecond out of the code.  They are used to ensure code correctness.
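
(For reference, the VOL idea is roughly the following; this is my reconstruction from memory, so see that earlier post for the real definition:

    /* Make a single access to var volatile, without declaring var volatile. */
    #define VOL(var, type) (*(volatile type *)&(var))

The general compile optimizer barrier is the "asm volatile" statement quoted above.)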

Tuesday, June 10, 2014

Error handling: the enemy of readability?

I've been a programmer since the mid-1970s, call it 40 years.  During that time I've seen the rise into popularity: structured programming and object-oriented programming.  (More-recently, I've seen increased interest in functional programming, and I've started learning a bit about it myself, but I'm not conversant-enough in it to write intelligently about it.  So I'll omit it from this discussion.)

After all these years, I find it very interesting that I haven't seen a common approach to internal error handling become widely-accepted, the way structure and object-orientation have.

I'm speaking here of internal errors, not user errors.  If the user enters bad input, you owe it to the user to detect the problem and report it in a user-friendly way, allowing the user to correct his mistake.  User error detection and handling code is fully part of the program.

Internal errors are different.  These are found by inserting sanity checks to see if something that "should never happen" might have actually happened.  Very often, failed sanity checks represent a software bug that needs to be tracked down and fixed.


ROBUST CODE vs. READABLE CODE?

Most approaches to internal error handling I've seen are poor.  Maybe it is partly due to a lack of programmer discipline - we all get lazy sometimes and forget to check a return status - but I don't think that is the primary reason.  I believe the main culprit is the need for readable code.

The greatest expense in any significant programming effort is not the original writing of code, it's the subsequent maintenance.  The original programmers eventually leave the project, and less-experienced programmers often take over the less-glamorous job of fixing bugs and adding enhancements.  It is critically important that the code be readable.  External documentation is typically out of date moments after it is written, and even internal documentation (comments) are suspect.  The code itself is the only reality that matters, and the harder it is to understand that code, the harder (and more expensive) it will be to maintain and improve it.

And unfortunately, doing a thorough job of checking and handling internal errors often makes the code harder to read.

C

Which is easier to read and maintain?  This:

example 1:
  foo_t *foo_create()
  {
      foo_t *new_foo = malloc(sizeof(foo_t));

      new_foo->state_filep = fopen("foo.state", "r+");
      new_foo->data_filep = fopen("foo.data", "r+");

      new_foo->reader = reader_create();
      new_foo->writer = writer_create();

      return new_foo;
  }  /* foo_create */

or this:

example 2:
  foo_t *foo_create()
  {
      foo_t *new_foo = malloc(sizeof(foo_t));
      if (new_foo != NULL) {  /* malloc successful? */

          new_foo->state_filep = fopen("foo.state", "r+");
          if (new_foo->state_filep != NULL) {  /* fopen successful? */
              new_foo->data_filep = fopen("foo.data", "r+");
              if (new_foo->data_filep != NULL) {  /* fopen successful? */

                  new_foo->reader = reader_create();
                  if (new_foo->reader != NULL) {  /* reader successful? */
                      new_foo->writer = writer_create();
                      if (new_foo->writer != NULL) {  /* writer successful? */

                          return new_foo;
                      }
                      else {
                          log_error("Could not create writer");
                          reader_delete(new_foo->reader);
                          fclose(new_foo->data_filep);
                          fclose(new_foo->state_filep);
                          free(new_foo);
                          return NULL;
                      }
                  }
                  else {
                      log_error("Could not create reader");
                      fclose(new_foo->data_filep);
                      fclose(new_foo->state_filep);
                      free(new_foo);
                      return NULL;
                  }
              }
              else {
                  log_error("Could not open data file");
                  fclose(new_foo->state_filep);
                  free(new_foo);
                  return NULL;
              }
          }
          else {
              log_error("Could not open state file");
              free(new_foo);
              return NULL;
          }
      }
      else {
          log_error("Could not malloc foo");
          return NULL;
      }
  }  /* foo_create */

Most projects I've worked on use a hybrid of the two approaches, but most lean heavily towards one or the other.  The first is easy to read, but brittle; if anything goes wrong, the problem often doesn't generate any visible symptoms till much later, making it a nightmare to diagnose.  The second detects errors early and does proper reporting and cleanup, but is a nightmare to read and maintain.  (All that duplicated code in the else clauses is sure to get out of sync at some point, resulting in resource leaks or uninitialized accesses, neither of which will be detected unless some error actually happens and the error handling code finally gets exercised.)

Wouldn't it be nice to have both good error detection and readability?

example 3:
  foo_t *foo_create()
  {
      foo_t *new_foo = NULL;
      new_foo = malloc(sizeof(foo_t));  ASSRT(new_foo!=NULL);
      memset(new_foo, 0, sizeof(*new_foo));

      new_foo->state_filep = fopen("foo.state", "r+");  ASSRT(new_foo->state_filep!=NULL);
      new_foo->data_filep = fopen("foo.data", "r+");  ASSRT(new_foo->data_filep!=NULL);

      new_foo->reader = reader_create();  ASSRT(new_foo->reader!=NULL);
      new_foo->writer = writer_create();  ASSRT(new_foo->writer!=NULL);

      return new_foo;

  ASSRT_FAIL:
      if (new_foo != NULL) {
          if (new_foo->writer != NULL) writer_delete(new_foo->writer);
          if (new_foo->reader != NULL) reader_delete(new_foo->reader);
          if (new_foo->data_filep != NULL) fclose(new_foo->data_filep);
          if (new_foo->state_filep != NULL) fclose(new_foo->state_filep);
          memset(new_foo, 0, sizeof(*new_foo));  /* poison the freed structure */
          asm volatile ("" ::: "memory");       /* See note [1] */
          free(new_foo);
      }

      return NULL;
  }  /* foo_create */

Note [1]: See http://blog.geeky-boy.com/2014/06/clangllvm-optimize-o3-understands-free.html

My philosophy is to catch errors as early as possible by including sanity checks, but to do it unobtrusively.  Make the error handling easy to ignore if you're looking for the code's basic algorithm, but easy to understand if you are analyzing it.  I consider example 3 to be approximately as readable as the first example, and as robust as the second.  It is important to keep the error handling code visually small so that it can be appended to the ends of lines.  This:

      foo->state_filep = fopen("foo.state", "r+");  ASSRT(foo->state_filep!=NULL);

is much cleaner than this:

      foo->state_filep = fopen("foo.state", "r+");
      if (foo->state_filep == NULL) {
          blah blah blah
      }

As you've probably guessed, the ASSRT macro does nothing if the expression is true, and does a "goto ASSRT_FAIL;" if the expression is false.  A variation is to call "abort()" instead of the goto.  In many ways this is FAR superior for a troubleshooter; a core file can provide a full stack trace and access to program state.  Calling "abort()" also simplifies the code further, removing the explicit cleanup.  But it also means that the code has given up any hope of recovery.  Differentiating between an unrecoverable error and a recoverable error is often not possible at the point where the error is detected; it is usually considered good form to report the error and return a bad status.
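
Here is a sketch of what the abort() variation might look like; the name "ASSRT_ABORT" and the use of fprintf(stderr) rather than log_error() are just illustrative choices (it needs <stdio.h> and <stdlib.h>):

  #define ASSRT_ABORT(cond_expr) do { \
      if (!(cond_expr)) { \
          fprintf(stderr, "%s:%d, failed assert: (%s)\n", \
              __FILE__, __LINE__, #cond_expr); \
          abort();  /* dump core for the troubleshooter; no cleanup, no recovery */ \
      } \
  } while (0)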

(FYI - the "memset()" before the "free()" is commented as "poison the freed structure".  The purpose of this is to make sure that any dangling references to the foo object will seg fault the instant they try to de-reference one of the internal pointers.  Again, my philosophy is to find bugs as soon as possible.)
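
A hypothetical fragment shows the payoff; "foo_delete" is assumed to poison and then free, like example 3's cleanup, and "reader_read" is an illustrative stand-in:

      foo_t *foo = foo_create();
      foo_t *stale = foo;  /* a second reference, kept by mistake */
      foo_delete(foo);     /* poisons the structure, then frees it */
      reader_read(stale->reader);  /* stale->reader was zeroed, so (assuming the
                                      memory hasn't been reused yet) this faults
                                      immediately instead of corrupting data */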


ASSRT()

What is this magical "ASSRT()" macro?

  #define ASSRT(cond_expr) do { \
      if (!(cond_expr)) { \
          char errmsg[256]; \
          snprintf(errmsg, sizeof(errmsg), "%s:%d, failed assert: (%s)", \
              __FILE__, __LINE__, #cond_expr); \
          log_error(errmsg); \
          goto ASSRT_FAIL; \
      }  \
  } while (0)

For those unfamiliar with C macro magic, there are two bits here that might benefit from explanation.  First, the "do ... while (0)".  This is explained pretty well here.  It's the standard way to avoid tricky bugs when macros are used as part of if statements.  The second is the "#cond_expr" at the end of the snprintf.  The C preprocessor's "#" operator converts a macro parameter into a C string literal (this is called "stringification").
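
A quick illustration of why the do-while(0) matters: if ASSRT expanded to a bare { ... } block, the semicolon after the macro call would terminate the if, orphaning the else ("want_checks" is hypothetical):

      if (want_checks)
          ASSRT(new_foo != NULL);  /* do-while(0) makes this a single statement */
      else
          log_error("checks disabled");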

So, the macro expansion of "new_foo = malloc(sizeof(foo_t));  ASSRT(new_foo!=NULL);" is:

      new_foo = malloc(sizeof(foo_t));  do {
          if (!(new_foo!=NULL)) {
              char errmsg[256];
              snprintf(errmsg, sizeof(errmsg), "%s:%d, failed assert: (%s)",
                  "foo.c", 50, "new_foo!=NULL");
              log_error(errmsg);
              goto ASSRT_FAIL;
          }
      } while (0);

An important reason for making this a macro instead of a function is so that the __FILE__ and __LINE__ compiler built-ins apply to the source line which uses ASSRT.  I've also sometimes enhanced the ASSRT macro, making it specific to a program or even a function by including prints of interesting state information.  I've also sometimes included errno.  These improvements help the troubleshooter diagnose the internal problem.
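
Here is a sketch of what an errno-reporting variant might look like; "ASSRT_E" is just an illustrative name (it needs <errno.h> and <string.h>):

  #define ASSRT_E(cond_expr) do { \
      if (!(cond_expr)) { \
          char errmsg[256]; \
          snprintf(errmsg, sizeof(errmsg), \
              "%s:%d, failed assert: (%s), errno=%d (%s)", \
              __FILE__, __LINE__, #cond_expr, errno, strerror(errno)); \
          log_error(errmsg); \
          goto ASSRT_FAIL; \
      } \
  } while (0)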

One big drawback to this approach?  The error messages logged can only be loved by a programmer. What would an end user do with this?

  Error logger: foo.c:50, failed assert: (new_foo!=NULL)

This is wonderful for a programmer; I don't know how many times I've seen an error log containing "could not open file", only to find that the string appears many times in the source code.  This message pinpoints the exact source file and line.  However, it is virtually useless to an end user.

On the other hand, is it really any worse than this?

  Error logger: could not malloc foo

This looks less like geek-speak, but doesn't actually help an end user any more.  In both cases, the user has little choice but to complain to the developer.

My point is that including an error-specific, highly-descriptive message makes the error handling code longer and more intrusive.  One goal of mine is to keep the ASSRT on the same line as the statement it's checking, which is hard to do with a descriptive message.  As soon as the error handling overflows onto subsequent lines, it interferes with the logical flow of the code, making it harder to read.  And since internal errors are usually the result of a software bug, it's not so bad to log errors that only a programmer could love; only programmers need to deal with them, and users only need to accurately report them.  (Which reminds me of a war story.)

I have an evolved version of this approach, which I plan to clean up and present in the not-too-distant future.


JAVA

Java adds exception handling, which some would say contradicts my claim that error handling isn't standardized.  However, Java exceptions merely standardize the low-level language mechanism; they don't address the overall approach a program takes.  When used poorly (e.g. try-catch at every call), exceptions reduce to the unreadability of example 2.  When used well, they approximate the good readability of example 3.  The key is to sacrifice some user-friendliness in the error messages in favor of keeping the program structure clear.  That sacrifice is worth it, since internal errors should rarely happen.


C++

I am a bit embarrassed to admit that I'm not very familiar with C++'s exception handling facility.  I've never worked on a project that used it.  I've heard many programmers state their strongly-held opinion that C++ exceptions are to be avoided at all costs, but I don't have enough personal experience to have my own opinion.  One concrete data point: a computer engineering senior at U of I (a respected institution) tells me that their C++ course does not teach exceptions.

But I figure that even if C++ exceptions are not up to the task, my C method should work fine.  I'm not married to any specific language tool, so long as error handling can be added without obscuring the code.


P.S.

I risk being haunted by the ghost of Dijkstra by using a dreaded goto.  Maybe Knuth will protect me (alternate).  :-)


P.P.S.

I have a slightly different version in my wiki.  I change the goto to abort(), and I also print the human-readable form of errno.  While fine for most Unix tools, those changes will not be appropriate for all kinds of programs, so the form above is more general.