Saturday, December 13, 2014

Rapid Cycle Testing

I'm still experimenting with TDD, and the jury is still out.  As with so many religious wars, I suspect the best advice is to strive for balance.

But one thing that I'm completely sold on is Rapid Cycle Testing.  (Not to be confused with Short Time Horizon Designing, which I'm *not* sold on.)  To the degree that you *do* write automated tests, they should be executed frequently.  How frequently?  Well...
  • How about once per release cycle, right before code freeze?
  • How about once a week?
  • How about once a day, as part of the "nightly" build?
  • How about several times a day, as part of each developer build?
  • How about every time you save a source file from your editor?
Working with Clojure and Midje's autotest has sold me on that last one.  It is only practical for simple tests which execute very quickly, but so far that is what most of my unit tests have turned out to be.


Being somebody who enjoys learning new things, I decided to do some Ruby learning.  When I learn a new thing, I like to experiment and take notes.  I have files named "learning_lisp.txt" and more recently "learning_clojure.txt".  These files have small examples of source code with minimal explanation; they aren't intended to be useful for somebody else, but only to trigger my memory.  In particular, they show the results of my experiments.

The other day I saw this post, in which the author describes writing unit tests as he learns Ruby.   This made complete sense to me, and I wish I had seen that post a year ago!  Now that I'm learning Ruby, my notes file is "learning_ruby.rb" ... ".rb" instead of ".txt".  My experiments are entered in there instead of just typed interactively in a REPL.  I've only just started, and I'm not sure if I'll stick with it, but I really like the idea of having my notes file be executable; if nothing else, it catches my typos.

Unfortunately, Ruby doesn't have a Midje-like autotest that I know of, so I wrote my own.  I'm sure this will need to evolve significantly over time, especially as I work on larger projects, but here's my "start small" approach.  Note that this is Unix-centric (I use it on my Mac):

    fswatch  -i "\.rb$" -e ".*" . | xargs -n1 ./

where that script contains little more than:

    echo "========================================="
    ruby $*

Note that I don't think fswatch comes with the Mac out of the box; I installed it with Homebrew:

    brew install fswatch

I love this setup!  I have my "learning_ruby.rb" file in my editor and my Ruby book open.  I see a new function, and I enter a couple of experiments in my ".rb" file and save it.  The "fswatch" window immediately runs it and shows me the results.  Talk about instant gratification!

Thursday, December 11, 2014

Wanna know who *really* landed a man on the moon?

A woman named Margaret.

More info on Wikipedia.  Excerpt:
Hamilton's work prevented an abort of the Apollo 11 moon landing: Three minutes before the Lunar lander reached the Moon's surface, several computer alarms were triggered. The computer was overloaded with incoming data, because the rendezvous radar system (not necessary for landing) updated an involuntary counter in the computer, which stole cycles from the computer. Due to its robust architecture, the computer was able to keep running; the Apollo onboard flight software was developed using an asynchronous executive so that higher priority jobs (important for landing) could interrupt lower priority jobs. 
Smart engineer.

Sunday, December 7, 2014

Short Time Horizon Designing

In my previous blog post, I talked about different data representations for recording a game of 10-pin bowling.  I started with an obvious representation, and moved toward a more clever one.

I want to admit that it took me quite a while to write the bowling code.  Part of the reason is that I am very simply *not* a super fast coder (hence my blog title).  Never have been.
<digression selfserving="1">Fortunately, my lack of blinding coding speed has not been an impediment to my career.  In every job I've had, the actual time spent coding is small compared to the time spent doing all the other things that a programmer does.  So if I'm slower at 15% of my job, and just as fast (or faster) at the other parts, my overall productivity is on par with the coding cowboys.  If you need a code jumper, I'm probably not your guy.  If you need code that will stand the test of time, I probably am.</digression>
Anyway, when I started writing the bowling code, I tried using a TDD approach.  One common practice with TDD is to not try to design an ideal system up front, but to only code the minimum required to satisfy the current test set.  I call this the "short time horizon" approach to design.  I've heard Agilists also argue in favor of a short time horizon -- only design your current iteration's software, and if that design turns out to be inadequate for future stories, refactor as needed.

The idea is that by not over-thinking the task at hand, you can do it ... (insert ominous music here) ... faster.  Get something working now.  Worry about the future when it arrives.

So, when I started writing the bowling code, I only looked ahead to the currently-written failing test.  Being an old C programmer, I started with an array of 10 "frame" structures, each with 3 "rolls".  It wasn't until late in the cycle that I included a test case with two strikes in a row, causing me to realize the difficulty in looking forward 2 rolls in a frame-based structure.  I changed the data representation in a fundamental-enough way that the vast majority of the code had to be modified / re-written.

No problem, right?  TDD and Agile are all about refactoring, right?

The danger is that time pressures will result in less refactoring, and more hacking and patching, leading to brittle software that takes a long time to maintain and enhance.  It's always cheaper to hack in this one little feature *without* the refactoring, so the refactor rarely gets done and you end up with a mess.  Ultimately, customers suffer from buggy software releases due to unmaintainable code.

Mind you, I don't blame "management" for this!  (At least, I don't *only* blame management.)  Any time I've gone to my manager -- be it project, product, or administrative -- and said, "It's time already!  We *need* to refactor this code," they've always made time in the schedule.  Sometimes, we've had to debate the definition of "need", but when I've pushed hard enough, they've always agreed.  It's the programmers themselves who push back just as often, sometimes out of an assumption that management wouldn't agree, other times because they're just sick and tired of the code, and want to move on.  With this bowling project, I was programming for fun, so I didn't mind re-writing it.  But it still would have saved me a lot of time had I spent 15 minutes or so thinking and prototyping before deciding.

In the bad old pure-waterfall days, we tried to get the approach right during one marathon design phase, before line 1 of code was written.  It resulted in low overall productivity, and still didn't eliminate the need to refactor.  I firmly believe that the rapid cycle encouraged by Agile and TDD is a good idea, but yesterday I let the pendulum swing too far the other way.  I should have been more meditative up front.  :-)  And I guess that's my point here: balance.  Don't design to death, but also don't let TDD and Agile be the death of design.


Update: Who knew there was a whole movement around slow programming?  (Thanks John!)

Saturday, December 6, 2014

Bowling for Aardvarks

Why Aardvarks?  Because in the 80s I knew somebody who made up the fictitious game show as a quintessentially absurdist entertainment.  He is the same guy who made up the word "blunjo" simply because it made him laugh to say it.  I enjoyed his company, and I'll be damned if I can remember his name!  Maybe someday he will google "blunjo" or "bowling for aardvarks" and get in touch with me.

But that's not what I wanted to talk about.  I want to meditate a bit about a data representation conundrum: obvious vs. clever.  I normally avoid cleverness in favor of obviousness.  But sometimes it's the wrong choice.

I recently wrote some scoring code for bowling (10 pin), and it got me thinking about representing data.  To refresh your memory, here's a typical score card (courtesy of google images):

I won't explain all the rules (Wikipedia does that), but Ms. Lane here rolled twice in frame 1, knocking down 9 pins and then 1 for a spare.  In the second frame, she only rolled once, knocking down all 10 for a strike.  Etc.

The Obvious Representation

If we want to represent a game's scoring in software, how might one do it?  An old C programmer like me would immediately think about an array of 10 "frame" structures:
    struct frame_s {
        int roll[2];
        int c_score;    /* cumulative */
    };
    struct game_s {
        char name[80];
        struct frame_s frame[10];
    };
(Full disclosure: I actually did all this in Clojure, with nested vectors rather than structures, but the issues I talk about apply equally to both languages.  I'll stick with C for this post.)

But wait!  The 10th frame can have up to three rolls.  How to represent that?  Given the structure of the paper scoring card, this seems the most obvious and direct representation:
    struct frame_s {    /* frames 1-9 */
        int roll[2];
        int c_score;    /* cumulative */
    };
    struct frame_10_s { /* frame 10 is special */
        int roll[3];
        int c_score;    /* cumulative */
    };
    struct game_s {
        char name[80];
        struct frame_s frame[9];
        struct frame_10_s frame_10;
    };
Is this the right approach?  Everybody who bowls knows what a score card looks like, and treating frame 10 as special is arguably the most obvious and least astonishing way to represent a score card ...  to a bowler.

But who is going to be looking at those data structures?  Programmers.  And I bet that even programmers who bowl would turn a bit pale at the "struct frame_10_s" approach; it would lead to complex code, with lots of "if (frame_num == 10)" statements and *almost* duplicated code.  Hard to get right, hard to maintain.

So let's instead go with 10 copies of "struct frame_s" containing "int roll[3]" on the assumption it will simplify the code (it does).

Eliminate Cumulative Score

Does anything else stand out?  What about "int c_score", the cumulative score?  It seemed natural given the structure of a paper score sheet, but it's just duplicate data.  You can derive the cumulative score for any given frame by starting at frame 1 and working your way forward.  Old C programmers sometimes shy away from that approach for efficiency reasons.  "OMG, that makes scoring a game O(n**2)!"

But duplicate data also introduces lots of opportunities for bugs.  I would rather start with the simpler, safer approach, and only optimize if subsequent profiling indicates that it is needed.

Let's eliminate "c_score".  Here's our structure:
    struct frame_s {
        int roll[3];
    };
    struct game_s {
        char name[80];
        struct frame_s frame[10];
    };

A Very Different Representation

So, are we done?  Let's think about the scoring algorithm.  When you get a strike, you add the next two rolls.  Look at Ms. Lane's game.  In frame 2 she got a strike, so she scored 10 for those pins plus the two rolls in frame 3 (7 and 2).  No problem.  But what if she had also gotten a strike in frame 3?  I.e. two strikes in a row.  In that case, frame 2's score would have to access both frame 3 *and* frame 4.  For a strike, the number of forward frames to access isn't constant, which requires some messy if-then-else code.

Let's try a different representation entirely:
    struct game_s {
        char name[80];
        int roll[21];
    };
We don't divide the rolls up by frame, we just store the rolls sequentially.  The maximum number of rolls in a game is 21 (no strikes in frames 1-9, and a strike or spare in frame 10).  The minimum number of rolls is 11 (strikes in 1-9, and neither strike nor spare in frame 10).

This makes it much easier to calculate a strike's score; you just look at the next two rolls.  In fact, just to make the algorithm even easier, you could use "int roll[23]" with the extra two rolls initialized to zero.  That way, a 21-roll game that ends with a strike can use the same algorithm, which adds the next two rolls.  It doesn't need special case code for frame 10.

So, is this the right choice?  It makes it easier to do the look-ahead calculations for strikes.  But it also makes it non-trivial to find the frame boundaries when you need to, like during display.  And it moves the data representation *very* far from the application's natural visualization of the data (the paper score card).

I've concluded that this approach is better.  I've noodled around with the code, and it's pretty straight-forward to write simple methods to walk the array and identify the frames.  The overall code was easy to write, easy to test, and I think it is easy to read.  I've achieved software simplicity by deviating from the obvious data representation in favor of a more clever one.  I must admit to being a little uneasy arriving here; "clever" is almost a dirty word in my lexicon.  :-)