Sunday, January 27, 2013

Cracks in the Edifice of my Religion

I have two religions.

On the one hand, I am proudly Jewish, albeit a secular Jew.  This is in spite of the fact that, as most of my friends and many of my acquaintances know, I am an atheist.  Many non-Jews consider this to be a contradiction, but it's not - according to one survey, over a quarter of American Jews don't believe in God.  (My own Judaism/atheism is complicated, beyond the scope of this post.)

My other religion is Science.

Why do I believe in Science?  Simple.  Because it works.  Both the results and the process of Science have progressed over the millennia, and while it still isn't perfect, it is the best thing that humans have going for them.

So I guess you could say that my discovery of Jonah Lehrer's 2010 article The Truth Wears Off has given me pause.  (Margaret actually saw it when it came out and has used it in her class, but happened to leave a copy of it lying around this week.  I hadn't seen it before.)  I've always known that Science's grasp on truth is uncertain (that's part of the point of the scientific method), but I guess I didn't know that it is as tenuous as Lehrer makes out.

Am I having a full-fledged crisis of faith?  Well ... no.  But I do wonder if my lack of deep existential dread might be part of the problem - I don't want the scientific method to be fundamentally flawed, so I choose not to believe so.  It just needs some tweaking and a bit more time to converge on highly-probable truths.

And yet, if I only believe this because I want to, how is that different from believing in God?

P.S. - one time I said to an acquaintance that Science is my religion, and he responded, "Scientology?"  NO!  Scientology is to Science as Astrology is to Astronomy.  Please don't confuse them.

Friday, January 25, 2013

Debugging Tips

A young pup of an engineer recently wrote to me:
    I was wondering (since I'm taking a coding class and I feel that one
    of my weakest points is debugging), do you have any wisdom that you
    could pass along? I would appreciate it immensely since it will most
    likely save me a considerable amount of time on the MP's we have to
    do. If my first coding class taught me anything, the longest part
    about those assignments is the debugging process (and I was awful at
    debugging my own code).
Ah, debugging - my favorite part of software development.  Hey, why are you laughing?  I'm serious!  I honestly do like debugging best.  Here are a few tips that you might find useful.  Since I'm primarily a C programmer, my examples are in C.  But they have analogs in any language.

Bottoms Up!

Maybe I like debugging for the same reason I like detective shows.  It took me a long time to get tired of CSI, and I still like to watch the odd episode.  I approach debugging like the CSI guys - I follow the evidence via a technique I call bottom-up debugging.

There are other approaches.  Some people take a top-down approach to debugging.  Look at the code and what it is supposed to do, and try to figure out what is broken.  It's all about, "Hmm ... that looks right ... so does that ... that should work ... gee, the code looks right to me.  Guess I need to look harder."  Yes, I'm making a bit of fun of it, but I have seen it successfully used.  I know a guy who can often just look at a screen of code for a minute and point to the bug.

Top-down debugging almost never works for me.  Bottom-up debugging asks the question: what is the code *actually* doing?  It's all about collecting evidence and saying, "Whoa, I wonder why it did THAT!"  Followed by collecting more evidence.  Unfortunately, most of that evidence is hidden deep inside the chips.  So the job of the debugging engineer is to expose the evidence of what is going on.

Most people know how to use a source-level debugger to single-step code and examine variables, and this is fine for some bugs.  At every step of the way, you can see exactly why it is doing what it is doing.  But single-stepping runs out of steam in a hurry, especially when you need to see what's happening inside nested loops that might have thousands of iterations, or when the program is part of a system of multiple cooperating processes - you often can't use break points and single-stepping without breaking the larger system.

Kick and Scream

So what do you do if it's not feasible to single-step?  Next in most people's bag of tricks is print statements.  Sprinkle liberally throughout your program, printing the values of interesting variables and structures.  The hope is that you'll catch the program as it starts to misbehave.  But especially with looping programs where it can take a long time for the bug to manifest, the amount of output can be overwhelming.  Plus, with cooperating processes, the slowdown introduced by prints can itself disrupt the behavior of the program, long before your elusive bug is caught.  Or worse yet, the prints might change the timing of the program enough for the bug to disappear entirely.  "Works just fine when I turn on debug output, maybe we should just ship it that way."  :-)  This is known as the observer effect where the act of measuring something changes the thing under measurement.  (It is sometimes called the uncertainty principle, but this is not quite right.  The uncertainty principle leads to the observer effect, but it is not the same thing.)

Instead of printing huge volumes of debug output, I like to define a function named kick_and_scream.  The idea is that you add debug code to your program to test for unexpected conditions, and call kick_and_scream if something is wrong.  You might pass in some interesting variables and structures which kick_and_scream will dutifully print the contents of, giving you some insight into the program's state.  What it does next depends.  Here are some things I've had kick_and_scream do after printing state variables:
  • Prompt and read a line from the user.  This has the effect of essentially halting execution of the program, giving you a chance to fire up the debugger.
  • Dump core - for Unix, you should be able to call "abort()".  I assume Windows has something similar (Dr. Watson?).  The idea is to get a memory dump of the program which can be examined with a debugger.  In C, you can always force an illegal memory access:
        *(char *)0 = 0;
    that often does the trick.
  • Infinite loop - I've done this for embedded systems whose operating systems are very basic and don't have any kind of core dumping capability.  Maybe you have an in-circuit emulator or an external debugger that you can use to interrupt the program and examine state.  Having the program stuck in an infinite loop will prevent it from destroying the evidence.
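Here is a minimal sketch of what such a function might look like, assuming a Unix-like system with a standard C library.  The names ks_mode, ks_count, ks_spin, and the KICK_AND_SCREAM macro are just illustrative choices, not part of any library, and the printf-style argument list is only one of several reasonable ways to pass in state.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* What to do after printing the state (illustrative names). */
enum ks_action { KS_PROMPT, KS_DUMP_CORE, KS_SPIN, KS_CONTINUE };

enum ks_action ks_mode = KS_DUMP_CORE;
int ks_count = 0;            /* number of times kick_and_scream has fired */
volatile int ks_spin = 1;    /* a debugger can set this to 0 to resume */

void kick_and_scream(const char *file, int line, const char *fmt, ...)
{
    va_list args;
    char reply[80];

    ks_count++;

    /* Print where the problem was detected, then the caller's state. */
    fprintf(stderr, "KICK_AND_SCREAM at %s:%d: ", file, line);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    fflush(stderr);

    switch (ks_mode) {
    case KS_PROMPT:      /* halt until the user presses Enter */
        fputs("Attach a debugger, then press Enter: ", stderr);
        if (fgets(reply, sizeof(reply), stdin) == NULL) {
            /* EOF or error on stdin: nothing to wait for, just continue */
        }
        break;
    case KS_DUMP_CORE:   /* raises SIGABRT; does not return */
        abort();
    case KS_SPIN:        /* hold still so an external debugger can look */
        while (ks_spin) { }
        break;
    case KS_CONTINUE:    /* print only */
        break;
    }
}

/* Capture the call site automatically (variadic macros need C99). */
#define KICK_AND_SCREAM(...) kick_and_scream(__FILE__, __LINE__, __VA_ARGS__)
```

A call might look like KICK_AND_SCREAM("i=%d num_nodes=%d", i, num_nodes).  Note that whether abort() actually leaves a core file behind depends on system settings such as the core file size limit.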
For example, let's say that your program maintains a linked list of nodes, and there's a global counter that tells how many nodes there are in the list.  Let's say that you have a search function which is not finding a node that you *know* is in there:
    cur_node = head_node;
    for (i = 0; i < num_nodes; i++) {
        if (cur_node->id == query_id) return cur_node;
        cur_node = cur_node->next_node;
    }
    return NULL; /* not found */
The problem is that it returns NULL when it shouldn't.  So, why is the function not finding the node?  Maybe the linked list is corrupted.  Maybe num_nodes is wrong.   Maybe a hundred other things.  Top-down debugging would consist of staring at the code to see if there's a mistake which would cause any of those.  Bottom-up consists of figuring out exactly which problem it really is.  I would add a call to kick_and_scream right before the return NULL; and pass it head_node, cur_node, i, num_nodes, and query_id.  Inside kick_and_scream I would print all those, including all the fields of head_node and cur_node (assuming the pointers are non-null).
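To make that concrete, here is a rough sketch of the instrumented search.  The struct layout, the find_node and print_node names, and the scream_count counter are assumptions for illustration; kick_and_scream here is a bare-bones stand-in that just prints and returns.  I also added a null check to the loop so that a list which is too short doesn't crash before the evidence gets printed.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed node layout; the real one would have more fields. */
struct node {
    int id;
    struct node *next_node;
};

struct node *head_node = NULL;
int num_nodes = 0;
int scream_count = 0;   /* illustrative: counts detected problems */

/* Stand-in for the real kick_and_scream: print, count, and return. */
static void kick_and_scream(const char *what)
{
    scream_count++;
    fprintf(stderr, "KICK_AND_SCREAM: %s\n", what);
}

/* Print a node pointer and, if it is non-null, the node's fields. */
static void print_node(const char *name, const struct node *n)
{
    if (n == NULL) {
        fprintf(stderr, "  %s = NULL\n", name);
    } else {
        fprintf(stderr, "  %s = %p {id=%d, next_node=%p}\n", name,
                (const void *)n, n->id, (const void *)n->next_node);
    }
}

struct node *find_node(int query_id)
{
    struct node *cur_node = head_node;
    int i;

    /* The null check is an addition so a short list cannot crash here. */
    for (i = 0; i < num_nodes && cur_node != NULL; i++) {
        if (cur_node->id == query_id) return cur_node;
        cur_node = cur_node->next_node;
    }

    /* Debug code: expose the state at the moment of failure. */
    fprintf(stderr, "find_node: query_id=%d i=%d num_nodes=%d\n",
            query_id, i, num_nodes);
    print_node("head_node", head_node);
    print_node("cur_node", cur_node);
    kick_and_scream("find_node: node not found");

    return NULL; /* not found */
}
```

This is debug code for a case where you know the node should be there; a search that is allowed to miss would scream on every legitimate miss.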

Working Back

So, let's say you do the above and you determine that when it loops num_nodes times, the last node it checks has a non-null next_node field.  This shouldn't happen - the last node should have a null next_node.  You've made some excellent progress - now you know the probable proximate cause of not finding the node - num_nodes is out of sync with the list itself.  But the bug isn't actually here.  You want to find out where the list got out of sync with num_nodes in the first place.

I would write a new function: list_check.  Pass it num_nodes and head_node and it does this:
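    /* Assumes a non-empty list (num_nodes >= 1). */
    cur_node = head_node;
    /* Walk to what should be the last node; hitting NULL early means the list is too short. */
    for (i = 0; i < num_nodes - 1; i++) {
        if (cur_node->next_node == NULL) kick_and_scream(...);
        cur_node = cur_node->next_node;
    }
    /* The last node must end the list; if not, the list is too long. */
    if (cur_node->next_node != NULL) kick_and_scream(...);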
The idea is that you can now sprinkle calls to list_check all over your program and basically "binary search" your way to discovering where things get out of sync.
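When you sprinkle the checks around, it helps if each one reports where it was called from.  Here is a sketch of one way to do that.  It is a variant of the check, not the same code: it counts the nodes instead of walking exactly num_nodes steps, so it also handles an empty list.  The LIST_CHECK macro, the check_failures counter, and this kick_and_scream signature are illustrative assumptions.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed node layout. */
struct node {
    int id;
    struct node *next_node;
};

int check_failures = 0;   /* illustrative: counts failed checks */

/* Stand-in for the real kick_and_scream: report the call site and return. */
static void kick_and_scream(const char *file, int line, const char *msg)
{
    check_failures++;
    fprintf(stderr, "KICK_AND_SCREAM at %s:%d: %s\n", file, line, msg);
}

/* Returns 0 if the list length matches num_nodes, -1 otherwise. */
int list_check(const struct node *head_node, int num_nodes,
               const char *file, int line)
{
    const struct node *cur_node = head_node;
    int count = 0;

    /* Stop counting once we are past num_nodes, so that a corrupted
     * (for example, circular) list cannot keep us here forever. */
    while (cur_node != NULL && count <= num_nodes) {
        count++;
        cur_node = cur_node->next_node;
    }

    if (count != num_nodes) {
        kick_and_scream(file, line, "list is out of sync with num_nodes");
        return -1;
    }
    return 0;
}

/* Records the file and line of each sprinkled check. */
#define LIST_CHECK(head, n) list_check((head), (n), __FILE__, __LINE__)
```

Put LIST_CHECK(head_node, num_nodes); after each operation that touches the list.  The first check that screams tells you the damage happened somewhere between it and the previous check that passed.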

Event Logger

The philosophy of kick_and_scream is to not print out reams of debug output, but instead to only print interesting information once an obvious problem is discovered.  Sometimes, however, you really do need to see sequences.  You might have some kind of state machine which processes input events, and you suspect that the events are arriving in an invalid order.  (Ever try to close a file before you open it?) There may not be enough information available just printing current state - you need to see that state evolve as events are being processed.

So back to adding normal print statements and crawling through thousands of lines of output.

But what if, as I hinted above, the print statements are too disruptive?  Like if they change the timing of the system too much?

I've had good luck with an event logger.  Call it a poor-man's ultra-low-impact print statement.  This is basically a global array treated like a circular queue.  Instead of printing a line of debug output, you add debug values to the array.  If the array fills, it cycles back to the beginning and overwrites the earliest values.  The impact is so low that it rarely has a noticeable effect on timing.

The use of the event logger still depends on detecting the problem inside the code and calling kick_and_scream.  The kick_and_scream function would then print the contents of the array.  It basically shows you the last N events which led up to the failure (where N is the size of the global array).
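Here is a bare-bones sketch of the idea, not the packaged version.  The names EVENT_LOG, event_log_snapshot, and event_log_dump, and the choice of plain unsigned int values, are illustrative assumptions.  It is also not thread-safe as written; with multiple threads the counter update would need to be atomic.

```c
#include <stdio.h>

/* N, the number of events remembered.  Tiny here for illustration.  A
 * power of two keeps the indexing consistent if the counter ever wraps. */
#define EVENT_LOG_SIZE 8

unsigned int event_log[EVENT_LOG_SIZE];
unsigned long event_log_count = 0;   /* total events ever logged */

/* Record one value.  Once the array is full, the oldest entry is
 * overwritten.  The cost is one store and one increment. */
#define EVENT_LOG(val) \
    (event_log[event_log_count++ % EVENT_LOG_SIZE] = (unsigned int)(val))

/* Copy the surviving events into out, oldest first.  out must have room
 * for EVENT_LOG_SIZE entries.  Returns the number of events copied. */
int event_log_snapshot(unsigned int *out)
{
    unsigned long n = event_log_count < EVENT_LOG_SIZE
                          ? event_log_count : EVENT_LOG_SIZE;
    unsigned long first = event_log_count - n;   /* index of oldest event */
    unsigned long i;

    for (i = 0; i < n; i++) {
        out[i] = event_log[(first + i) % EVENT_LOG_SIZE];
    }
    return (int)n;
}

/* Meant to be called from kick_and_scream: print the last N events,
 * labeled by how far back they are (-1 is the most recent). */
void event_log_dump(FILE *fp)
{
    unsigned int events[EVENT_LOG_SIZE];
    int n = event_log_snapshot(events);
    int i;

    for (i = 0; i < n; i++) {
        fprintf(fp, "event[-%d] = 0x%08x\n", n - i, events[i]);
    }
}
```

In the state machine example, you might call EVENT_LOG(event_type) at the top of the event handler, and perhaps EVENT_LOG(new_state) after each transition.  You then have to decode the numbers by hand when the dump comes out.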

The disadvantage of this approach is that the output is MUCH harder to interpret than nice descriptive print statements.  So I only use this when print statements disrupt the program too much to be useful.  Which happens fairly frequently in the kinds of software I write.

I've implemented this several times, and I want to create a downloadable package for it.  I'll announce it when it's ready.


UPDATE: http://blog.geeky-boy.com/2014/10/event-logger.html

Maria Bamford

My daughter turned Margaret and me onto Maria Bamford.  She is a comedian who has suffered from some mental illness issues, and her sense of humor is definitely not for everybody.  Weird, downer, deadpan, awkward, and very very funny for people with a sick sense of humor (like me, and apparently Margaret and my daughter).

If you're up for an experiment, start with her twenty episodes of The Maria Bamford Show.  If, after the first one, you don't absolutely hate it, try at least two more.

Then there's Maria Bamford's One Hour Homemade Christmas Special.

I hear she made some commercials for Target, and I should look those up too.  Obviously they won't be as off-the-wall as her independent creations, but I bet I'll like them anyway.

And I guess I should spend some time with the fan channel.

Friday, January 18, 2013

Blog tags

Tags are used to group posts.  If you want to see all of my postings related to, say, "death", just click on the "death" tag and you'll see the list.

I like tags.  They are better than a hierarchical system of organization since a particular item may be associated with multiple points in the hierarchy.  E.g. I might have a post tagged as "coding" and "rants", if I have a rant about some aspect of coding.

[Aside - email clients traditionally use hierarchical folders.  But gmail (at least the web interface) gives you true tagging.  I like that, but ironically don't use it.]

One complication of tags as an organizational model is that the tags themselves must be organized, especially if the number of tags gets large.  THAT strikes me as something which could be organized into a hierarchy.  I might have a rants tag and a technical tag.  The technical tag might be sub-divided into science and software.  Software might be further subdivided into coding and debugging.  My rant about coding would show up in both the "rants" tag and the "coding" tag.

Alas, this blogging system does not allow for organizing tags hierarchically ... at least not that I know of.  It gives you two views:
  • Alphabetical
  • Tag cloud

The "tag cloud" is interesting.  It uses different font sizes for the tags, depending on the relative number of postings given that tag.  So rants might be big because I do a lot of ranting, while science might be small because I don't do much blogging about general science.  The tag cloud gives an interesting view into my head even without clicking on any of them since they suggest which topics I feel passionate about (or at least chatty about).

That said, I'm thinking it is more of a novelty than a useful organizational model.  (It's a novelty that I like, which is why I enabled it on this blog.  But it will suffer from the same unwieldiness if I create a lot of tags.)  For large numbers of tags, I'm still leaning towards hierarchical.

Since blogger doesn't support hierarchical, I guess I'll just muddle along for now.  If my number of tags gets unwieldy, I can look at maybe leveraging the alphabetizing model to represent the hierarchy.  For example:

rants
technical
technical-science
technical-software
technical-software-coding

I don't like that much, partly because the important part of the tag name becomes the last part, whereas the eye is drawn to the first part, obscuring the intent.  Also, what if I want rants to be last?  I guess I could do this instead:

01-technical
02---software
03-----coding
04---science
05-rants

That lets me order them any way I want to.  But wow, what a pain if I want to insert a new tag, like say, software design.  I'll have to renumber all the tags below it.  I guess I could do the old BASIC trick of numbering them by 10s...  (Who me?  Program in BASIC?  How old do you think I am, anyway?)

Another thought: assuming that each tag is simply a fixed URL, I could create a wiki page of the tag links and organize them any way I want to.  I could then simply have a pointer to that page on the blog.

Ah well, enough thinking about this.  Like I said, I'll just leave it a hodge-podge for now.

Content Publishing

I created a page where I am basically talking to myself about the various ways that content can be published on the Internet, and which way(s) I might want to concentrate on.  It is of minimal interest to anybody else.

http://www.geeky-boy.com/w/Sford_Content_Outlets.html

At present, I'm a minimal publisher, splitting content between my wiki and my blog.  New Wiki content is sometimes announced in the blog - like this very post.  (I also have Twitter and Facebook accounts which I don't use due to low signal-to-noise ratios.)