Sunday, November 24, 2019

I will miss you, Gahan Wilson

Gahan Wilson, creator of hilariously macabre cartoons, died last Thursday. His dark creations helped to shape my own sense of humor, even though as a child I was not able to enjoy them very often.

Thank you Gahan for teaching me that it is OK to not be mainstream.

Sunday, June 16, 2019

Should everybody learn to code?

I saw something on Slashdot that raised the question: should all school children learn to code?

Yes.

For the same reason all school children should spend some time learning to sing, learning to do long division, learning to paint, learning some physics, learning some literature, learning to use a wrench and screwdriver, learning some history, learning some chemistry, etc, etc, etc. I believe that children who get a reasonably well-rounded, reasonable quality education grow up to be happier than those who don't, all else being equal. I don't have thousands of pages of peer-reviewed scientific research to support my belief, but I still believe it.

This does not mean that we should try to teach all children to be *good* programmers, any more than we should try to teach all children to be professional-level singers. We just need to introduce a wide range of subjects so that they have a basic understanding of what each subject is about. After that, let their natural talents and interest drive where they go in depth.

I never knew that software was my calling until I had my first opportunity to try coding in ... was it 1975? This introduction was *not* taught in school. It was a group called "explorer scouts", which may or may not have been associated with Boy Scouts of America, I don't remember. All I can tell you is that there were no uniforms, no camping, none of the trappings of Boy Scouts. The only activity I can remember was the programming. It was Fortran on a time-sharing system with a teletype. The fire it ignited in my brain overwhelms any other memories of the Explorer Scouts.  THIS is what I wanted to do; my path in life was obvious.


HOW TO TEACH IT?

So, how should programming be taught?  I don't know. I know there has been a lot of research and development in the area of programming systems for young people. Drag-and-drop icons representing programming constructs. I've glanced at them, and even used one briefly (it was a Google Doodle). It's OK, I guess.  I think an important goal is to give a child early success and a feeling of accomplishment. I don't think those systems actually teach *programming*, but I think they probably do teach some fundamental concepts that are needed for programming.

My concern is that those systems don't really give a flavor of what programming is like. If you sing in music class, you get a sense of what singing is. If you play touch football, you get a sense of what that is. I actually consider myself fortunate that I wasn't introduced to programming that way. I suspect I would have thought that it was kind of neat, but I'm not sure it would have lit the fire. Part of what inspired me with my exposure was just the limitless possibilities. I'm not sure you get that with drag-and-drop programming.

The problem with most languages is the sheer amount of infrastructure you need to master before you can do things.  To write a simple Java program, you need to define a class and a static main method.  To print something, you need to enter System.out.println("something");. To read a line from the user will require several lines of junk that would require hours of explanation if you want the person to understand what the lines mean. Granted, you could just boilerplate it and tell the student to ignore all that stuff till later, but I think that reduces engagement. As does the edit/compile/run cycle. I don't think Java is a good first language.

I have taught a few people to program. Want to know what worked for me?

"NO!  DON'T SAY IT!  PLEASE FOR THE LOVE OF ALL THAT IS DECENT, NO!"

Basic.

"GAAAAAAAAAAAAAAH %@!&*$!!!"


BASIC???

Yep. And I mean BASIC Basic. Line numbers. One or two character variable names. No "else".

10 print "hello"
20 goto 10
run

I find that humor is a good teaching tool.  If your first program is an infinite print loop, it's kind of funny. Not very funny, but a little. Also, you definitely need a REPL (Read-Eval-Print Loop), which Basic is. No edit/compile/run cycle please.

It takes me an afternoon to teach somebody to program. We don't get sophisticated, but by the end of the afternoon we do end up writing a simple "text adventure" style game.

You are facing two doors.  Do you want the right one or the left one?
? right
A lion has just eaten you!!!
Try again?
? yes
You are facing two doors.  Do you want the right one or the left one?
? left
You found a gold coin!
Do you want to keep going?  Or turn around?  (answer "going" or "turn")
? turn
You are facing two doors.  Do you want the right one or the left one?

etc. See? More humor. How many ways can you kill the player?  :-)

I feel that the Basic language didn't get in the way of seeing the simple algorithm. Any other language would have obscured the simple beauty of what we were doing.

[Update] Now mind you, this is a one-day exercise! For somebody with more interest, I would have a second lesson with subroutines and gross Basic-style input parameters (globals, actually), and for the third lesson, we would switch to a modern language. Maybe Python? Maybe Java? Maybe Lisp? It would depend on who it is and what they might do with it.

Anyway, I did this 1-day lesson with my daughter (she was ... I don't remember ... maybe 10?) and she did well.  Afterwards, she did ask me some questions and got a little help.  Later that year she gave me a birthday present. It was a CD ROM with her updated adventure program. It has maybe a dozen "rooms" and lots of jokes and little surprises in it.  She was VERY proud of herself. (And me of her; I still have the disk.)

But she pretty much lost interest. I am confident it was *not* because of the primitive language. It was just because it didn't light a fire in her. Which is fine.

Anyway, I really think an afternoon of Basic is a good way to introduce programming to a newbie of any age.  I *know* it works.

Wednesday, April 17, 2019

Black Hole Revealed!

I am in awe of the results of the Event Horizon Telescope team in their image of M87*, the supermassive black hole at the center of a distant galaxy!  I can't help but be especially excited at the role that computer scientists played in the analysis and reconstruction of the data collected by the radio telescopes to produce the image.

Radio astronomers have been doing long-baseline interferometry for a while now to produce images.  But the challenges of the Event Horizon Telescope were beyond what the earlier processing algorithms could make sense of.  The software team led by Katie Bouman developed the CHIRP algorithm that kind of blows my mind.  It warms my heart that women in science are finally getting some of the recognition they deserve.  (It also depresses me greatly that misogynist trolls are getting some press; geez, can't we just enjoy the accomplishment?)  Anyway, Katie did a TED Talk a few years ago that gives some excellent explanation about the algorithm.

If you want some understanding of why the image looks the way it does, I think that Derek Muller's Veritasium video does the best job that I've seen.  He also has a good follow-up video.

I also really appreciated astrophysicist Becky Smethurst's video that explains why the results are important (it's more than just further supporting Einstein's theory of gravitation).

Thursday, January 24, 2019

Volatile considered harmful

I happened on this today.  The article is narrowly-focused on Linux kernel work, but in my mind it helps to clarify a lot of "volatile" debate I've seen over the years.

https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html


I will note that when Corbet (the author) says, "the 'volatile' type class should not be used", what he really means is that you should not declare variables with volatile (or rather, almost never).  Corbet says, "the kernel primitives which make concurrent access to data safe ... If they are used properly, there will be no need to use volatile as well."  Some of those kernel primitives use volatile, but not in variable declarations.  Instead they use volatile in carefully-selected casts.

For example, another Corbet article describes another kernel primitive, "ACCESS_ONCE()".  It is defined as:

    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

The variable being accessed is temporarily cast to volatile to allow code to be written that violates threading assumptions made by the compiler's optimizer.  Like here, for example.  :-)

I only bring this up to point out that Corbet is not arguing that volatile should never be used at all.  Rather he is arguing that programmers (specifically kernel programmers) should not declare variables to be volatile.  Generally, programmers should use threading primitives to ensure correct code, and if the code's requirements prevent the use of the usual threading primitives, then lower-level primitives (like "ACCESS_ONCE()") should be used to precisely target volatile's use.

Wednesday, September 12, 2018

Goodbye my Wiki

This post is a little late in coming as I made the change earlier this year.

I used to use a hosting service, suso.com, for my main website and email.  It ran a message board I built using Perl, and it also ran an older version of MediaWiki for a personal wiki.  But the service cost a fair amount, and neither the message board nor the wiki was being used much.  So to save money, I cancelled it and moved the content to GitHub.

The content is still available at geeky-boy.com, but I dumped the wiki pages to a flat directory.  If I want to edit them, I need to edit the raw HTML.

I somewhat mourn the loss of my wiki.  I like wikis for certain kinds of content.  It is especially powerful for collaborative efforts, particularly for geographically-separated teams.  But even for single-user personal-use, it presents such a low barrier to use.  If I want to update a page, it's just a few button clicks away.  And the update process is easy (assuming minimal use of fancy wiki markup).  And it's easy to see change history, roll back changes, etc.

Contrast this with web pages on GitHub where you have to edit them locally in HTML, check in the changes, and sync with the "gh-pages" branch to make them live.  It takes longer, and requires specialized software.  E.g. I can't easily do it from a phone or tablet.

There are "free" wikis out there, but I don't like the ads, and most of them use non-MediaWiki software; the few I've tried I haven't liked.

Maybe someday I will try some kind of third-party content management system.  Or maybe I'll write my own wiki software as a fun personal project (maybe do the rendering of the markup in the browser in JavaScript).  Or maybe I just don't really need a wiki.  Long ago, I imagined that the blog and the wiki would complement each other, with content in each referring to content in the other.  Blog for "news", wiki for "content".  But it hasn't worked out that way.

So rest-in-peace wiki.geeky-boy.com.  You were fun while you lasted.

Tuesday, September 11, 2018

Safe sscanf() Usage

The scanf() family of functions (man page) is used to parse ASCII input into fields and convert those fields into desired data types.  I have very rarely used them, due to the nature of the kind of software I write (system-level, not application), and I have always viewed them with suspicion.

Actually, scanf() is not safe.  Neither is fscanf().  But this is only because when an input conversion fails, it is not well-defined where the file pointer is left.  Fortunately, this is pretty easy to deal with - just use fgets() to read a line and then use sscanf().

You can use sscanf() safely, but you need to follow some rules.  For one thing, you should include field widths for pretty much all of the fields, not just strings.  More on the rules later.


Input Validity Checking with sscanf()

I am a great believer in doing a good job of checking validity of inputs (see strtol preferred over atoi).  It's not enough to process your input "safely" (i.e. preventing buffer overflows, etc.), you should also detect data errors and report them.  For example, if I want to input the value "21" and I accidentally type "2`", I want the program to complain and re-prompt, not simply take the value "2" and ignore the "`".

One problem with doing a good job of input validation is that it often leads to lots of checking code.  In strtol preferred over atoi, my preferred code consumes 4 lines, and even that is cheating (combining two statements on one line, putting close brace at end of line).  And that's for a single integer!  What if you have a line with multiple pieces of information on it?

Let's say I have input lines with 3 pipe-separated fields:

  id_num|name|age

where "id_num" and "age" are 32-bit unsigned integers and "name" is alphabetic with a few non-alphas.  We'll further assume that a name must not contain a pipe character.

Bottom line: here is my preferred code:

#define NAME_MAX_LEN 60
. . .
unsigned int id;
char name[NAME_MAX_LEN + 1];  /* +1: "%60[...]" stores up to 60 chars plus the null */
unsigned int age;
int null_ofs = 0;

(void)sscanf(iline,
    "%9u|"  /* id */
    "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]|"  /* name */
    "%3u\n"  /* age */
    "%n",  /* null_ofs */
    &id, name, &age, &null_ofs);

if (null_ofs == 0 || iline[null_ofs] != '\0') {
    printf("PARSE ERROR!\n");
}


Error Checking

My long-time readers (all one of you!) will take issue with me throwing away the sscanf() return value.  However, it turns out that sscanf()'s return value really isn't as useful as it should be.  It tells you how many successful conversions were done, but doesn't tell you if there was extra garbage at the end of the line.  A better way to detect success is to add the "%n" at the end to get the string offset where conversion stopped, and make sure it is at the end of the string.

If any of the conversions fail, the "%n" will not be assigned, leaving it at 0, which is interpreted as an error.  Or, if all conversions succeed, but there is garbage at the end, "iline[null_ofs]" will not be pointing at the string's null, which is also interpreted as an error.


STR Macro

More interesting is the "STR(NAME_MAX_LEN)" construct.  This is a specialized macro I learned about on Stack Overflow:

#define STR2(_s) #_s
#define STR(_s) STR2(_s)

Yes, both macros are needed.  So the construct:
  "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]"
gets compiled to the string:
  "%60[-A-Za-z,. ]"

So why use that macro?  If I had hard-coded the format string as "%60[-A-Za-z,. ]", it would risk getting out of sync with the actual size of "name" if we find somebody with a 65 character name.  (I would like it even more if I could use "sizeof(name)", but macro expansion and "sizeof()" are handled by different compiler phases.)


Newline Specifier

This one is a little subtle.  The third conversion specifier is "%3u\n".  This tells sscanf() to match newline at the end of the line.  But guess what!  If I don't include a newline on the string, the sscanf() call still succeeds, including setting null_ofs to the input string's null.  In my opinion, this is a slight imperfection in sscanf(): I said I needed a newline, but sscanf() accepts my input without the newline.  The reason is that sscanf() treats a newline in the format as whitespace, and whitespace in the format matches any amount of whitespace in the input, including none.

If I *really* want to require the newline, I can do:

if (null_ofs > 0 && iline[null_ofs - 1] != '\n')
    printf("PARSE ERROR: missing newline\n");


Integer Field Widths

Finally, I add field widths to the unsigned integer conversions.  Why?  If you leave it out, then it will do a fine job of converting numbers from 0..4294967295.  But try the next number, 4294967296, and what will you get?  Arithmetic overflow, with a result of 0 (although the C standards do not guarantee that value).  And sscanf() will not return any error.  So I restrict the number of digits to prevent overflow.  The disadvantage is that with "%9u" you cannot input the full range of an unsigned integer.  An alternative is to use "%10llu" and supply an unsigned long long variable.  Then range check it in your code.  Or you could just use "%8x" and have the user input hexadecimal.  Heck, you could even input it as a 10-character string and use strtoul()!

I guess "%9u" was good enough for my needs.  I just wish you could just tell sscanf() to stop converting on arithmetic overflow!


Not Perfect

So, any other problems with the above code?  Well, yeah, I guess.  Even though I specify "id" and "age" to be unsigned integers, I notice that sscanf() will allow negative numbers to be inputted.  Sometimes we engineers consider this a feature, not a bug; we like to use 0xFFFFFFFF as a flag value, and it's easier to type "-1".  But I admit it still bothers me.  If I input an age of -1, we end up with the 4-billion-year-old man (apologies to Mel Brooks).

Tuesday, September 4, 2018

Safe C?

A Slashdot post led me to some good pages related to C safety.
(Update: actually I did go ahead and order the tee shirt.)


Monday, August 27, 2018

Safer Malloc for C

Here's a fragment of code that I recently wrote.  See anything wrong with it?

#define VOL(_var, _type) (*(volatile _type *)&(_var))
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
binding = (lbmbt_rcv_binding_t *)malloc(sizeof(lbmbt_rcv_t));
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature, int) = SIGNATURE_FREE;
free(binding);

No, the problem isn't in that strange VOL() macro; that is explained here.

The problem is that I am trying to allocate an "lbmbt_rcv_binding_t" structure, but because of cut-and-paste programming and being in a hurry, I used the size of a different (and smaller) structure: lbmbt_rcv_t.  Since the "signature" field is the last field in the structure, the assignments to it write past the end of the allocated memory block.  GAH!

But we all knew that C is a dangerous language, with its casting and sizeof just begging to be used wrong.  But could it be at least a *little* safer?


A Safer Malloc

#define VOL(_var, _type) (*(volatile _type *)&(_var))

/* Malloc "n" copies of a type. */
#define SAFE_MALLOC(_var, _type, _n) do {\
  _var = (_type *)malloc((_n) * sizeof(_type));\
  if (_var == NULL) abort();\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC(binding, lbmbt_rcv_t, 1);
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature, int) = SIGNATURE_FREE;
free(binding);

The above code still has the bug -- it passed in the wrong type -- but at least it generates a compile warning.  The cast of malloc doesn't match the type of the variable.  Since I insist on getting rid of compiler warnings, that would have flagged the bug to me.


If you don't need portability, you can even use the gcc-specific extension "typeof()" in BOTH macros:

#define VOL(_var) (*(volatile typeof(_var) *)&(_var))

/* Malloc "n" copies of a type. */
#define SAFE_MALLOC(_var, _n) do {\
  _var = (typeof(*_var) *)malloc((_n) * sizeof(typeof(*_var)));\
  if (_var == NULL) abort();\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC(binding, 1);
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature) = SIGNATURE_FREE;
free(binding);

Now the malloc bug is gone ... by design!


Safer Malloc and Memset

Finally, notice that the memset is also wrong.  Since I frequently like to init my malloced segments to a known pattern (0xa5 is my habit), I can define a second macro:

#define VOL(_var) (*(volatile typeof(_var) *)&(_var))

/* Malloc "n" copies of a type and set its contents to a pattern. */
#define SAFE_MALLOC_SET(_var, _pat, _n) do {\
  _var = (typeof(*_var) *)malloc((_n) * sizeof(typeof(*_var)));\
  if (_var == NULL) abort();\
  memset(_var, _pat, (_n) * sizeof(typeof(*_var)));\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC_SET(binding, 0xa5, 1);
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature) = SIGNATURE_FREE;
free(binding);

No more bugs.


P.S.

It took me WAY longer than it should have to track this down (an entire weekend).  Why?  The bug manifested as a segmentation fault.  So I figured I could just valgrind it.  But lo and behold, I didn't have Valgrind on my Mac.  "No Problem," sez I.  I sez, "I'll just install it."

$ brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.

GAH!  "Still no problem," sez I.  I sez, "I'll just debug it on Linux."

No seg fault.  Program worked fine.  Valgrind makes no complaint.  So I spent the weekend doing a divide-and-conquer trackdown of the bug in a large codebase to find the "Mac-specific" bug.  And finally found the bad malloc.

But wait!  That's not Mac-specific!  What's going on here?  After wasting some more time, I finally printed the sizeof() for each structure type.  On Mac, the sizes are different.  On Linux, the structures are the same size.  Of course valgrind on Linux said it was OK -- the malloc size was correct on Linux!

And now I have Valgrind on my Mac, which pinpointed the bug immediately.  (Thanks Güngör Budak!)

Wednesday, September 27, 2017

Is a lot of spam our own damn faults?

I got an unsolicited sales inquiry from a major company the other day.  Each day, 10 to 20 junk emails make it through our spam filter.  Usually, I can delete them after only a second or two, but this one sounded like I might already have a business relationship with them.  I don't want to risk insulting a customer or vendor, so I responded, asking what it was about.  The salesman was honest; he said that he thought somebody with my title would be interested.  I wasn't.  Not even close.

I've been on the Internet since it became easy to get on it.  When did it become acceptable to send blind solicitations?  When did the word "spam" come to mean only Nigerian princes and phishing schemes?  It used to be only desperate, borderline-ethical, fly-by-night companies that sent junk email.  Now it's Box, Oracle, Microsoft, hell, I'm pretty sure my own employer does it!  Why have mainstream companies sunk so low as to send solicitations based on title?

Think back (if you're old enough) 20 years.  There were trade magazines that you could get "for free".  All you had to do is fill out a sheet that indicated in fair detail what your interests were, what industry you worked in, and the kinds of products over which you have purchase influence.  Vendors got very precisely-targeted lists, and we all knew that we would be getting solicitations.  We valued the magazine, so we didn't resent the ads.  Heck, although I don't remember specifically, I suspect I responded positively to one or two solicitations; the advertiser got their money's worth and I got a product that I wanted.

Those magazines don't exist any more, or at least not in my field.  We've all stopped reading the paper versions and instead look to the web for the information we're interested in.  We subscribe to blogs, podcasts, Slashdot, LinkedIn groups, and any number of other curated content providers.  But the Internet evolved from an early non-commercial birth.  Early adopters resented the commercialization of the Internet, and refused to give information about themselves.  We create throw-away email addresses to subscribe.  We want to remain anonymous.  So the information curators never established the model of "you tell me about yourself for marketing purposes, and I'll give you information you want."  Some companies tried to get that going, but the internet "culture" prevented it from catching on.

So guess what?  I and my fellow-junk-email-haters are suffering from the unintended consequences of our own behavior.  Vendors no longer have precisely-targeted lists available to them.  So they substitute quantity for quality; send a million emails, and you're sure to find some prospects.  It's the new normal.

Idealists like me want a total paradigm change.  We want unsolicited advertisements to go away completely.  Back in the day, if I knew I wanted a C compiler, what did I do?  Open the yellow pages?  Sorry, no entries in the Yellow Pages for C compilers.  No, I *depended* on those trade magazines' advertisers to give me access to vendors of C compilers.  But now that search engines exist, we can do away with outgoing advertisements.  Instead of push marketing, go with pull marketing.  If I want a C compiler, I won't open my "junk" folder to find an unsolicited ad, I'll do a web search.  And this model *does* work!  We put some useful information on our web site, and attracted more than one customer who came for that information and stayed for our product.

And yet, the realist in me knows that human nature is what it is.  Research has proven again and again that advertising works.  I suspect modern email campaigns generate a lot of "unsubscribe me" responses, some of which may be less than polite, but I also suspect that they generate at least some interest.  Cast a wide-enough net and you'll catch some fish.

So if I have an emotional response to junk mail that is out of proportion to its actual cost to me, that's my problem, not the advertisers'.  I guess I need to get over it.

Thursday, September 21, 2017

Solaris Multicast Deafness Bug

Once again, the mighty Dave Zabel (of two different fames) has found another Multicast-related bug, this time in Solaris.  I think that recent versions of Solaris fix it, and I don't have the energy to track down *when* they fixed it, but if you have Solaris servers that you haven't kept updated for a while, you might have this bug.


DEAFNESS DEMONSTRATED

You'll need Informatica's "mtools" package for Solaris.  These are great tools offered for free in both binary and source form at https://marketplace.informatica.com/solutions/informatica_mtools

And you'll need two hosts: A and B.  Host B should be Solaris 5.10 that hasn't been updated in a long time.  Host A can be anything.

1. On host A, run this:
    msend 239.0.3.13 12000 15

2. On host B, open two windows.  In the first, enter this:
    mdump 239.0.3.13 12000

Admire the printouts of the multicast packets for a while.  Isn't technology wonderful?  :-)

3. In a second host B window, enter:
    mdump 239.128.3.13 12000

Note that the first window continues to print, but the second window is silent.  No surprise; it is listening to a different and unused multicast group!  Of course it is silent.

4. Kill that second mdump.

WHOA!  The first mdump stops printing!  It went deaf to 239.0.3.13.  When trying this same experiment on Linux, or on our Solaris 5.11 machines, it does not go deaf.  But we have several old, non-updated 5.10 machines where the first mdump does go deaf on this step.

5. Enter:
    netstat -g

The OS still thinks it is listening to the multicast group.

6. Enter:
    snoop -P host 239.0.3.13

The packets are still being received!  But they aren't being delivered to the first mdump.

7. Enter:
    mdump 239.128.3.13 12000

WHOA!  The first mdump starts printing again!  The second mdump is still silent since there still isn't any traffic on its multicast group.


MAYBE IT'S MY PROBLEM?

Maybe PEBKAC?  Or a bug in mtools?

Nope.  Let's start over and try it again with a small change in step 3:

1. On host A, run this:
    msend 239.0.3.13 12000 15

2. On host B, open two windows.  In the first, enter this:
    mdump 239.0.3.13 12000

3. In a second host B window, enter:
    mdump 239.64.3.13 12000

See what I did there?  I changed the 128 to 64.  As before, the first window continues to print, but the second window is silent.

4. Kill that second mdump.

Lookie there!  The first mdump continues to print the messages.  No deafness.


WHAT'S GOING ON?

Well, I'm not sure, but I think it's got to be related to multicast group aliasing.  Remember that there are 2**28 different IP multicast groups.  But what about Ethernet?  There are only 2**23 Ethernet multicast MAC addresses allocated for use by IP multicast.  It turns out that 239.0.3.13 and 239.128.3.13 map to the same Ethernet multicast MAC address: 01-00-5E-00-03-13.

The IGMP protocol doesn't care about that; host B still tells the switch which multicast groups are subscribed, and it treats 239.0.3.13 and 239.128.3.13 as different.  But when the IP layer interfaces with Ethernet, it needs to program the NIC with the same multicast MAC address for those two IP groups.  And apparently older versions of Solaris didn't do the bookkeeping right.

I've tried this experiment on other OSes and they all work as you would expect (no deafness).  Our Solaris 5.11 machine does it right.  And even a recently-installed 5.10 system works right.  But older systems that haven't been updated in a while all have this problem.



THE MORAL OF THE STORY

The obvious moral is to update your systems.

But even then, you should avoid using multicast groups that alias on top of each other.  The whole point of multicast is that you don't receive packets that you aren't interested in.  But if you have traffic published to both 239.0.3.13 and 239.128.3.13,  a host subscribing to only one of them will get data for both.  The IP layer will do the right thing (discard the undesired packets), but it still produces an unnecessary load.


ANY OTHER GOTCHAS?

Sure.  Watch out for well-known and ad-hoc multicast protocols in the range 224.0.0.0 - 224.4.255.255.  Are any of those in use anywhere on your network?  No?  Are you sure they never will be?

Look at the multicast group we tested with: 239.0.3.13.  That aliases on top of 224.0.3.13, which is in IANA's "AD-HOC Block I" range, in a sub-range labeled "RFE Generic Service".  I don't know what that is (and Google doesn't seem to know either), but I'm thinking I want to avoid aliasing, even if low probability.

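You should be fine if you use multicast groups between 239.5.0.0 - 239.127.255.255.  (Only the low 23 bits of a group reach the MAC address, so anything from 239.0.0.0 through 239.4.255.255 lands on top of the 224.0.0.0 - 224.4.255.255 range above.)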

Oh, and update your systems too.  Good hygiene and all that.


UPDATE: UPGRADING FIXES IT

We've upgraded one of our "problem servers" to the latest Solaris 5.11 and it fixed the deafness problem.

I'm not interested in figuring out exactly which minor release they fixed it in.

Saturday, July 8, 2017

Most Random Password Generators are Bad

Good for you!  You're taking the advice of experts and clicking "Generate Password", resulting in 10 characters of gibberish.  There!  Now your password will take thousands of years to crack.

Um ... not necessarily.  Try a few days.


RAND() LIMITS ENTROPY TO 32 BITS

When using the pseudo-random number generator supplied by most language libraries, the entropy of the resulting password is limited to 32 bits!

Let's take XKCD's algorithm: ~2000 word dictionary, randomly select 4 words, produces 2000**4 different possible passwords, which is 16 trillion.  Log base 2 gives 43.9 bits of entropy.

But using a pseudo-random number generator with a 32-bit initial seed means that it will only generate 2**32 different sequences, or 4 billion.  That's .027% of the total!  In other words, 99.97% of the possible XKCD-style passwords CANNOT BE GENERATED by that program!

Normally, you can add more bits of entropy by either expanding the dictionary size (the number of words to choose from), or increasing the number of words in the password.  But because of the pseudo-random number generator, you are STUCK at 32 bits of entropy.  An attacker could even pre-generate the 4 billion possible XKCD-style passwords that a standard Linux rand() produces.

My point is not that 32 bits of entropy aren't enough, it's that you aren't necessarily going to get what you think you're getting if you use the stock pseudo-random number generator.

So if you're running somebody's application and you click "generate random password" and you see a string of gibberish that claims to have a crack time of thousands of years, it is probably wrong.  32 bits of entropy at 1000 guesses per second has a brute force crack time of under 50 days.  (And modern crackers go MUCH faster than 1000/sec.)
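
Here is a quick sketch of that crack-time calculation.  It is worst case (the attacker has to try every possibility), and the faster guess rates are just assumed round numbers, not measurements of any real cracker.

```python
SECONDS_PER_DAY = 86_400

def worst_case_crack_days(bits: float, guesses_per_second: float) -> float:
    """Days needed to try every one of 2**bits possibilities."""
    return (2 ** bits) / guesses_per_second / SECONDS_PER_DAY

# 1000/sec is the rate used in the text; the others are hypothetical.
for rate in (1_000, 1_000_000, 1_000_000_000):
    days = worst_case_crack_days(32, rate)
    print(f"32 bits at {rate:,} guesses/sec: {days:.4f} days")
```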


TRULY RANDOM NUMBERS ELIMINATE THE LIMIT

For my program, I offer the "-r" option, which reaches out to https://random.org to get random numbers.  It doesn't need very many -- you only need 4 random numbers to generate an XKCD-style password -- but the important feature is that random.org is truly random.  There is no seed.  Each number is uniformly random and independent from the previous number.  (Or at least, so claims the owner of random.org.)  I'm pretty sure this removes any artificial limit on entropy, so you can get as much entropy as you want by increasing the dictionary size and/or the number of words in the password.

Using random.org is not the only way to solve this problem.  Have you ever generated an SSL certificate?  It can take several seconds while the software "generates" enough entropy for long key lengths.  I'm not personally familiar with how that is done, but I've heard that the OS uses external physical events, like keystrokes, network interrupts, etc.  I think I've heard that it also uses disk interrupts, which makes me wonder if SSDs make it harder for kernels to generate entropy.

If you're going to be demanding a lot of entropy for your application, you should not abuse random.org.  Instead learn how to use locally-generated entropy.
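
As one example of locally-generated entropy, here is a minimal sketch using Python's secrets module, which draws from the operating system's random source instead of a seeded rand()-style generator.  To be clear, this is not what my program does (it uses random.org), and the word list below is a made-up placeholder.

```python
import secrets

def generate_password(words: list[str], count: int = 4) -> str:
    """Pick `count` words uniformly, with replacement, using the OS random source."""
    return "".join(secrets.choice(words).capitalize() for _ in range(count))

# Hypothetical word list; a real one would have a couple thousand entries.
sample_words = ["rock", "index", "well", "float", "pocket", "survive", "four", "lab"]
print(generate_password(sample_words))
```

The OS source is not "truly random" in the random.org sense, but it is not limited by a 32-bit seed either.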

The vast majority of on-line password generators are written in Javascript.  I'm not sure how to get truly random numbers (i.e. entropy) in Javascript, but this might be a good starting point.

(By the way, a bit of reading on my part shows me that I have a lot to learn.  But my reading to date does reinforce my primary point: simply using rand() or a similar/derived function does not produce passwords that take thousands of years to crack.  At best they rely on "security through obscurity".)


PASSWORD MANAGERS' RANDOMIZER???

I'm a little worried about the "random password generators" included in password managers.  The idea is that you should have a different password for every on-line account, and you let the password manager deal with the hundreds of passwords you end up with.  Since you don't need to remember, or even type those passwords, you might as well make them be random character gibberish.  Only the password manager's master password needs to be memorized.

However, if your password manager just uses the normal pseudo-random number generator in the system, that sequence of random characters will not have as much entropy as you think.  I can tell you that LastPass's online password generator just uses Javascript's get_random() function, which only has 32 bits of entropy.  Now maybe their laptop application uses /dev/random, but also maybe the fact that their on-line generator uses the built-in random function indicates they didn't give the issue much thought.

I haven't done an exhaustive search, but I would wager that 90% of "generate password" functions just use the language's default random number generator, which has a 32-bit seed (or less!).

My suggestion is to use https://www.random.org/passwords/ to generate your gibberish passwords, or my program to generate XKCD-style passwords.


SEED VS. PERIOD

The rand() man page says that Linux rand() uses the same algorithm as random().  And srandom()'s man page says that the seed is an unsigned int, which is 32 bits.  It also says:
  The period of this random number generator is very large,
  approximately 16 * ((2^31) - 1).
I.e. the period is approximately 34 billion, which is about 35 bits.  But the seed is 32 bits.  This means that you cannot start the random number generator at any arbitrary point in its period.  Even if you figure out a way to fully-leverage all 35 bits of random()'s period, that still gives you a crack time of 397 days, at 1000 guesses/sec.  And by the way, modern password crackers go much faster than 1000/sec.
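
The seed-versus-period numbers can be checked with a few lines.  The period formula is the one quoted from the man page; 1000 guesses/sec is the same assumed rate as before.

```python
import math

SECONDS_PER_DAY = 86_400
GUESSES_PER_SECOND = 1_000           # assumed attacker speed from the text

period = 16 * ((2 ** 31) - 1)        # period quoted in the srandom() man page
seed_space = 2 ** 32                 # number of distinct unsigned int seeds

period_bits = math.log2(period)
days_for_seed_space = seed_space / GUESSES_PER_SECOND / SECONDS_PER_DAY
days_for_period = period / GUESSES_PER_SECOND / SECONDS_PER_DAY

print(f"period: {period:,} (about {period_bits:.1f} bits)")
print(f"crack time over the seed space: {days_for_seed_space:.1f} days")
print(f"crack time over the full period: {days_for_period:.1f} days")
```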

XKCD-style Password Generator

I got to thinking about passwords again today.

I wrote my own program to produce XKCD-style passwords from a list of 2126 common words, and calculated some stats.  I reproduced XKCD's calculation of 44 bits of entropy for 4 randomly-selected words.  And I made a few mildly-interesting discoveries, and one more-interesting realization.


PASSWORD LENGTH: MORE = BETTER?

My average XKCD-style password length is 20.8 letters (over a large sample), which is a lot of typing.  So I decided to limit word size.  By filtering my list of 2126 words to nothing longer than 4 letters, I ended up with 709 words.  That's not many, and 4 of them together only gives 37 bits of entropy.  Not so hot.  But if you string 5 of 709 words together, you get 47 bits of entropy, which is better than XKCD!  And the average password drops to 18.3 characters.

I find that interesting: shorter passwords which produce more bits of entropy than longer passwords.  Seems counter-intuitive, until you realize that opening it up to the full 2126 words increases average word length more than it increases entropy.  (See below for the math.)


THE PASSWORDS

So, what do these passwords look like?  Here are 10 of the XKCD-style, 4 words from the 2126-word set:

password: MostlyRelativeSpinAdvanced
length: 26
password: ForBasicallyThinkingExplain
length: 27
password: CookieArmyMysteryConference
length: 27
password: ExpectConvertQuarterbackPresentation
length: 36
password: EverybodyProductHotDemonstrate
length: 30
password: RockIndexWellFloat
length: 18
password: BehaviorNearlyPromotePercentage
length: 31
password: PocketSurviveFourLab
length: 20
password: MuchWeekWillAnd
length: 15
password: DivideMorePeakSeveral
length: 21


So, how easy are those to remember?  Memorizing an XKCD-style password is about creating a mental picture or story around it.  Use some imagination.  It usually helps to make it amusing.  How about the first one: "MostlyRelativeSpinAdvanced"?  Well, I'm a bit of a science geek, so this one makes perfect sense.  You have a particle stream, and most of the particles are moving at relativistic speeds.  So measuring each particle's spin is a pretty advanced thing to do.  Hmm ... what's the amusing part of that?  Oh maybe that an actual physicist would roll his eyes at my explanation and say that I don't know the first thing about particle physics.  But basically, I was able to imagine a mini-story or mental picture for each of those passwords, so while I might not be able to memorize all 10, I could easily memorize one of them.

What about the shorter passwords consisting of 5 words from the 709 words of 4 letters or less?

password: BuyWingSadRideSeed
length: 18
password: SeedPairTankJailDo
length: 18
password: PanBuryDenyDataOld
length: 18
password: GeneRiceTeaYetSin
length: 17
password: WallJailLabNextTent
length: 19
password: HallSnapCashRichRead
length: 20
password: WarmUsKeepRoseLess
length: 18
password: PortMarkSirYouLeaf
length: 18
password: HiAgoHipAnyBe
length: 13
password: EaseSkyRealTossFate
length: 19


Even though those are shorter (and more secure) passwords, I guess I find them more difficult to remember.  It's about creating a mental picture or story around those words.  Since the words are random, they don't come out in any conceptually correlated way.  So you stretch your imagination to encompass them.  The more words in the password, the more you have to stretch.

Take the last password up there, "EaseSkyRealTossFate", and drop that last word to make it 4 words: "Ease sky real toss".  My first thought is that "toss" is the children's game "ring toss".  Sky and ease kind of fit since the game is usually pretty easy and you toss things towards the sky.  The word "real" is kind of left out, but I imagine throwing something "real", like a laptop or a dinner plate, instead of a game piece.  So I imagine the ease of tossing a real laptop into the sky.  Yeah, that's stretching the imagination a bit, but maybe not too much.  I could pretty easily remember EaseSkyRealToss.

But now throw "fate" in there and my whole mental picture falls apart.  I guess I could say that when the laptop lands, its fate will be sealed, but ... not sure why ... but I would have much more trouble remembering it.

So I'll be sticking to 4 words and more typing.


THE MATH

Passwords are basically taking a set of N things, and taking L of them out with replacement.  For example, a 4-digit PIN consists of a set of 10 digits (N=10), and you take 4 digits out (L=4) with replacement.  The "with replacement" simply means that you might take the same digit out more than once (e.g. 2338).  So the number of possible 4-digit PINs is 10**4 (10 to the power of 4), which is 10,000.  To get that in terms of bits, take the log base 2 of it to get 13.  So 13 bits of entropy.

Another example: 8 randomly-selected letters for a password.  Let's assume lower-case only, and no digits or special characters.  The set of N things is the letters of the alphabet, so N=26.  By taking 8 characters, L=8.  26**8 = 208 billion.  Log base 2 of that is 37 bits of entropy.  Cool.  Now let's do random upper/lower case.  N=52, and 52**8 = 53 trillion, giving 45.6 bits of entropy.  Add in 0-9: N=62, 62**8 = 218 trillion, giving 47.6 bits of entropy.

So, back to my XKCD-style passwords.  My original set of 2126 words, taking 4 at a time, gives 2126**4 = 20 trillion, which is 44.2 bits.  My reduced set of 709 short words, taking 5 at a time gives 709**5 = 179 trillion, which is 47.3 bits.
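
All of the numbers above come from the same formula, so here is a short sketch that reproduces them.  N and L are as defined above; the function name and the labels are just mine.

```python
import math

def entropy_bits(n: int, l: int) -> float:
    """Bits of entropy when L items are drawn with replacement from a set of N."""
    return l * math.log2(n)   # same as log2(n ** l)

examples = {
    "4-digit PIN": (10, 4),
    "8 lower-case letters": (26, 8),
    "8 mixed-case letters": (52, 8),
    "8 letters and digits": (62, 8),
    "4 of 2126 words": (2126, 4),
    "5 of 709 short words": (709, 5),
}

for label, (n, l) in examples.items():
    print(f"{label}: {n}**{l} = {n ** l:,} -> {entropy_bits(n, l):.1f} bits")
```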

However, see my next blog post for an observation about random password generators and entropy.


MORE WORDS?

My list of 2126 words actually comes from a list of 3000 words from Education First.  I filtered it to limit word length to 7 or fewer characters, resulting in my 2126 words.  Note to the rigorous: you'll find that I'm 2 words short; it made my code easier to ignore the first and last words.

So how about if I remove that filter and pick 4 words from the entire set of 2998?

2998 ** 4 = 80 trillion, which is 46 bits.  I.e. going from 2126 words to 2998 increases the entropy by 2 bits.  My average password length jumps to 25.6, which is 5 more characters.  I tried a few other word length limits and decided that 7 is best.
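
The way I compared word length limits can be sketched roughly like this.  It is not the actual code from my program; the function name and the tiny word list are made up for illustration, and the average length is computed from the word list rather than by sampling.

```python
import math

def word_limit_stats(words: list[str], max_len: int, count: int):
    """Return (words kept, entropy in bits, average password length)."""
    kept = [w for w in words if len(w) <= max_len]
    bits = count * math.log2(len(kept))
    # Expected password length is `count` times the mean word length.
    avg_len = count * sum(len(w) for w in kept) / len(kept)
    return len(kept), bits, avg_len

# Hypothetical word list; substitute a real one to compare limits.
sample = ["go", "tea", "rock", "index", "pocket", "survive", "behavior"]
for limit in (4, 7, 20):
    kept, bits, avg_len = word_limit_stats(sample, limit, 4)
    print(f"limit {limit}: {kept} words, {bits:.1f} bits, avg length {avg_len:.1f}")
```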


THE PROGRAM

See https://github.com/fordsfords/pgen.  Be sure to use "-r" if generating an actual password you want to use.

Monday, May 22, 2017

Some multicast programming tips

Never too old to learn.  :-)

There are lots of multicast example programs out there, so I won't try to compete with them.  But I did run across several things that weren't explained very well.


Single Socket, Multiple Groups

Yes, you can create a single socket and have it receive datagrams from multiple multicast groups.  Just include multiple calls to:
  setsockopt(recv_sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, ...


Multiple Sockets, One Group per Socket

This is another common use case, where you create multiple sockets for receiving, with each socket joined to a different multicast group.


Binding the Receive Socket

Since a socket needs to be bound to a port to receive any kind of UDP datagram, multicast or unicast, you need to include a call to bind().  You pass in a sockaddr_in with the sin_port set as desired (remember to pass it in network order).  But what about the sin_addr?  What do you set that to?

Many people set it to INADDR_ANY, which is what I did in a recent program.  But in the multiple sockets, different group per socket case, it had an unexpected side effect.  All of my sockets were bound to the same destination port, but joined to different multicast groups.  With sin_addr set to INADDR_ANY, the kernel took each received datagram, replicated it, and delivered a copy to *every* socket, even if the datagram's destination group is different from the one joined to the socket!  I.e. simply doing the IP_ADD_MEMBERSHIP on a socket didn't filter datagrams based on the desired group.  When a multicast datagram was received, the kernel just used the destination port and delivered a copy to every UDP socket bound to that port and INADDR_ANY.

I had to do some extra searching to find out that you can set the bind's sin_addr to the multicast group.  I have some reason to suspect that this is not portable across all operating systems, but at least it works on Linux.  Now I can have 10 sockets, each bound to the same port (don't forget SO_REUSEADDR) but different multicast groups.  When a multicast datagram is received, it is delivered *only* to the socket which is bound to the right port/multicast group pair.
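
Here is a rough sketch of the one-group-per-socket approach, written in Python rather than C to keep it short.  The group addresses and port in the usage comment are hypothetical, the helper names are mine, and as noted above, binding to the group address may not work on every operating system.

```python
import socket

def make_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)

def open_group_socket(group: str, port: int) -> socket.socket:
    """Open a UDP socket that receives only datagrams sent to group:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Needed so several sockets can share the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Bind to the multicast group itself instead of INADDR_ANY (works on Linux).
    sock.bind((group, port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock

# Usage (hypothetical groups and port):
#   socks = [open_group_socket(g, 12000) for g in ("239.1.2.3", "239.1.2.4")]
```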


Single Socket, Multiple Groups, reprise

So, what about the case where you have a single socket joined to multiple groups?  In that case, you *do* want to use INADDR_ANY in the bind.
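
And a matching sketch for the single-socket case, again in Python with hypothetical group addresses and helper names.  The bind uses INADDR_ANY (the empty string in Python), and IP_ADD_MEMBERSHIP is called once per group.

```python
import socket

def membership_requests(groups: list[str]) -> list[bytes]:
    """Pack one ip_mreq per group, letting the kernel choose the interface."""
    any_interface = socket.inet_aton("0.0.0.0")
    return [socket.inet_aton(group) + any_interface for group in groups]

def open_multi_group_socket(groups: list[str], port: int) -> socket.socket:
    """Open one UDP socket that receives datagrams for every group in `groups`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # "" means INADDR_ANY
    for mreq in membership_requests(groups):
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

# Usage (hypothetical groups and port):
#   sock = open_multi_group_socket(["239.1.2.3", "239.1.2.4"], 12000)
```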


Mix and Match?

I guess this poses a restriction.  You can't have, say, 2 sockets that you distribute 4 multicast groups across, with two groups each.  Why would you want to do that?  Maybe to load-balance across threads.  But assuming they all want to bind to the same port, you can't do it.  Setting the sin_addr to INADDR_ANY prevents filtering, and will mean that both sockets will receive a copy of every datagram sent.  But you can't set sin_addr to multiple multicast groups.

So if you want to have multiple sockets, multiple groups, and the same destination port, you need to have one group per socket, and bind that socket to the group.

Monday, May 15, 2017

WannaCrypt / WannaCry ransomware

I'm not a security researcher, and I don't follow the subject very closely.  But here is an interesting read by the person who slowed the spread of the recent WannaCrypt / WannaCry ransomware outbreak.

https://www.malwaretech.com/2017/05/how-to-accidentally-stop-a-global-cyber-attacks.html

Sunday, April 30, 2017

Fraudulent spam email claiming to be Netflix

I got a phishing email.  So what?  I get lots of phishing emails.  Why blog about this one?

Well, it's at least a *little* different.

Most of them direct the victim to an existing web site which has been compromised.  I.e. the web site's real owner has no idea that his own site is being used for fraudulent purposes.

In this one, the victim is directed to the domain name "netflix-myaccount.com", which the scammer obtained properly.  Unfortunately, the scammer wasn't stupid enough to include his own contact information in the registry, instead choosing to hide behind privacyprotect.org.

Now there's nothing wrong with using privacyprotect.org to hide one's identity.  If anything, it removed any doubt in my mind (as if there were any) that the page isn't owned by Netflix.  So it reinforced that it is a phishing site.  I sent a complaint email to privacyprotect.org anyway.

Next up, the domain registrar: ilovewww.com.  Never heard of them.  Malaysian.  Sent them a complaint email too, asking them to suspend the registration.

Next, the IP address that netflix-myaccount.com resolves to: 80.82.67.155.  A whois lookup shows the block is owned by Quasi Networks LTD.  Abuse email to it as well.

Now to another nice site: phishcheck.me, a site that evaluates how likely a site is to be fraudulent.  It actually goes to the site and analyzes it.  So I went there and plugged in "http://netflix-myaccount.com", and sure enough, it says that it is probably a phishing site (no surprise there).  But on that phishcheck.me page is a tab named "resources", which shows details of the access to the site ... and well lookie there, "netflix-myaccount.com" redirects to "netflix-secureserver.com".  Which resolves to the same IP as "netflix-myaccount.com", and is registered in the same way (ilovewww.com and privacyprotect.org).  So what's the point in that?  Oh well, another set of complaint emails for the new domain name.

Finally, let's see if it is a compromised web site.  I would like to see what other domain names resolve to the same IP address.  Unfortunately, this appears not to be an exact science.  The few sites there are that claim to do this find *no* domains resolving to that IP.  However, a simple google search for "80.82.67.155" (*with* the double quotes) does find the names "netflix-myaccount.com" and a new one: "www.useraccountvalidation-apple.com".

Yep.  Another phishing site, leveraging Apple instead of Netflix.  Let's do the drill, starting with whois.  WHOA!!!  Did we hit paydirt?

Registrant Contact
Name: Jamie Wilson
Organization:
Mailing Address: 22 Madisson Road, London London SE12 8DH GB
Phone: +44.07873394485
Ext:
Fax:
Fax Ext:
Email:uktradergb@gmail.com

Now, don't be too hasty.  The *real* registrant is a scammer.  What are the chances he would list his own real contact info?  The only thing that might be valid is the email address, since I think he needs that to fully set up the domain, and even then it might have been a single-use throwaway.

Hmm ... not totally throw-away.  A google of "uktradergb@gmail.com" has 6 hits, including "netflix-iduser1.com" and "netflix-iduser2.com", both of which have Jamie as the registrant, but neither of which resolves to a valid IP address.  So not sure there's anything actionable (i.e. complainable) there.

But just in case, I googled the phone number, and found this additional hit: "AppleId1-Cgi.com", which doesn't appear to resolve to a valid IP.

Well, much as I hate to, let's skate over to "domaintools.com", which wants my money in a bad way.  It tells me that uktradergb@gmail.com is associated with ~38 domains, but of course won't tell me what any of them are without paying them $99.  And even though I would love to send complaints regarding all 38, I wouldn't love it $99 worth.

Ok, one more thing.  http://domainbigdata.com/gmail.com/mj/LX7iN6iKwKFIRfkD7CsKXQ says that the owner of that email address is Adam Stormont, and that the email is associated with a few other sites (but not 37), including "hmrc-refundvalidation.com", which doesn't resolve to an IP.  And by the way, a whois of another uktradergb@gmail.com domain, "hni-4.com", says that the registrant is David Hassleman.  So yeah, ignore the Jamie Wilson contact.  He wasn't that stupid.  :-)

And now I've run out of gas.  Maybe those domain names will be disabled in the next few days.  Or maybe I've just wasted a half hour of my life.  (Well, I've learned a few things, so not totally wasted.)