Monday, August 27, 2018

Safer Malloc for C

Here's a fragment of code that I recently wrote.  See anything wrong with it?

#define VOL(_var, _type) (*(volatile _type *)&(_var))
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
binding = (lbmbt_rcv_binding_t *)malloc(sizeof(lbmbt_rcv_t));
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature), int) = SIGNATURE_FREE;
free(binding);

No, the problem isn't in that strange VOL() macro; that is explained here.

The problem is that I am trying to allocate an "lbmbt_rcv_binding_t" structure, but because of cut-and-past programming and being in a hurry, I used the size of a different (and smaller) structure: lbmbt_rcv_t.  Since the "signature" field is the last field in the structure, the assignments to it write past the end of the allocated memory block.  GAH!

But we all knew that C is a dangerous language, with its casting and sizeof just begging to be used wrong.  But could it be at least a *little* safer?


A Safer Malloc

#define VOL(_var, _type) (*(volatile _type *)&(_var))

/* Malloc "n" copies of a type. */
#define SAFE_MALLOC(_var, _type, _n) do {\
  _var = (_type *)malloc((_n) * sizeof(_type));\
  if (_var == NULL) abort();\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC(binding, lbmbt_rcv_t, 1);
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature), int) = SIGNATURE_FREE;
free(binding);

The above code still has the bug -- it passed in the wrong type -- but at least it generates a compile warning.  The cast of malloc doesn't match the type of the variable.  Since I insist on getting rid of compiler warnings, that would have flagged the bug to me.


If you don't need portability, you can even use the gcc-specific extension "typeof()" in BOTH macros:

#define VOL(_var) (*(volatile typeof(_var) *)&(_var))

/* Malloc "n" copies of a type. */
#define SAFE_MALLOC(_var, _n) do {\
  _var = (typeof(*_var) *)malloc((_n) * sizeof(typeof(*_var)));\
  if (_var == NULL) abort();\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC(binding, 1);
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature) = SIGNATURE_FREE;
free(binding);

Now the malloc bug is gone ... by design!


Safer Malloc and Memset

Finally, notice that the memset is also wrong.  Since I frequently like to init my malloced segments to a known pattern (0xa5 is my habit), I can define a second macro:

#define VOL(_var) (*(volatile typeof(_var) *)&(_var))

/* Malloc "n" copies of a type and set its contents to a pattern. */
#define SAFE_MALLOC_SET(_var, _pat, _n) do {\
  _var = (typeof(*_var) *)malloc((_n) * sizeof(typeof(*_var)));\
  if (_var == NULL) abort();\
  memset(_var, _pat, (_n) * sizeof(typeof(*_var)));\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC_SET(binding, 0xa5, 1);
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature) = SIGNATURE_FREE;
free(binding);

No more bugs.


P.S.

It took me WAY longer than it should have to track this down (an entire weekend).  Why?  The bug manifested as a segmentation fault.  So I figured I could just valgrind it.  But lo and behold, I didn't have Valgrind on my Mac.  "No Problem," sez I.  I sez, "I'll just install it."

$ brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.

GAH!  "Still no problem," sez I.  I sez, "I'll just debug it on Linux."

No seg fault.  Program worked fine.  Valgrind makes no complaint.  So I spent a the weekend doing a divide-and-conquer trackdown of the bug in a large codebase to find the "Mac-specific" bug.  And finally found the bad malloc.

But wait!  That's not Mac-specific!  What's going on here?  After wasting some more time, I finally printed the sizeof() for each structure type.  On Mac, the sizes are different.  On Linux, the structures are the same size.  Of course valgrind on Linux said it was OK -- the malloc size was correct on Linux!

And now I have Valgrind on my mac  which pinpointed the bug immediately.  (Thanks Güngör Budak!).