Thursday, June 5, 2014

clang/llvm optimize -O3 understands free()


Wow, I just found an amazing bug, although I'm not "bug" is the right word for it.

I have an object delete function:

void e_del(e_t *e)
{
    if (e == NULL || e->signature != E_SIGNATURE) abort();
    e->signature = 0;
    free(e);
}

I have int signature at the end of the e_t structure, set at object creation to a known value (E_SIGNATURE).  Then, I can check periodically to make sure that a valid object is being used.  This detects a variety of bugs, like writing past the end of a buffer in the e_t structure, or just an invalid pointer.  The setting of signature to zero in e_del() is to detect a double call to e_del() with the same object.

In my unit tests, I coded a double-delete to make sure it worked, but abort didn't get called!  Instead, it double-freed!

Silly me, I must have a stupid error in my logic.  Re-build without optimization to make it easier to set breakpoints and single step.  Works.  Works fine.  Without optimization, my code works.  With optimize level 3, it doesn't.  I generated assembly language for optimize and non-optimize cases, and guess what.  At optimize 3, the zero of signature is gone.  Completely gone.

I had a theory, which I tested by changing free(e) to printf("%p", e) and guess what.  At optimize 3, the zero of signature reappeared.  The code worked as expected.  Change printf back to free, and the zero of signature disappeared again.

Apparently, clang/llvm knows what free() is for.  It knows that only a buggy program would try to access a heap segment after it is freed.  So for any non-buggy (and non-multi-threaded) program, it is safe to throw away the "e->signature = 0" line.   My problem is that I DON'T want to assume a bug-free program.  I want to do a good job of detecting bugs.

How to fix?  A variation on Linux's ACCESS_ONCE macro which I named VOL:

#define VOL(_x, _type) (*(volatile _type *)&(_x))
. . .
VOL(e->signature, int) = 0;
free(e);

This "volatilizes" access to e->signature, forcing the optimizer to leave the operation alone.  (FYI - my reason for making my own macro is that ACCESS_ONCE uses the gcc-specific built-in "typeof()" to remove the need to pass in the data type.  But I didn't want my code to be gcc-specific.)

I've done a lot of multi-threaded programming in my day, most-recently experimenting with lock-free, non-blocking designs, so I've had to worry about the optimizer from time to time.  But this is the first time I've ever had a simple, single-threaded, straight-line code work wrong due to the optimizer.  Think about what this means.  The optimizer knows what free() is for, knows that the freed segment shouldn't be accessed after the free, and makes code generation decisions on that basis.  I wonder what else the optimizer thinks it knows...

UPDATE: in its first posting, mistakenly attributed this issue to gcc.  I forgot that my Mac now uses the clang/llvm compiler.  I tried this with true gcc, and it did NOT omit the clear of signature.

UPDATE2: I changed the assignment of zero to signature to a memset of the whole structure:

void e_del(e_t *e)
{
    if (e == NULL || e->signature != E_SIGNATURE) abort();
    memset(e, 0, sizeof(*e));
    free(e);
}

The compiler did NOT optimize out the memset.  I'm guessing that it thought the memset function might have side effects, which is interesting since memset is a compiler builtin.  In fact, if signature is the only field in e_t, the memset compiles to the exact same instructions as the zero assignment.

UPDATE3: Wow, optimizers are surprising.  I found a case where the memset() IS optimized away.  I tried telling the compiler that e points at volatile memory, and it STILL optimized it away.   I'm back to using VOL() on the individual fields.  Again, this is all clang/llvm on Mac.  I've never seen gcc optimize away code based on knowing what free() does.

No comments: