[lwn]Double kfree() errors

Less than 24 hours after Coverity announced the availability of a new set of machine-detected potential kernel bugs, Dave Jones started posting fixes. Judging from these fixes, a number of the problems detected this time around are double-free errors - passing the same pointer to kfree() twice. Freeing memory twice is a sure way to corrupt core kernel data structures, leading to trouble in unpredictable places far from where the real bug is to be found. Avoiding this kind of error would make life easier for everybody involved.

To that end, Dave tossed out a simple idea: have kfree() poison pointers so that a second call can be detected immediately. His first proposal looked like this:

    #define kfree(foo) \
            __kfree(foo); \
            foo = KFREE_POISON;

This code was not meant to be incorporated as-is; for starters, it probably needs a pair of braces. But a couple of other problems popped up. One of them is that, since passing a NULL pointer to kfree() is legal, passing it twice is also legal. This code would break that case: the first call would replace the NULL pointer with the poison value, so the second call would no longer be a harmless no-op. Whether that would be a problem for real code is unclear. Al Viro pointed out a more serious issue: the pointer passed to kfree() is not always an lvalue which can be assigned to (it may be the return value of a function, for example). So simply redefining kfree() in this way would lead to compilation errors.
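A userspace sketch may make the first two complaints concrete. It is not Dave's code and not a kernel patch: free() stands in for __kfree(), and the names FREE_POISON, checked_free(), poison_free() and double_frees are all hypothetical, chosen for illustration. The macro is wrapped so that it behaves as a single statement and leaves NULL pointers alone; the lvalue limitation remains.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical poison value; the proposal never defined KFREE_POISON. */
#define FREE_POISON ((void *)(uintptr_t)0x6b6b6b6b)

static int double_frees;	/* number of double frees detected so far */

/* Userspace stand-in for __kfree(): refuse to free a poisoned pointer. */
static void checked_free(void *p)
{
	if (p == FREE_POISON) {
		double_frees++;	/* a kernel would warn or oops here */
		return;
	}
	free(p);		/* free(NULL) is a no-op, like kfree(NULL) */
}

/*
 * do { } while (0) turns the two steps into one statement, so the macro
 * is safe after an unbraced "if". A NULL pointer stays NULL, keeping
 * repeated frees of NULL legal. The argument is evaluated more than once
 * and must still be an assignable lvalue.
 */
#define poison_free(ptr)				\
	do {						\
		checked_free(ptr);			\
		if ((ptr) != NULL)			\
			(ptr) = FREE_POISON;		\
	} while (0)

/* Usage: the second free is caught instead of corrupting the heap. */
static void poison_free_demo(void)
{
	char *buf = malloc(64);
	int before = double_frees;

	if (buf == NULL)
		return;
	poison_free(buf);
	assert(buf == FREE_POISON);
	poison_free(buf);
	assert(double_frees == before + 1);
}
```

Something like poison_free(get_buffer()) would still fail to compile, since a function's return value cannot be assigned to; that is the objection Al Viro raised, and no amount of bracing fixes it.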

The end result is that a transparent, in-place replacement for kfree() may be hard to implement. An alternative might be the creation of a safe_kfree() variant, combined with some serious pressure to use that variant. Then, perhaps, double-free errors could be caught when they happen.

Or, instead, one could use the double-free checking already built into the kernel. The slab allocator, which is (among other things) the engine behind kmalloc() and kfree(), has options for poisoning (writing special values to) all memory which it handles. One value (0x5a in every byte) marks uninitialized memory, while another (0x6b) is written into memory when it is freed. The resulting patterns jump out nicely in oops listings, often making the cause of the problem immediately obvious. But the use-after-free value can also enable the detection of double-free errors - assuming that the memory is not reallocated between kfree() calls.
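The idea can be illustrated with a toy fixed-size pool in userspace. This is only a sketch of the poisoning technique, not the slab allocator's actual implementation; the two byte values come from the text above, while OBJ_SIZE, NR_OBJS, toy_alloc() and toy_free() are made-up names. Freed objects are filled with 0x6b, and a free of an object that already holds nothing but 0x6b is reported as a probable double free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_INUSE 0x5a	/* written at allocation: uninitialized memory */
#define POISON_FREE  0x6b	/* written at free: memory no longer in use */

#define OBJ_SIZE 32		/* hypothetical object size */
#define NR_OBJS  4		/* hypothetical pool size */

static unsigned char pool[NR_OBJS * OBJ_SIZE];
static int in_use[NR_OBJS];	/* allocator bookkeeping only */

/* Return 1 if every byte of the object holds the given value. */
static int all_bytes_are(const unsigned char *obj, unsigned char val)
{
	for (size_t i = 0; i < OBJ_SIZE; i++)
		if (obj[i] != val)
			return 0;
	return 1;
}

/* Hand out the first free slot, filled with the "uninitialized" pattern. */
static void *toy_alloc(void)
{
	for (size_t i = 0; i < NR_OBJS; i++) {
		if (!in_use[i]) {
			in_use[i] = 1;
			memset(pool + i * OBJ_SIZE, POISON_INUSE, OBJ_SIZE);
			return pool + i * OBJ_SIZE;
		}
	}
	return NULL;		/* pool exhausted */
}

/*
 * Free an object. Returns 0 on success, or -1 if the object already
 * holds the free pattern, which suggests a double free. An object whose
 * legitimate contents are all 0x6b would be flagged too.
 */
static int toy_free(void *obj)
{
	unsigned char *p = obj;
	size_t idx;

	if (p == NULL)
		return 0;	/* freeing NULL is legal */
	idx = (size_t)(p - pool) / OBJ_SIZE;
	assert(idx < NR_OBJS);	/* pointer must come from toy_alloc() */
	if (all_bytes_are(p, POISON_FREE))
		return -1;
	memset(p, POISON_FREE, OBJ_SIZE);
	in_use[idx] = 0;
	return 0;
}
```

Calling toy_free() twice on the same object returns -1 the second time. If toy_alloc() hands the slot out again in between, the slot holds 0x5a rather than 0x6b and the stale second free goes unnoticed, which is the reallocation caveat mentioned above.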

The problem, it seems, is that not a whole lot of developers are running with slab poisoning enabled. As a result, they are working without a valuable debugging tool and allowing certain kinds of bugs to persist in the code base. So a part of the solution to the problem may well be a stronger effort to get developers to turn the slab poisoning option on. Beyond that, any sort of checking added to kfree() (or a variant) should be harder to disable than the existing debugging options.



Slab poisoning ain't cheap

Posted Mar 9, 2006 6:18 UTC (Thu) by bos (guest, #6154) [Link]

One reason why slab poisoning isn't used much is that it slows systems down drastically. For example, it's not unusual for it to cause netperf (a TCP performance benchmark) results to drop by 40% to 70%.

This sort of slowdown and the vast increase in bus traffic that goes with it can then cause other fun, such as masking the effects of race conditions.

Double kfree() errors

Posted Mar 9, 2006 15:57 UTC (Thu) by mbligh (subscriber, #7720) [Link]

Eh, no idea why I didn't fix this a year ago. I'll arrange with Andy to run a debug kernel with CONFIG_DEBUG_SLAB as well as a few other debug bits, and I can create a separate matrix on http://test.kernel.org for the debug runs - will fold it more nicely later. That should do every -git snapshot, and every -mm, etc across a variety of machines.

Double kfree() errors

Posted Mar 10, 2006 0:46 UTC (Fri) by davej (guest, #354) [Link]

A lot of problems we've seen in Fedora bugzilla from our slab debug kernels have been driver specific though (or out-of-tree junk which is another great argument for 'poison kfree in non-debug kernels'), so getting it into your nightly tests isn't really going to be the end-all of this class of bug.

I believe the future lies in tools like Coverity's checker, as it's the only way we can feasibly get complete coverage.

Double kfree() errors

Posted Mar 10, 2006 16:39 UTC (Fri) by mbligh (subscriber, #7720) [Link]

Ugh. yes, that's going to be much harder. Ah well, should give us something at least. Once we have the open harness, hopefully we can get a broader range of hardware in nightly test, but still ... won't cover everything.