How to Detect Memory Leaks in Kernel Code on Solaris
By judy on November 30, 2007
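This article uses a driver (tleak.c, tleak.conf) as an example to show how mdb's ::findleaks command can detect whether kernel code leaks memory.
Note that the sample application in the previous article leaked memory on the heap. When that program exited, its heap was released along with it, so the system was not affected. The sample driver in this article, tleak, leaks memory in the kernel, so use it with caution. If you are not familiar with the kernel, do not run this driver or the steps below on your own machine. (USE AT YOUR OWN RISK)
tleak is a pseudo character device. Each open performs one memory allocation, so a leak occurs the second time the device is opened. The main function, tleak_open(), is defined as follows: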
tleak_open(dev_t *devp, int flag, int otyp, cred_t *credp)
{
	...
	return (0);
}
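The body of tleak_open() is not reproduced above, so the following is only a user-space sketch of the pattern the text describes, not the driver's actual code. It assumes, hypothetically, that the driver remembers a single buffer pointer, that every open overwrites it, and that close frees only the pointer it still holds. calloc() stands in for kmem_zalloc(), and the names tleak_buf, tleak_open_sim, tleak_close_sim, tleak_outstanding and TLEAK_BUF_SIZE are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TLEAK_BUF_SIZE 100	/* hypothetical; the driver's real size is not shown */

/* Hypothetical driver state: only the most recent buffer is remembered. */
static void *tleak_buf;
static size_t tleak_live;	/* demo-only count of buffers not yet freed */

/* Simulated open: allocates on every call and overwrites the saved pointer. */
int
tleak_open_sim(void)
{
	void *p = calloc(1, TLEAK_BUF_SIZE);	/* stands in for kmem_zalloc() */

	if (p == NULL)
		return (-1);
	tleak_buf = p;		/* any earlier buffer is now unreachable */
	tleak_live++;
	return (0);
}

/* Simulated close: frees only the buffer the state still points to. */
void
tleak_close_sim(void)
{
	if (tleak_buf == NULL)
		return;
	assert(tleak_live > 0);
	free(tleak_buf);
	tleak_buf = NULL;
	tleak_live--;
}

/* Number of simulated buffers that were allocated but never freed. */
size_t
tleak_outstanding(void)
{
	return (tleak_live);
}
```

Calling tleak_open_sim() twice and tleak_close_sim() once leaves tleak_outstanding() at 1: the first buffer can no longer be reached by anyone, which is the kind of buffer ::findleaks reports.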
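First, set the system variable kmem_flags to enable the debugging features of the kernel memory allocator, which are disabled by default. To do so, add the following line to /etc/system (0xf matches the value mdb reports below):
set kmem_flags=0xf
After a reboot, mdb shows the new value: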
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba uhci s1394 nca fcp fctl lofs zfs random audiosup md cpc crypto fcip logindmux ptm sppp nfs ]
> kmem_flags/X
kmem_flags:
kmem_flags: f
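The value f shown above is a bit mask. As a rough illustration, the sketch below decodes its four low bits using the flag names and values from the Solaris kmem implementation header (KMF_AUDIT, KMF_DEADBEEF, KMF_REDZONE, KMF_CONTENTS). The function kmem_flags_describe() is a hypothetical helper written for this example; it is not part of mdb or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Low four kmem_flags bits, as defined in Solaris <sys/kmem_impl.h>. */
#define KMF_AUDIT	0x1	/* transaction auditing (bufctl_audit records) */
#define KMF_DEADBEEF	0x2	/* freed buffers are filled with 0xdeadbeef */
#define KMF_REDZONE	0x4	/* redzone checking past the end of each buffer */
#define KMF_CONTENTS	0x8	/* freed-buffer content logging */

/*
 * Hypothetical helper: write the names of the debug flags set in flags
 * into out, separated by '|'. Returns the number of flags recognized.
 */
int
kmem_flags_describe(unsigned int flags, char *out, size_t outlen)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} tbl[] = {
		{ KMF_AUDIT, "AUDIT" },
		{ KMF_DEADBEEF, "DEADBEEF" },
		{ KMF_REDZONE, "REDZONE" },
		{ KMF_CONTENTS, "CONTENTS" },
	};
	int n = 0;
	size_t i;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (i = 0; i < sizeof (tbl) / sizeof (tbl[0]); i++) {
		if ((flags & tbl[i].bit) == 0)
			continue;
		if (n > 0)
			strncat(out, "|", outlen - strlen(out) - 1);
		strncat(out, tbl[i].name, outlen - strlen(out) - 1);
		n++;
	}
	return (n);
}
```

With 0xf all four features are on. The audit bit is the one that makes the allocator keep the per-buffer transaction records that the later steps inspect.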
Next, compile and install the tleak driver.
$ ld -dy -r -o tleak tleak.o
$ cp tleak /kernel/drv/
$ cp tleak.conf /kernel/drv/
$ add_drv tleak
add_drv loads the driver automatically; check with modinfo:
$ modinfo | grep tleak
194 fa15bb04 484 205 1 tleak (Test kernel memory leak v0.1)
The device file /devices/pseudo/tleak@0:tleak is created under /devices. Run cat several times to open the device and produce the leak.
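The article opens the device with cat. As an alternative sketch, a small C helper can open and close a path a given number of times. open_repeatedly() is a hypothetical name; the device path from the text would be passed as its argument.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path read-only and close it again, count times.
 * Returns the number of opens that succeeded.
 */
int
open_repeatedly(const char *path, int count)
{
	int ok = 0;
	int i;

	assert(path != NULL);
	for (i = 0; i < count; i++) {
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			continue;	/* e.g. device missing or no permission */
		ok++;
		(void) close(fd);
	}
	return (ok);
}
```

For example, open_repeatedly("/devices/pseudo/tleak@0:tleak", 3) would invoke the driver's open routine three times on a machine where the driver is installed.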
Force a system crash dump, which also reboots the machine:
Loaded modules: [ audiosup crypto cpc uppc ptm ufs unix zfs krtld s1394 sppp ipcnca uhci lofs genunix ip logindmux usba specfs pcplusmp nfs md random sctp cpu.AuthenticAMD.15 ]
[0]> $<systemdump
After the machine has rebooted, use mdb to examine the crash dump generated in the previous step:
$ ls
bounds unix.0 vmcore.0
$ mdb -k 0
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba uhci s1394 nca fcp fctl lofs zfs random audiosup md cpc crypto fcip logindmux ptm sppp nfs ]
> ::status
debugging crash dump vmcore.0 (32-bit) from mars
operating system: 5.11 snv_34 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=d4e7cdb8 addr=0 occurred in module "" due to a NULL pointer dereference
dump content: kernel pages only
> ::findleaks
CACHE     LEAKED   BUFCTL CALLER
dac2e6f0       2 d3f14980 AcpiOsAllocate+0x15
dac2e6f0       5 d3f20c40 AcpiOsAllocate+0x15
dac2e6f0       1 d3f14ae8 AcpiOsAllocate+0x15
dac2e6f0       1 d3f1e618 AcpiOsAllocate+0x15
dac2e6f0       7 d3f20cb8 AcpiOsAllocate+0x15
dac2e6f0       2 d3f20b50 AcpiOsAllocate+0x15
dac32030       1 d4ec7748 tleak_open+0x35
------------------------------------------------------------
   Total      19 buffers, 976 bytes
> d4ec7748$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        d4ec7748         d4db0300       a1397b121b         d64db340
                         dac32030         db0f0628         dbb62e98
                 kmem_cache_alloc_debug+0x256
                 kmem_cache_alloc+0x97
                 kmem_zalloc+0x4b
                 tleak_open+0x35
                 dev_open+0x27
                 spec_open+0x3cc
                 fop_open+0x6e
                 vn_openat+0x42a
                 copen+0x287
                 open64+0x20
ADDR       BUFADDR     TIMESTAMP     THREAD      CALLER
---------- ----------- ------------- ----------- -------------------
db2bebf8   d4db0380    a49a0fccba    d64db340    tleak_open+0x35
db0f0628   d4db0300    a1397b121b    d64db340    tleak_open+0x35
db0bc394   d51b3380    9f58e81dab    d64db340    tleak_open+0x35
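In addition, mdb's ::kmem_verify can be used to detect memory corruption (such as out-of-bounds accesses). For such cases mdb provides a rich set of commands and macros that make it easy to find out which threads have accessed the bad memory. For example: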
d4db0300 is d4db0300+0, bufctl d4ec7748 allocated from kmem_alloc_112
::bufctl -a filters the memory allocation log by buffer address. In this example the buffer was touched only by tleak_open():
> ::walk kmem_log | ::bufctl -a d4db0300
ADDR BUFADDR TIMESTAMP THREAD CALLER
db0f0628 d4db0300 a1397b121b d64db340 tleak_open+0x35
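Conceptually, filtering by buffer address keeps only the log records whose BUFADDR field matches. A minimal C sketch of that filter follows, using the three tleak_open records from the log listing earlier in this article as sample data. The struct and function names are hypothetical and do not reflect mdb's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one transaction-log record. */
struct log_rec {
	uint32_t addr;		/* address of the log record itself */
	uint32_t bufaddr;	/* buffer the transaction refers to */
	uint64_t timestamp;
	uint32_t thread;
};

/* Sample data: the three tleak_open entries shown in the log listing. */
static const struct log_rec sample_log[] = {
	{ 0xdb2bebf8, 0xd4db0380, 0xa49a0fccbaULL, 0xd64db340 },
	{ 0xdb0f0628, 0xd4db0300, 0xa1397b121bULL, 0xd64db340 },
	{ 0xdb0bc394, 0xd51b3380, 0x9f58e81dabULL, 0xd64db340 },
};
#define SAMPLE_LOG_LEN (sizeof (sample_log) / sizeof (sample_log[0]))

/*
 * Store pointers to the records whose bufaddr equals want into out
 * (at most outmax of them). Returns the number of matches stored.
 */
size_t
filter_by_bufaddr(const struct log_rec *log, size_t n, uint32_t want,
    const struct log_rec **out, size_t outmax)
{
	size_t found = 0;
	size_t i;

	assert(log != NULL && out != NULL);
	for (i = 0; i < n && found < outmax; i++) {
		if (log[i].bufaddr == want)
			out[found++] = &log[i];
	}
	return (found);
}
```

Filtering the sample data for d4db0300 yields the single record at db0f0628, the same line mdb prints above.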
::kgrep searches for references to the specified buffer:
> d4db0300::kgrep | ::whatis -a
db0f062c is dac43000+4ad62c (vmem_seg dac11168) from kmem_log vmem arena
db0f062c is dac43000+4ad62c (vmem_seg dac11258) from heap vmem arena
d4ec774c is d4ec7748+4, allocated from kmem_bufctl_audit_cache
d4ec774c is d4ec7000+74c (vmem_seg d4ea9ac8) from kmem_msb vmem arena
d4ec774c is d4ec7000+74c (vmem_seg d4ea9bb8) from kmem_metadata vmem arena
d4ec774c is d4ec4000+374c (vmem_seg d4ea6d20) from heap vmem arena
d504693c is d5046920+1c, allocated from kmem_magazine_7
d504693c is d5046000+93c (vmem_seg d4eb98e8) from kmem_msb vmem arena
d504693c is d5046000+93c (vmem_seg d4eb99d8) from kmem_metadata vmem arena
d504693c is d5044000+293c (vmem_seg d4eb66f8) from heap vmem arena
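::kgrep works by scanning memory for pointer-sized words whose value equals the given address. The sketch below shows the same idea on an ordinary array. find_refs() and the sample values are hypothetical, and the real command walks the kernel address space rather than a C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan n pointer-sized words starting at mem for the value target.
 * Store the index of each matching word in hits (at most maxhits).
 * Returns the number of matches stored.
 */
size_t
find_refs(const uintptr_t *mem, size_t n, uintptr_t target,
    size_t *hits, size_t maxhits)
{
	size_t found = 0;
	size_t i;

	assert(mem != NULL && hits != NULL);
	for (i = 0; i < n && found < maxhits; i++) {
		if (mem[i] == target)
			hits[found++] = i;
	}
	return (found);
}
```

Each match only says that some word holds the value; piping the results into ::whatis, as above, is what identifies the structure each matching word belongs to.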