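How to Detect Memory Leaks in Kernel Code on Solaris
By judy on November 30, 2007
This article uses a driver (tleak.c, tleak.conf) as an example to show how mdb's ::findleaks command can be used to check whether kernel code leaks memory.
Note that the sample application in the previous article leaked memory on the heap; the heap is released when the program exits, so the system is not affected. The sample driver tleak in this article, however, leaks memory inside the kernel, so use it with caution. If you are not familiar with the kernel, please do not run the driver or the following steps on your own machine. (USE AT YOUR OWN RISK)
tleak is a pseudo character device. Every open performs one memory allocation, so a memory leak occurs as soon as the device is opened a second time. The main function, tleak_open(), is defined as follows: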
tleak_open(dev_t *devp, int flag, int otyp, cred_t *credp)
{
	return (0);
}
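The leak pattern described above can be modeled in user space. The following is a minimal sketch, not the actual source of tleak.c: the names sim_state, sim_open and sim_leak_count and the size SIM_BUF_SIZE are hypothetical, and calloc() stands in for the kernel's kmem_zalloc(). It assumes the driver keeps a single pointer to its buffer and overwrites it on every open, which is one common way such a leak arises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define	SIM_BUF_SIZE	100	/* hypothetical size, not taken from tleak.c */

static void *sim_state;		/* the only reference to the latest buffer */
static int sim_leaked;		/* buffers whose last reference was lost */

/* Model of an open routine that allocates without freeing the old buffer. */
static int
sim_open(void)
{
	void *buf = calloc(1, SIM_BUF_SIZE);	/* stands in for kmem_zalloc() */

	if (buf == NULL)
		return (-1);
	if (sim_state != NULL)
		sim_leaked++;	/* the old buffer becomes unreachable here */
	sim_state = buf;	/* overwrite: the previous pointer is gone */
	return (0);
}

/* Number of buffers leaked so far. */
static int
sim_leak_count(void)
{
	assert(sim_leaked >= 0);
	return (sim_leaked);
}
```

Calling sim_open() three times leaves sim_leak_count() at 2: the first and second buffers can no longer be reached, and only the third is still referenced.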
First, set the system variable kmem_flags to enable the debugging features of the kernel memory allocator, which are disabled by default. To do so, add the following line to /etc/system:
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba uhci s1394 nca fcp fctl lofs zfs random audiosup md cpc crypto fcip logindmux ptm sppp nfs ]
> kmem_flags/X
kmem_flags:
kmem_flags: f
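The value f read back above is a bit mask. As a rough illustration, the sketch below decodes it using the four low-order flag values defined in the Solaris header sys/kmem_impl.h (KMF_AUDIT, KMF_DEADBEEF, KMF_REDZONE, KMF_CONTENTS). The constants are redefined locally so that the snippet compiles in user space, and the helper name describe_kmem_flags is hypothetical. In /etc/system, a value like this is normally written in the form set kmem_flags=0xf.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in sys/kmem_impl.h; redefined here for user space. */
#define	KMF_AUDIT	0x1	/* record allocation/free transactions */
#define	KMF_DEADBEEF	0x2	/* fill freed buffers with 0xdeadbeef */
#define	KMF_REDZONE	0x4	/* place a redzone after each buffer */
#define	KMF_CONTENTS	0x8	/* log the contents of freed buffers */

static const struct {
	unsigned int bit;
	const char *name;
} kmf_names[] = {
	{ KMF_AUDIT, "AUDIT" },
	{ KMF_DEADBEEF, "DEADBEEF" },
	{ KMF_REDZONE, "REDZONE" },
	{ KMF_CONTENTS, "CONTENTS" }
};

/* Write a '|'-separated list of the flag names set in 'flags' into 'out'. */
static const char *
describe_kmem_flags(unsigned int flags, char *out, size_t outlen)
{
	size_t i;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (i = 0; i < sizeof (kmf_names) / sizeof (kmf_names[0]); i++) {
		if ((flags & kmf_names[i].bit) == 0)
			continue;
		if (out[0] != '\0')
			strncat(out, "|", outlen - strlen(out) - 1);
		strncat(out, kmf_names[i].name, outlen - strlen(out) - 1);
	}
	return (out);
}
```

With a 64-byte buffer, describe_kmem_flags(0xf, ...) yields AUDIT|DEADBEEF|REDZONE|CONTENTS, so a value of f turns on all four debugging features. The audit flag is the one that records the allocation stack traces shown later in this article.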
Next, compile and install the tleak driver.
$ ld -dy -r -o tleak tleak.o
$ cp tleak /kernel/drv/
$ cp tleak.conf /kernel/drv/
$ add_drv tleak
add_drv loads the driver automatically; use modinfo to check:
$ modinfo | grep tleak
194 fa15bb04 484 205 1 tleak (Test kernel memory leak v0.1)
The device file /devices/pseudo/tleak@0:tleak has now been created under /devices. Run cat several times to open the device and produce the memory leak:
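The article triggers the leak by running cat against the device node. An equivalent, shown here only as a sketch, is a small C helper that opens and closes a path a given number of times. The function name open_repeatedly is hypothetical, and the path passed to it would be the device file named above. The same warning applies: every successful open of the real tleak device leaks kernel memory.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open and immediately close 'path' 'count' times; return successful opens. */
static int
open_repeatedly(const char *path, int count)
{
	int i;
	int ok = 0;

	assert(path != NULL && count >= 0);
	for (i = 0; i < count; i++) {
		int fd = open(path, O_RDONLY);

		if (fd == -1)
			continue;	/* e.g. node missing or no permission */
		ok++;
		(void) close(fd);
	}
	return (ok);
}
```

For example, open_repeatedly("/devices/pseudo/tleak@0:tleak", 3) would open the device three times and return how many of those opens succeeded.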
Force a system crash dump, which also reboots the machine:
Loaded modules: [ audiosup crypto cpc uppc ptm ufs unix zfs krtld s1394 sppp ipcnca uhci lofs genunix ip logindmux usba specfs pcplusmp nfs md random sctp cpu.AuthenticAMD.15 ]
[0]> $<systemdump
After the machine has rebooted, use mdb to examine the crash dump generated in the previous step:
$ ls
bounds unix.0 vmcore.0
$ mdb -k 0
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba uhci s1394 nca fcp fctl lofs zfs random audiosup md cpc crypto fcip logindmux ptm sppp nfs ]
> ::status
debugging crash dump vmcore.0 (32-bit) from mars
operating system: 5.11 snv_34 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=d4e7cdb8 addr=0 occurred in module "" due to a NULL pointer dereference
dump content: kernel pages only
> ::findleaks
CACHE      LEAKED  BUFCTL    CALLER
dac2e6f0        2  d3f14980  AcpiOsAllocate+0x15
dac2e6f0        5  d3f20c40  AcpiOsAllocate+0x15
dac2e6f0        1  d3f14ae8  AcpiOsAllocate+0x15
dac2e6f0        1  d3f1e618  AcpiOsAllocate+0x15
dac2e6f0        7  d3f20cb8  AcpiOsAllocate+0x15
dac2e6f0        2  d3f20b50  AcpiOsAllocate+0x15
dac32030        1  d4ec7748  tleak_open+0x35
------------------------------------------------------------
Total          19  buffers, 976 bytes
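The Total line can be cross-checked by adding up the LEAKED column. The sketch below does this for the rows shown above and also splits the total by caller; the struct leak_row type and the leaked_by helper are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct leak_row {
	const char *cache;	/* kmem cache address */
	int leaked;		/* buffers leaked with this stack */
	const char *caller;	/* allocating function */
};

/* Rows copied from the ::findleaks output above. */
static const struct leak_row rows[] = {
	{ "dac2e6f0", 2, "AcpiOsAllocate+0x15" },
	{ "dac2e6f0", 5, "AcpiOsAllocate+0x15" },
	{ "dac2e6f0", 1, "AcpiOsAllocate+0x15" },
	{ "dac2e6f0", 1, "AcpiOsAllocate+0x15" },
	{ "dac2e6f0", 7, "AcpiOsAllocate+0x15" },
	{ "dac2e6f0", 2, "AcpiOsAllocate+0x15" },
	{ "dac32030", 1, "tleak_open+0x35" }
};

/* Sum the LEAKED column for callers starting with 'prefix' ("" matches all). */
static int
leaked_by(const char *prefix)
{
	size_t i;
	int total = 0;

	assert(prefix != NULL);
	for (i = 0; i < sizeof (rows) / sizeof (rows[0]); i++) {
		if (strncmp(rows[i].caller, prefix, strlen(prefix)) == 0)
			total += rows[i].leaked;
	}
	return (total);
}
```

Here leaked_by("") gives 19, matching the Total line, while leaked_by("tleak_open") gives 1 and leaked_by("AcpiOsAllocate") gives 18. Only the last row is attributed to the sample driver, and its BUFCTL address, d4ec7748, is the one examined next.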
> d4ec7748$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        d4ec7748         d4db0300       a1397b121b         d64db340
                         dac32030         db0f0628         dbb62e98
                 kmem_cache_alloc_debug+0x256
                 kmem_cache_alloc+0x97
                 kmem_zalloc+0x4b
                 tleak_open+0x35
                 dev_open+0x27
                 spec_open+0x3cc
                 fop_open+0x6e
                 vn_openat+0x42a
                 copen+0x287
                 open64+0x20
ADDR        BUFADDR      TIMESTAMP      THREAD       CALLER
----------  -----------  -------------  -----------  -------------------
db2bebf8    d4db0380     a49a0fccba     d64db340     tleak_open+0x35
db0f0628    d4db0300     a1397b121b     d64db340     tleak_open+0x35
db0bc394    d51b3380     9f58e81dab     d64db340     tleak_open+0x35
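In addition, mdb's ::kmem_verify can be used to detect memory corruption, such as out-of-bounds accesses. For that case mdb provides a rich set of commands and macros that make it easy to find out which threads have accessed the bad memory. For example: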
d4db0300 is d4db0300+0, bufctl d4ec7748 allocated from kmem_alloc_112
::bufctl -a filters the memory allocation log by buffer address. In this example the buffer was accessed only by tleak_open():
> ::walk kmem_log | ::bufctl -a d4db0300
ADDR BUFADDR TIMESTAMP THREAD CALLER
db0f0628 d4db0300 a1397b121b d64db340 tleak_open+0x35
::kgrep searches for references to the specified buffer:
> d4db0300::kgrep | ::whatis -a
db0f062c is dac43000+4ad62c (vmem_seg dac11168) from kmem_log vmem arena
db0f062c is dac43000+4ad62c (vmem_seg dac11258) from heap vmem arena
d4ec774c is d4ec7748+4, allocated from kmem_bufctl_audit_cache
d4ec774c is d4ec7000+74c (vmem_seg d4ea9ac8) from kmem_msb vmem arena
d4ec774c is d4ec7000+74c (vmem_seg d4ea9bb8) from kmem_metadata vmem arena
d4ec774c is d4ec4000+374c (vmem_seg d4ea6d20) from heap vmem arena
d504693c is d5046920+1c, allocated from kmem_magazine_7
d504693c is d5046000+93c (vmem_seg d4eb98e8) from kmem_msb vmem arena
d504693c is d5046000+93c (vmem_seg d4eb99d8) from kmem_metadata vmem arena
d504693c is d5044000+293c (vmem_seg d4eb66f8) from heap vmem arena