Chapter 8. Kernel Debugging with KDB
8.1. Introduction
The ability to examine kernel processes and memory in real time is extremely important to a kernel developer. It is less important to application developers and system administrators, but there are occasions when obtaining specific kernel data is critical to resolving a difficult problem. For this reason, this section introduces how to get KDB installed and running and covers the basics of using it. KDB is an extremely powerful tool, and in-depth coverage of it is beyond the scope of this book; many resources on the Internet and in print can be consulted for further information.
KDB is not the only kernel debugger available for Linux—KGDB is also commonly used. Each has its own strengths and weaknesses, and many people prefer one over the other. The main difference is that KGDB actually uses GDB to debug the kernel; however, it must be done from a remote computer and cannot be done on the live machine. KDB provides kernel debugging capabilities directly on the live kernel while it is running and can also be used remotely. Given that using KGDB is the same as GDB, one can learn how to use it by referring to Chapter 6, “The GNU Debugger.”
8.2. Enabling KDB
KDB is not part of the mainstream kernel source code found at kernel.org. It is, however, included in the kernels that come with some of the major distributions. If you are not using a distribution kernel, the patch for KDB can be obtained from the KDB homepage at http://oss.sgi.com/projects/kdb/. Even though KDB is included in some distributions, it is very likely not enabled by default and requires a kernel rebuild. It is quite easy to enable—simply set the CONFIG_KDB configuration option manually or through your favorite make config interface and rebuild the kernel. Figure 8.1 shows the option highlighted and enabled in the “Kernel hacking” menu when using the make menuconfig interface.
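Before rebuilding, it can help to confirm what a given kernel configuration file actually says about KDB. The following is a minimal sketch, not part of KDB itself: the function name kdb_config_options and the sample text are hypothetical, and it assumes only the standard .config conventions (an enabled option appears as NAME=value, a disabled one as "# NAME is not set").

```python
import re


def kdb_config_options(config_text):
    """Return the CONFIG_KDB* options found in kernel .config text.

    Enabled options map to their value (for example "y"); options that
    appear as "# NAME is not set" map to "n".
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        enabled = re.fullmatch(r"(CONFIG_KDB\w*)=(.*)", line)
        if enabled:
            options[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.fullmatch(r"# (CONFIG_KDB\w*) is not set", line)
        if disabled:
            options[disabled.group(1)] = "n"
    return options


# Hypothetical excerpt of a .config file, for illustration only.
sample = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_KDB=y
# CONFIG_KDB_OFF is not set
"""

print(kdb_config_options(sample))
```

In practice the text would be read from the .config file at the top of the kernel source tree being built.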
Another option of interest is "KDB off by default" (CONFIG_KDB_OFF). The help text for this option documents what it does:
CONFIG_KDB_OFF:
Normally kdb is activated by default, as long as CONFIG_KDB is set. If you want to ship a kernel with kdb support but only have kdb turned on when the user requests it then select this option. When compiled with CONFIG_KDB_OFF, kdb ignores all events unless you boot with kdb=on or you echo "1" > /proc/sys/kernel/kdb. This option also works in reverse; if kdb is normally activated, you can boot with kdb=off or echo "0" > /proc/sys/kernel/kdb to deactivate kdb. If unsure, say N.
If you want to use KDB to debug a real problem or just experiment with it, enable CONFIG_KDB_OFF. This way, KDB will only be turned on when you explicitly want it.
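The interaction between the compile-time default, the boot parameter, and the /proc entry can be summarized as a small decision function. This is only a model of the behavior described in the help text, not kernel code; the function name kdb_active and its parameters are hypothetical, and it assumes that a later setting (boot parameter, then a runtime write) overrides an earlier one.

```python
def kdb_active(config_kdb_off, boot_param=None, runtime_write=None):
    """Model the effective KDB state described by the CONFIG_KDB_OFF help text.

    config_kdb_off: True if the kernel was built with CONFIG_KDB_OFF.
    boot_param:     "on", "off", or None if kdb= was not given at boot.
    runtime_write:  "1", "0", or None if nothing has been echoed to
                    /proc/sys/kernel/kdb since boot.
    """
    active = not config_kdb_off            # compile-time default
    if boot_param is not None:             # kdb=on / kdb=off at boot
        active = (boot_param == "on")
    if runtime_write is not None:          # assumed: most recent write wins
        active = (runtime_write == "1")
    return active


# Built with CONFIG_KDB_OFF, then "echo 1 > /proc/sys/kernel/kdb" as root.
print(kdb_active(config_kdb_off=True, runtime_write="1"))
```

With CONFIG_KDB_OFF set and no boot parameter or runtime write, the model reports KDB as inactive, which matches the recommendation above to turn it on only when it is explicitly wanted.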
8.3. Using KDB
The following sections give a very high-level overview of how to use KDB once it is installed on your system.
8.3.1. Activating KDB
If you use the X Window System, be sure to first switch to a text virtual console by using the CTRL-ALT-Fn key sequence. Once at a text virtual console, log in as root and enable KDB by executing the following command:
echo 1 > /proc/sys/kernel/kdb
You are now ready to enter kernel debugging mode by pressing the hotkey, which is the Pause/Break key.
Note: When KDB is enabled, it is automatically invoked during a system panic.
When you press the hotkey, you will see the following:
penguin:/proc/sys/kernel #
Entering kdb (current=0xc03dc000, pid 0) due to Keyboard Entry
Note: When you are in KDB, you may notice that your Caps Lock and Scroll Lock LEDs flash rapidly. This is normal.
8.3.2. Resuming Normal Execution
Because KDB is such a sensitive mode, let’s immediately document how to leave it and return your computer to normal operation: use the go command. You may need to press the Enter key a few times to get your shell prompt back.
8.3.3. Basic Commands
The first thing to familiarize yourself with in KDB is the Help screen. You can display this by entering either the ? or help commands. The contents of the Help display are shown here:
kdb> ?
Command   Usage                  Description
______________________________________________________________
md        <vaddr>                Display Memory Contents, also mdWcN, e.g. md8c1
mdr       <vaddr> <bytes>        Display Raw Memory
mds       <vaddr>                Display Memory Symbolically
mm        <vaddr> <contents>     Modify Memory Contents
id        <vaddr>                Display Instructions
go        [<vaddr>]              Continue Execution
rd                               Display Registers
rm        <reg> <contents>       Modify Registers
ef        <vaddr>                Display exception frame
bt        [<vaddr>]              Stack traceback
btp       <pid>                  Display stack for process <pid>
bta                              Display stack all processes
ll        <first-element> <lin   Execute cmd for each element in linked list
env                              Show environment variables
set                              Set environment variables
help                             Display Help Message
?                                Display Help Message
cpu       <cpunum>               Switch to new cpu
ps                               Display active task list
reboot                           Reboot the machine immediately
sections                         List kernel and module sections
lsmod                            List loaded kernel modules
rmmod     <modname>              Remove a kernel module
sr        <key>                  Magic SysRq key
dmesg                            Display syslog buffer
bp        [<vaddr>]              Set/Display breakpoints
bl        [<vaddr>]              Display breakpoints
bpa       [<vaddr>]              Set/Display global breakpoints
bph       [<vaddr>]              Set hardware breakpoint
bpha      [<vaddr>]              Set global hardware breakpoint
bc        <bpnum>                Clear Breakpoint
be        <bpnum>                Enable Breakpoint
bd        <bpnum>                Disable Breakpoint
ss                               Single Step
ssb                              Single step to branch/call
KDB has a large command set and is capable of doing a great deal of valuable debugging operations. The intent here is not to go into detail on what everything does; rather the intent is to help you get KDB running and to give a quick overview on some of the more straightforward commands.
The ps command is similar to the user-land command; however, in KDB it displays the task list from the kernel’s viewpoint rather than the process listing from a user’s viewpoint.
kdb> ps
Task Addr Pid Parent [*] cpu State Thread Command
0xcdfa2000 00000001 00000000 0 000 stop 0xcdfa2280 init
0xc1c14000 00000002 00000001 0 000 stop 0xc1c14280 keventd
0xc1c10000 00000003 00000001 0 000 stop 0xc1c10280 kapmd
0xcdffe000 00000004 00000001 0 000 stop 0xcdffe280 ksoftirqd_CPU0
0xcdffc000 00000005 00000001 0 000 stop 0xcdffc280 kswapd
0xcdf66000 00000006 00000001 0 000 stop 0xcdf66280 bdflush
0xcdf64000 00000007 00000001 0 000 stop 0xcdf64280 kupdated
0xcdf60000 00000008 00000001 0 000 stop 0xcdf60280 kinoded
0xcdf0a000 00000009 00000001 0 000 stop 0xcdf0a280 mdrecoveryd
0xc2284000 00000012 00000001 0 000 stop 0xc2284280 kreiserfsd
0xc2ae0000 00000869 00000001 0 000 stop 0xc2ae0280 dhcpcd
0xc2a70000 00001040 00000001 0 000 stop 0xc2a70280 syslogd
0xc2a6c000 00001043 00000001 0 000 stop 0xc2a6c280 klogd
0xc2992000 00001102 00000001 0 000 stop 0xc2992280 khubd
0xc2ffc000 00001268 00000001 0 000 stop 0xc2ffc280 resmgrd
0xc2dea000 00001335 00000001 0 000 stop 0xc2dea280 cardmgr
0xc2d7c000 00001435 00000001 0 000 stop 0xc2d7c280 portmap
0xc2d0c000 00001466 00000001 0 000 stop 0xc2d0c280 vmnet-bridge
0xc2cd8000 00001489 00000001 0 000 stop 0xc2cd8280 vmnet-natd
0xc33e6000 00001829 00000001 0 000 stop 0xc33e6280 smpppd
0xc2c9e000 00001831 00000001 0 000 stop 0xc2c9e280 sshd
I’ve truncated the output, as it will display all kernel and user tasks running on the system.
Notice how all Task Addr and Thread addresses are above the 0xc0000000 memory location. As was discussed in the “/proc/<pid>/maps” section of Chapter 3, “The /proc Filesystem,” in a 3:1 split address space, the kernel resides at 0xc0000000. This proves to us that we are in fact seeing pointers to kernel data structures.
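The address check described above is easy to express in code. The sketch below assumes the 32-bit 3:1 split discussed in the text; the function name is_kernel_address is hypothetical, and the sample values are copied from the ps listing and from the user-space pointer 0xbffff720 that appears as an argument to sys_select in init's stack traceback.

```python
KERNEL_BASE = 0xC0000000  # start of kernel space in a 3:1 split on 32-bit x86


def is_kernel_address(addr):
    """Return True if addr lies in the kernel portion of the address space."""
    return addr >= KERNEL_BASE


# Task Addr and Thread values for init, keventd, and sshd from the ps output.
task_addrs = [0xcdfa2000, 0xc1c14000, 0xc2c9e000]
thread_addrs = [0xcdfa2280, 0xc1c14280, 0xc2c9e280]

print(all(is_kernel_address(a) for a in task_addrs + thread_addrs))
print(is_kernel_address(0xbffff720))  # a user-space stack pointer
```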
Let’s have a closer look at the main process of a Linux system - init. First, let’s take a look at its stack (the format of the output is slightly modified for easier reading):
kdb> btp 1
0xcdfa2000 00000001 00000000 0 000 stop 0xcdfa2280 init
ESP EIP Function (args)
0xcdfa3ecc 0xc0119e62 do_schedule+0x192 (0xc0421bbc, 0xc0421bbc,
0x8f5908, 0xcdfa2000, 0xc01258c0)
kernel .text 0xc0100000 0xc0119cd0
0xc011a040
0xcdfa3ef0 0xc0125933 schedule_timeout+0x63 (0x104, 0xcdfa2000, 0x1388,
0x0, 0x0)
kernel .text 0xc0100000 0xc01258d0
0xc0125980
0xcdfa3f1c 0xc01548e1 do_select+0x1e1 (0xb, 0xcdfa3f90, 0xcdfa3f8c,0x1,
0x4)
kernel .text 0xc0100000 0xc0154700
0xc0154930
0xcdfa3f58 0xc0154c88 sys_select+0x328 (0xb, 0xbffff720, 0x0, 0x0,
0xbffff658)
kernel .text 0xc0100000 0xc0154960
0xc0154e30
0xcdfa3fc4 0xc0108dd3 system_call+0x33
kernel .text 0xc0100000 0xc0108da0
0xc0108de0
All of the functions in the stack traceback are kernel functions. Without knowing a great deal about kernel programming, we can see that at the time KDB was invoked, init was in the middle of processing a select() system call from user-land. This conclusion can be made because of the do_select stack frame. Most system calls have a kernel worker routine named after the system call with do_ prefixed to it. The schedule_timeout and do_schedule stack frames are functions that do_select called to have this process wait for communication to occur.
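The function+offset notation in the traceback can be cross-checked with a little arithmetic: subtracting the offset from the EIP should give the address where the function begins. The sketch below does this for each frame of init's traceback. It assumes that the second address on each "kernel .text" line is the start of the function, which is my reading of the output and is consistent with the numbers; the helper name split_symbol is hypothetical.

```python
# (EIP, symbol+offset, address printed after "kernel .text 0xc0100000")
frames = [
    (0xc0119e62, "do_schedule+0x192", 0xc0119cd0),
    (0xc0125933, "schedule_timeout+0x63", 0xc01258d0),
    (0xc01548e1, "do_select+0x1e1", 0xc0154700),
    (0xc0154c88, "sys_select+0x328", 0xc0154960),
    (0xc0108dd3, "system_call+0x33", 0xc0108da0),
]


def split_symbol(symbol):
    """Split 'name+0xoffset' into (name, offset as an integer)."""
    name, _, offset = symbol.partition("+")
    return name, int(offset, 16)


for eip, symbol, printed_start in frames:
    name, offset = split_symbol(symbol)
    # EIP minus the offset should equal the printed function start.
    assert eip - offset == printed_start
    print(f"{name} starts at {eip - offset:#x}")
```

All five frames are consistent, which supports reading the offset as the distance of the saved instruction pointer from the function's first instruction.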
Let’s now see what init will do when execution is resumed. To do this, we can examine the assembly instructions starting at init’s current eip of 0xc0119e62:
kdb> id 0xc0119e62
0xc0119e62 do_schedule+0x192: pop %ebp
0xc0119e63 do_schedule+0x193: pop %edi
0xc0119e64 do_schedule+0x194: pop %esi
0xc0119e65 do_schedule+0x195: push %esi
0xc0119e66 do_schedule+0x196: call 0xc0119790 schedule_tail
0xc0119e6b do_schedule+0x19b: pop %ebx
0xc0119e6c do_schedule+0x19c: mov $0xffffe000,%eax
0xc0119e71 do_schedule+0x1a1: and %esp,%eax
0xc0119e73 do_schedule+0x1a3: mov 0x14(%eax),%eax
0xc0119e76 do_schedule+0x1a6: test %eax,%eax
0xc0119e78 do_schedule+0x1a8: je 0xc0119fe7 do_schedule+0x317
0xc0119e7e do_schedule+0x1ae: mov 0xc0422324,%eax
0xc0119e83 do_schedule+0x1b3: mov $0xffffe000,%esi
0xc0119e88 do_schedule+0x1b8: and %esp,%esi
0xc0119e8a do_schedule+0x1ba: mov %eax,0x3c(%esi)
0xc0119e8d do_schedule+0x1bd: cli
kdb> go
The most interesting thing here is that there will be a call to schedule_tail very shortly. Looking directly at do_schedule in the kernel source, there is only one call to schedule_tail in it, so we can make a very good guess which source code init is executing. The do_schedule function is rather large, so only the snippet containing the call to schedule_tail is shown:
switch_tasks:
    prefetch(next);
    rq->quiescent++;
    clear_tsk_need_resched(prev);
    if (likely(prev != next)) {
        rq->nr_switches++;
        rq->curr = next;
        prepare_arch_switch(rq, next);
        prev = context_switch(rq, prev, next);
        barrier();
        /* from this point "rq" is invalid in the stack */
        schedule_tail(prev);
    } else
        spin_unlock_irq(&rq->lock);
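One more detail in the disassembly is worth a quick calculation. The pair "mov $0xffffe000,%eax" and "and %esp,%eax" masks off the low 13 bits of the stack pointer. Applying the same mask to the ESP values in init's stack traceback yields 0xcdfa2000, which is exactly the Task Addr that ps reports for init. This is consistent with the task structure sitting at the base of an 8 KB-aligned kernel stack, although that interpretation is my own reading and is not stated in the KDB output. The function name stack_base is hypothetical.

```python
STACK_MASK = 0xFFFFE000  # the constant loaded by "mov $0xffffe000,%eax"


def stack_base(esp):
    """Clear the low 13 bits of a stack pointer (8 KB alignment)."""
    return esp & STACK_MASK


# ESP values from each frame of init's stack traceback (btp 1).
init_esps = [0xcdfa3ecc, 0xcdfa3ef0, 0xcdfa3f1c, 0xcdfa3f58, 0xcdfa3fc4]

for esp in init_esps:
    print(f"{esp:#x} -> {stack_base(esp):#x}")
```

Every frame maps to the same base address, 0xcdfa2000.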
So from some quick commands in KDB, we can very quickly see exactly what a process is doing in the kernel. We can also make a good guess as to what the user-land process was doing as well.
Let’s take a quick look at one more process that is more familiar to every user—a bash process. The following is a stack traceback listing of a running bash shell. The format of the output is again modified for easier readability:
kdb> btp 2571
0xc5758000 00002571 00002419 0 000 stop 0xc5758280 bash
ESP EIP Function (args)
0xc5759eac 0xc0119e62 do_schedule+0x192
kernel .text 0xc0100000 0xc0119cd0
0xc011a040
0xc5759ed0 0xc012597a schedule_timeout+0xaa (0xc2c19800, 0x246, 0x0,
0x286, 0x0)
kernel .text 0xc0100000 0xc01258d0
0xc0125980
0xc5759ed8 0xc01cbbb1 set_cursor+0x61 (0xc2791000, 0xc3167ba0,
0xbffff46b, 0x1, 0xc3167ba0)
kernel .text 0xc0100000 0xc01cbb50
0xc01cbbd0
0xc5759f84 0xc01bea4f tty_read+0x8f (0xc3167ba0, 0xbffff46b, 0x1,
0xc3167bc0, 0xc5758000)
kernel .text 0xc0100000 0xc01be9c0
0xc01bea70
0xc5759fa0 0xc0144b88 sys_read+0x88 (0x0, 0xbffff46b, 0x1,
0x401d2c20, 0xbffff46b)
kernel .text 0xc0100000 0xc0144b00
0xc0144c00
0xc5759fc4 0xc0108dd3 system_call+0x33
kernel .text 0xc0100000 0xc0108da0
0xc0108de0
From this we can tell that the shell was executing a read system call, which makes sense because the bash shell was simply sitting at a prompt when KDB was entered. This conclusion is drawn by seeing sys_read, which was called by system_call in the stack traceback.
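The reasoning used for both init and bash follows the same pattern: scan the traceback for a frame whose name begins with sys_ and read the system call name from it. A minimal sketch of that pattern follows. The function name syscall_in_backtrace is hypothetical, the frame lists are copied from the two tracebacks shown in this chapter, and the sys_ prefix is a naming convention observed in these listings rather than a guarantee for every kernel entry point.

```python
def syscall_in_backtrace(function_names):
    """Return the system call implied by the first sys_* frame, or None."""
    for name in function_names:
        if name.startswith("sys_"):
            return name[len("sys_"):]
    return None


# Function names from the two tracebacks, innermost frame first.
init_frames = ["do_schedule", "schedule_timeout", "do_select",
               "sys_select", "system_call"]
bash_frames = ["do_schedule", "schedule_timeout", "set_cursor",
               "tty_read", "sys_read", "system_call"]

print(syscall_in_backtrace(init_frames))
print(syscall_in_backtrace(bash_frames))
```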
Occasionally, user-land debugging and diagnostic tools are unable to provide sufficient information about a process that is experiencing difficulty. For example, sometimes attaching strace to a process that appears to be hung may not reveal anything. This can indicate that the process is stuck in the kernel, perhaps waiting for a resource or another process. This is where observing the stack trace and assembly instructions of the process in KDB can be most useful.
Another task commonly performed in debuggers is to display the current contents of the registers. In KDB, this is done with the rd command as shown here:
kdb> rd
eax = 0x089b1430 ebx = 0x41513020 ecx = 0x41513860 edx = 0x41513758
esi = 0x0000000e edi = 0x00000011 esp = 0xbfffef40 eip = 0x085bdb3d
ebp = 0xbfffef88 xss = 0x0000002b xcs = 0x00000023 eflags = 0x00000202
xds = 0x0000002b xes = 0x0000002b origeax = 0xffffff01 &regs = 0xc346ffc4
These values are meaningful only to the process the kernel is currently executing. One interesting observation is that the instruction pointer, eip, contains a value that is relatively close to executables’ base address of 0x08048000. This generally means that code from an executable rather than a library is currently being run.
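To make "relatively close" concrete, the register dump can be parsed and the distance between eip and the executable base address computed. The sketch below parses the first three rows of the rd output shown above. The function name parse_registers and the constant names are hypothetical, and judging how small an offset counts as "close" remains a matter of interpretation.

```python
import re

# First three rows of the rd output shown above.
RD_OUTPUT = """\
eax = 0x089b1430 ebx = 0x41513020 ecx = 0x41513860 edx = 0x41513758
esi = 0x0000000e edi = 0x00000011 esp = 0xbfffef40 eip = 0x085bdb3d
ebp = 0xbfffef88 xss = 0x0000002b xcs = 0x00000023 eflags = 0x00000202
"""

EXEC_BASE = 0x08048000    # executable base address mentioned in the text
KERNEL_BASE = 0xC0000000  # start of kernel space in a 3:1 split


def parse_registers(text):
    """Turn 'name = 0xvalue' pairs into a dict of integers."""
    pairs = re.findall(r"(\w+) = (0x[0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(RD_OUTPUT)
offset = regs["eip"] - EXEC_BASE
print(f"eip is {offset:#x} bytes above the executable base")
print("esp in user space:", regs["esp"] < KERNEL_BASE)
```

Here eip is 0x575b3d bytes (roughly 5.5 MB) above 0x08048000, and esp is below 0xc0000000, so both registers hold user-space addresses.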
8.4. Conclusion
KDB is not for the novice user—nor, however, is it exclusively for kernel developers and kernel experts. As demonstrated in this chapter, even without an in-depth understanding of the kernel, KDB can be useful in determining system problems. Great care, though, must be exercised when using it.