Kernel oops analysis

https://groups.google.com/group/linux.kernel/browse_thread/thread/b70bffe9015a8c41/ed9c0a0cfcd31111
from Linus

The original message follows:
---

On Mon, 7 Jan 2008, Kevin Winchester wrote:
> J. Bruce Fields wrote:

> > Is there any good basic documentation on this to point people at?

> I would second this question. I see people "decode" oops on lkml often
> enough, but I've never been entirely sure how its done. Is it somewhere
> in Documentation?

It's actually not necessarily at all that trivial, unless you have a deep
understanding of the code generated for the architecture in question (and
even then, some oopses take more time to figure out than others, thanks
to inlining and tailcalls etc).

If the oops happened with a kernel you generated yourself, it's usually
rather easy. Especially if you said "y" to the "generate debugging info"
question at configuration time. Because, in that case, you really just do
a simple

gdb vmlinux

and then you can do (for example) something like setting a breakpoint at
the EIP that was reported for the oops, and it will tell you what line it
came from.

However, if you don't have the exact binary - which is the common case for
random oopses reported on lkml - you will generally have to disassemble
the hex sequence given in the oops (the "Code:" line), and try to match it
up against the source code to try to figure out what is going on.

Even just the disassembly is not entirely trivial, since the oops will
give you the eip that it happened at, but you often want to also
disassemble *backwards* in order to get more of a context (the "Code:"
line will mark the particular EIP that starts the oopsing instruction by
enclosing it in <xx>, but with non-constant instruction lengths, you need
to use a bit of trial-and-error to figure it out.

I usually just compile a small program like

	const char array[]="\xnn\xnn\xnn...";

	int main(int argc, char **argv)
	{
		printf("%p\n", array);
		*(int *)0=0;
	}

and run it under gdb, and then when it gets the SIGSEGV (due to the
obvious NULL pointer dereference), I can just ask gdb to disassemble
around the array that contains the code[] stuff. Try a few offsets, to see
when the disassembly makes sense (and gives the reported EIP as the
beginning of one of the disassembled instructions).

(You can do it other and smarter ways too, I'm not claiming that's a
particularly good way to do it, and the old "ksymoops" program used to do
a pretty good job of this, but I'm used to that particular idiotic way
myself, since it's how I've basically always done it)

After that, you still need to try to match up the assembly code with the
source code and figure out what variables the register contents actually
are all about. You can often try to do a

make the/affected/file.s

to generate the asm file in your own tree - the register allocation can be
totally different due to different compilers and different options (and
things like the fact that maybe the source tree you do this on doesn't
match the oops report exactly), but it's usually a good starting point to
compare the disassembly from gdb with the *.s file output from the
compiler.

Quite often, it's all very obvious (you see some constant or other simple
pattern). But if you're not used to the assembly format, you'll spend a
lot of brainpower just trying to figure that part out even for the obvious
stuff, which is why it's a good thing if you are very comfortable indeed
with the assembly language of that particular platform.

It's not really all that hard. But the first few times you see those
oopses, it all looks mostly like just line noise. So it definitely takes
some practice to do it well.

Anyway, let's take an example, from

http://lkml.org/lkml/2008/1/1/189

where the most obviously relevant parts are:

BUG: unable to handle kernel paging request at virtual address 00100100
EIP: 0060:[<f8819668>]
EIP is at evdev_disconnect+0x65/0x9e

eax: 00000000 ebx: 000ffcf0 ecx: c1926760 edx: 00000033
esi: f7415600 edi: f741564c ebp: f7415654 esp: c1967e68
Call Trace:
[<c03454b2>] input_unregister_device+0x6f/0xff
[<c03c6eb6>] klist_release+0x27/0x30
[<c029178a>] kref_put+0x5f/0x6c
..
Code: 5e 4c 81 eb 10 04 00 00 eb 21 8d 83 08 04 00 00 b9 06 00 02
00 ba 1d 00 00 00 e8 6a 93 95 c7 8b 9b 10 04 00 00 81 eb 10
04 00 00 <8b> 83 10 04 00 00 0f 18 00 90 8d 83 10 04 00 00
39 f8 75 cb 8d

so here let's do the above silly C program:

	const char array[]="\x5e\x4c\x81\xeb\x10\x04\x00\x00\xeb\x21..

and running it under gdb gives:

0x8048500

Program received signal SIGSEGV, Segmentation fault.
0x080483f7 in main () at test.c:14
14 *(int*)0=0;

and now I can just try

x/20i 0x8048500

and it turns out that already gives a reasonable disassembly. The first
few instructions are bogus: they're really part of the previous
instruction, but it looks pretty sane around the actual problem spot,
which is "array+43" (there are 42 bytes of code before the EIP one, and 20
bytes after):

0x8048500 <array>: pop %esi
0x8048501 <array+1>: dec %esp
0x8048502 <array+2>: sub $0x410,%ebx
0x8048508 <array+8>: jmp 0x804852b <array+43>
0x804850a <array+10>: lea 0x408(%ebx),%eax
0x8048510 <array+16>: mov $0x20006,%ecx
0x8048515 <array+21>: mov $0x1d,%edx
0x804851a <array+26>: call 0xcf9a1889
0x804851f <array+31>: mov 0x410(%ebx),%ebx
0x8048525 <array+37>: sub $0x410,%ebx
0x804852b <array+43>: mov 0x410(%ebx),%eax
0x8048531 <array+49>: prefetchnta (%eax)
0x8048534 <array+52>: nop
0x8048535 <array+53>: lea 0x410(%ebx),%eax
0x804853b <array+59>: cmp %edi,%eax
0x804853d <array+61>: jne 0x804850a <array+10>
0x804853f <array+63>: lea (%eax),%eax
..

so now we know that the faulting instruction was that

mov 0x410(%ebx),%eax

and we can also see that this also matches the address that caused the
oops (ebx=000ffcf0, so 0x410(%ebx) is 00100100, which matches the "unable
to handle kernel paging request" message).

(Now, people used to kernel oopses will also recognize 00100100 as the
LIST_POISON1, so this is all about dereferencing the ->next pointer of a
list entry that has been removed from the list, but that's a whole
separate level of kernel knowledge).

Anyway, you can now do

make drivers/input/evdev.s

and see if you can find that kind of code sequence in there. You can use
the "EIP: evdev_disconnect+0x65/0x9e" thing as a hint: if your compiler
setup isn't too different, it's likely to be roughly two thirds into that
evdev_disconnect function (but inlining really can mean that it's
somewhere else entirely in the source tree!)

The rest left as an exercise for the reader.

Linus
--

For example, take an Oops like this:
                Oops: 0000 [#1] PREEMPT SMP
                Modules linked in: capidrv kernelcapi isdn slhc ipv6 loop dm_multipath snd_ens1371 gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd parport_pc floppy parport pcnet32 soundcore mii pcspkr snd_page_alloc ac i2c_piix4 i2c_core button power_supply sr_mod sg cdrom ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod BusLogic sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd

                Pid: 1726, comm: kstopmachine Not tainted (2.6.24-rc3-module #2)
                EIP: 0060:[<c04e53d6>] EFLAGS: 00010092 CPU: 0
                EIP is at list_del+0xa/0x61
                EAX: e0c3cc04 EBX: 00000020 ECX: 0000000e EDX: dec62000
                ESI: df6e8f08 EDI: 000006bf EBP: dec62fb4 ESP: dec62fa4
                 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
                Process kstopmachine (pid: 1726, ti=dec62000 task=df8d2d40 task.ti=dec62000)
                Stack: 000006bf dec62fb4 c04276c7 00000020 dec62fbc c044ab4c dec62fd0 c045336c
                       df6e8f08 c04532b4 00000000 dec62fe0 c043deb0 c043de75 00000000 00000000
                       c0405cdf df6e8eb4 00000000 00000000 00000000 00000000 00000000
                Call Trace:
                 [<c0406081>] show_trace_log_lvl+0x1a/0x2f
                 [<c0406131>] show_stack_log_lvl+0x9b/0xa3
                 [<c04061dc>] show_registers+0xa3/0x1df
                 [<c0406437>] die+0x11f/0x200
                 [<c0613cba>] do_page_fault+0x533/0x61a
                 [<c06123ea>] error_code+0x72/0x78
                 [<c044ab4c>] __unlink_module+0xb/0xf
                 [<c045336c>] do_stop+0xb8/0x108
                 [<c043deb0>] kthread+0x3b/0x63
                 [<c0405cdf>] kernel_thread_helper+0x7/0x10
                 =======================
                Code: 6b c0 e8 2e 7e f6 ff e8 d1 16 f2 ff b8 01 00 00 00 e8 aa 1c f4 ff 89 d8 83 c4 10 5b 5d c3 90 90 90 55 89 e5 53 83 ec 0c 8b 48 04 <8b> 11 39 c2 74 18 89 54 24 08 89 44 24 04 c7 04 24 be 32 6b c0
                EIP: [<c04e53d6>] list_del+0xa/0x61 SS:ESP 0068:dec62fa4
                note: kstopmachine[1726] exited with preempt_count 1

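        1. You have the vmlinux you built yourself: use gdb

           Enable the "Compile the kernel with debug info" option (CONFIG_DEBUG_INFO) when building the kernel.

           Note this line: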

                EIP is at list_del+0xa/0x61

           This tells us that the list_del function is 0x61 bytes long and that the Oops happened at offset 0xa into it. So first let's see where list_del starts:

                # grep list_del /boot/System.map-2.6.24-rc3-module
                c10e5234 T plist_del
                c10e53cc T list_del
                c120feb6 T klist_del
                c12d6d34 r __ksymtab_list_del
                c12dadfc r __ksymtab_klist_del
                c12e1abd r __kstrtab_list_del
                c12e9d03 r __kstrtab_klist_del

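           So the EIP at the time of the Oops is the following. (The raw value differs from the c04e53d6 printed in the Oops above, which suggests that this System.map comes from a build with a different layout; the symbol and offset are what carry over.)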

                c10e53cc + 0xa == c10e53d6

           Then look it up with gdb:

                # gdb /home/arc/build/linux-2.6/vmlinux
                (gdb) b *0xc10e53d6
                Breakpoint 1 at 0xc10e53d6: file /usr/src/linux-2.6.24-rc3/lib/list_debug.c, line 64.

           As you can see, gdb tells you the file and the line directly.

           In gdb you can also do this:

                # gdb Sources/linux-2.6.24/vmlinux
                (gdb) l *do_fork+0x1f
                0xc102b7ac is in do_fork (kernel/fork.c:1385).
                1380
                1381    static int fork_traceflag(unsigned clone_flags)
                1382    {
                1383            if (clone_flags & CLONE_UNTRACED)
                1384                    return 0;
                1385            else if (clone_flags & CLONE_VFORK) {
                1386                    if (current->ptrace & PT_TRACE_VFORK)
                1387                            return PTRACE_EVENT_VFORK;
                1388            } else if ((clone_flags & CSIGNAL) != SIGCHLD) {
                1389                    if (current->ptrace & PT_TRACE_CLONE)
                (gdb)

            This also gives you the line number directly.

            Or (the address of the faulting function plus the offset):

                (gdb) l *(0xffffffff8023eaf0 + 0xff)

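        2. You do not have the vmlinux that was built: tips

           If you see an Oops on lkml or in bugzilla and cannot reproduce it yourself, the only option is to disassemble
           the line that starts with "Code:". From there you can try to locate the corresponding source code.

           Note that the Code: line of the Oops encloses the first byte of the faulting instruction, i.e. the byte at EIP,
           in angle brackets <>. However, some architectures (the common x86, for example) have variable-length
           instructions, so finding the instruction boundaries before that byte takes some trial and error.

           Linus usually uses a small program, something like this: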

                #include <stdio.h>

                const char array[] = "\xnn\xnn\xnn...";
                int main(int argc, char *argv[])
                {
                        printf("%p\n", array);
                        *(int *)0 = 0;
                }

e.g.
/* Note: array has 65 elements, array[0] through array[64] (the 64 code bytes
 * plus the terminating NUL); the faulting opcode <8b> is array[43]. */
#include <stdio.h>
#include <stdlib.h>

const char array[]
="\x6b\xc0\xe8\x2e\x7e\xf6\xff\xe8\xd1\x16\xf2\xff\xb8\x01\x00\x00\x00\xe8\xaa\x1c\xf4\xff\x89\xd8\x83\xc4\x10\x5b\x5d\xc3\x90\x90\x90\x55\x89\xe5\x53\x83\xec\x0c\x8b\x48\x04\x8b\x11\x39\xc2\x74\x18\x89\x54\x24\x08\x89\x44\x24\x04\xc7\x04\x24\xbe\x32\x6b\xc0";
int main(int argc, char *argv[])
{
        printf("%p\n", array);
        *(int *)0 = 0;
}

           Compile it with gcc -g and run it under gdb:

                [arc@dhcp-cbjs05-218-251 ~]$ gdb hello
                GNU gdb Fedora (6.8-1.fc9)
                Copyright (C) 2008 Free Software Foundation, Inc.
                License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
                This is free software: you are free to change and redistribute it.
                There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
                and "show warranty" for details.
                This GDB was configured as "x86_64-redhat-linux-gnu"...
                (no debugging symbols found)
                (gdb) r
                Starting program: /home/arc/hello
                0x80484e0

                Program received signal SIGSEGV, Segmentation fault.

           Now you can disassemble at the address 0x80484e0:

                (gdb) disassemble 0x80484e0
                Dump of assembler code for function array:
                0x080484e0 <array+0>:   imul   $0xffffffe8,%eax,%eax
                0x080484e3 <array+3>:   jle,pn 0x80484dc <__dso_handle+20>
                0x080484e6 <array+6>:   ljmp   *<internal disassembler error>
                0x080484e8 <array+8>:   rcll   (%esi)
                0x080484ea <array+10>: repnz (bad)
                0x080484ec <array+12>: mov    $0x1,%eax
                0x080484f1 <array+17>: call   0x7f8a1a0
                0x080484f6 <array+22>: mov    %ebx,%eax
                0x080484f8 <array+24>: add    $0x10,%esp
                0x080484fb <array+27>: pop    %ebx
                0x080484fc <array+28>: pop    %ebp
                0x080484fd <array+29>: ret
                0x080484fe <array+30>: nop
                0x080484ff <array+31>: nop
                0x08048500 <array+32>: nop
                0x08048501 <array+33>: push   %ebp
                0x08048502 <array+34>: mov    %esp,%ebp
                0x08048504 <array+36>: push   %ebx
                0x08048505 <array+37>: sub    $0xc,%esp
                0x08048508 <array+40>: mov    0x4(%eax),%ecx
                0x0804850b <array+43>: mov    (%ecx),%edx
                0x0804850d <array+45>: cmp    %eax,%edx
                0x0804850f <array+47>: je     0x8048529
                0x08048511 <array+49>: mov    %edx,0x8(%esp)
                0x08048515 <array+53>: mov    %eax,0x4(%esp)
                0x08048519 <array+57>: movl   $0xc06b32be,(%esp)
                0x08048520 <array+64>: add    %ah,0xa70
                End of assembler dump.
                (gdb)

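          OK, now you know that the faulting instruction is the one at array[43], namely mov (%ecx),%edx. In other
          words, the address in ECX was bad (the register dump shows ECX: 0000000e).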

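Addendum:

To make it easier to match the assembly against the C code, the Linux kernel's Kbuild system can compile any single C file into an assembly file on its own, for example:

make path/to/the/sourcefile.s

For example, to compile kernel/sched.c into assembly:

make kernel/sched.s V=1

Or:

make kernel/sched.lst V=1

In addition, the ./scripts/decodecode script in the kernel source tree decodes the Code: line of an Oops: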

./scripts/decodecode < Oops.txt
