Chapter 7. Linux System Crashes and Hangs

7.1. Introduction

One of Linux’s claims to fame is stability and infrequency of system crashes and hangs. Development versions of the kernel are less stable, but unfortunately, mainstream kernel versions will also sometimes crash or hang. The beauty of Linux is that when this happens, users have the ability to track the problem down to a failing line of source code and even fix it themselves if they’re so inclined! With proprietary operating systems, your only course of action is to contact the company or author(s) of the operating system and hope that they can help you. Anyone that has had this happen to them in the past knows that this can be the start of a very lengthy battle full of frustration, which is still never guaranteed to end happily with a solution to the problem. At least with Linux, with a full set of debugging and diagnostic tools and some knowledge about where to look, one is much better armed and ready to seek and find a solution. The goal of this section is to discuss the many tools that Linux provides to get you well on your way to analyzing some of the most crucial operating system problems. We will discuss how to set up, configure, and use a serial console; how to read and understand a kernel Oops report; and how to determine the failing line of source code from it.

7.2. Gathering Information

Gathering information for analysis, either by you or by a support group of kernel developers on the Internet, is the first step on the road to troubleshooting a serious system problem. There are really two main types of serious system problems: a crash and a hang. A crash occurs when the kernel is aware of the problem it has just encountered and is able to do something about it before putting itself to sleep or rebooting. A hang occurs when there is a serious deadlock within the kernel that happens without warning and does not give the kernel the ability to do anything about it. Many of the tools for tracking the cause of each type of problem are the same; however, with a hang some of the diagnostic information may not be available, as the kernel didn’t have a chance to write it to disk or the screen.

7.2.1. Syslog Explained

The syslog is usually /var/log/messages but can be anywhere by modifying values in /etc/syslog.conf. The syslog file is a text log of messages written by the syslog daemon, which reads the messages directly from kernel buffers. Monitoring this file regularly can often provide crucial hints about the general health of your system such as disk space running out, memory being exhausted, I/O errors, device failures, and so on. When restarting the system after a crash or hang, this file should be examined first to see if anything was logged that could give a hint as to what might have caused the problem.

When doing this, the recommended procedure is the following:

1.	Wait for the system to fully restart.
2.	Log in or su (switch user) to root.
3.	Examine /etc/syslog.conf to determine the system log filename. For example, look for something like the following:
	#
	# save the rest in one file
	#
	*.*;mail.none;news.none -/var/log/messages
4.	Open /var/log/messages (or similar from Step 3) in vi, less, or your favorite editor.
5.	Navigate to the end of the file.
6.	Search backward for “restart.” You should see a line like this: Mar 14 19:45:21 linux syslogd 1.4.1: restart.
7.	The messages immediately prior to the line found in Step 6 are the last messages logged before the system restarted. Examine them for anything suspicious.

Generally, individual messages are composed of the following sequence of tokens:

<timestamp> <hostname> <message origin>: <message text>

An example showing a message coming from a kernel driver:

Mar 10 22:49:05 linux kernel: usb.c: deregistering driver serial

<timestamp> = Mar 10 22:49:05

<hostname> = linux

<message origin> = kernel: usb.c

<message text> = deregistering driver serial

We know from the message origin that the message came from the kernel, and we are also given the exact source file containing the message. We can now easily examine usb.c and search for the message text to see exactly where kernel execution was. The code in question looks like this:

/**
 * usb_deregister - unregister a USB driver
 * @driver: USB operations of the driver to unregister
 *
 * Unlinks the specified driver from the internal USB driver list.
 */
void usb_deregister(struct usb_driver *driver)
{
    struct list_head *tmp;

    info("deregistering driver %s", driver->name);
    if (driver->fops != NULL)
        usb_minors[driver->minor/16] = NULL;

    /* ... the excerpt ends here; the rest of the function is not shown ... */
}

Knowing exactly what this code does is not important—it is important at this time only to know that by a simple log message we can determine exactly what was being executed, which can be invaluable in problem determination. One thing to note, though, is that info is a macro defined as:

#define info(format, arg...) printk(KERN_INFO __FILE__ ": " format "\n" , ## arg)

As you can see, printk is the key function that performs the act of writing the message to the kernel message buffer. We can also see that the standard C macro __FILE__ is used to dump the source filename.

There is, however, no guarantee that anything will appear in the syslog. Often, even in the case of a crash, the klogd system logger is unable to write the information to disk. This is where a serial console becomes important. When properly configured, the kernel will send important log messages to the serial console as well as to the buffers where the syslog daemon picks them up. After the messages are sent over the serial line, the remote console will receive the messages and preserve them there.

7.2.2. Setting up a Serial Console

Possibly one of the most important tools in diagnosing and determining the cause of system crashes or hangs is the use of a serial console to gather diagnostic information. A serial console is not something that every system requires, although it doesn’t hurt to have. But if your system inexplicably crashes or hangs more than a few times in a very short period of time, it is highly recommended.

Note: In the kernel source package included with most distributions, the file Documentation/serial-console.txt is an excellent guide on setting up and configuring a serial console.

It is also worthwhile to enable the kernel magic SysRq key in conjunction with the serial console; refer to the sysrq section of Chapter 3, “The /proc Filesystem,” for more information. A serial console helps because often when a system enters a panic state, for example in the case of a kernel oops, the kernel will dump information to the kernel log daemon. This normally means that the information gets written to /var/log/messages; however, there are cases where the system is unable to perform the writes to disk, so this is where the serial console proves most useful. When properly set up, the information is dumped over the serial port as well so the remote system, which is in a healthy state, will receive and save this information. This information can then be analyzed using the techniques discussed in this section or forwarded to the appropriate support group.

7.2.3. Connecting the Serial Null-Modem Cable

The first thing to do is obtain a serial null-modem cable. These can commonly be found at any computer store and generally sell for a minimal amount. You should also check the external serial ports on both computers to determine whether you require 9 or 25 pin connectors. Newer null-modem cables are sold with both 9 and 25 pin connectors on each end, so it may be desirable to purchase this kind.

Once the cable is in place, it should be tested to ensure that data can be sent from one machine to the other. Do this by first starting a communications program on the serial console of a separate system. If the machine is running Linux, minicom is a good choice, and if Windows is running, HyperTerminal is also fine.

Note: minicom may not be installed on your Linux system by default. The executable is usually /usr/bin/minicom. If you do not have it installed, most distributions include it as an optionally installed package that can be installed at any time.

Generally, the default communications settings will suffice. Next, on the source machine run the following as root assuming the null-modem cable is connected to the first serial port on the computer (/dev/ttyS0):

stty speed 38400 < /dev/ttyS0 ; echo 'This should appear on the remote machine' > /dev/ttyS0

The message, “This should appear on the remote machine,” should appear in the communications program on the serial console. If it does not, verify the following:

The cable is in fact a null-modem cable.
The cable is connected to the first serial port on the server; if it isn’t, change /dev/ttyS0 to /dev/ttyS1 and try again.
The serial console communication program is listening on the correct serial port.
The speed is set to 38400 on the serial console in the communications program.

7.2.4. Enabling the Serial Console at Startup

When you’ve verified that the serial console works, the next step is to configure Linux to send important messages over the serial connection. This is generally done by booting with an additional kernel boot parameter. The boot parameter can be typed in manually at boot time with most boot loaders, or it can be added permanently to the boot loader’s configuration file. The additional parameter should look like this:

console=ttyS0,38400

Note that this shouldn’t replace any existing console= parameter but should be inserted before them instead. It is important to maintain any existing console=tty parameters so as not to render the virtual consoles unusable. For my system, I use GRUB, and here’s my menu.lst entry to enable the serial console:

title Linux
    kernel (hd0,7)/boot/vmlinuz-2.4.21-99-default root=/dev/hda8 vga=0x314 splash=silent desktop hdc=ide-scsi hdclun=0 showopts console=/dev/ttyS0 console=/dev/tty0
    initrd (hd0,7)/boot/initrd-2.4.21-99-default

After rebooting with this entry, the serial console should be set up, and some of the boot messages should appear. Note that not all boot messages appear on the serial console. When the server is booted up, be sure to enable the logging or capture feature in the communications program on the serial console to save all messages sent to it. For your reference, here’s an example of the kind of log captured in a serial console that would be sent to a distribution’s support team or a kernel developer (in this particular case, VMware, Inc. may also need to be contacted because the process name is vmware-vmx, but note that this does not in any way mean that there is a problem with this program):

Unable to handle kernel NULL pointer dereference at virtual address 000005e8
printing eip: c429aa52
*pde = 00000000
Oops: 0002 2.4.21-99-default #1 Wed Sep 24 13:30:51 UTC 2003
CPU: 0
EIP: 0010:[usb-uhci:uhci_device_operations+31708122/24331932] Tainted: PF
EIP: 0010:[<c429aa52>] Tainted: PF
EFLAGS: 00213246
eax: 00000000 ebx: 00000001 ecx: c36a4720 edx: 00000001
esi: 00000000 edi: 00000000 ebp: c7f7fe68 esp: c7f7fe50
ds: 0018 es: 0018 ss: 0018
Process vmware-vmx (pid: 2808, stackpage=c7f7f000)
Stack: 00000000 c7f7fec8 42826000 c0ed8860 c8dc7520 ffffffea c7f7ff88 c429886c
       cdf9ce00 00000000 00000000 00000000 c036a2e0 c0121ce2 c0121bc9 00000000
       00000001 c0121992 00003046 00003046 00000001 00000000 c02e0054 c7f7fec8
Call Trace: [usb-uhci:uhci_device_operations+31699444/24340610] [bh_action+66/80] [tasklet_hi_action+57/112] [do_softirq+98/224] [do_IRQ+156/176]
Call Trace: [<c429886c>] [<c0121ce2>] [<c0121bc9>] [<c0121992>] [<c010a1dc>]
  [call_do_IRQ+5/13] [__do_mmap_pgoff+1361/1632] [__do_mmap_pgoff+1419/1632] [__do_mmap2+88/176] [__do_mmap2+119/176] [sys_ioctl+470/618]
  [<c010c4d8>] [<c0131ab1>] [<c0131aeb>] [<c010e8d8>] [<c010e8f7>] [<c0153526>]
  [sys_mmap2+35/48] [system_call+51/64]
  [<c010e993>] [<c0108dd3>]
Modules: [(vmmon:<c4298060>:<c429defc>)]
Code: 89 9e e8 05 00 00 50 50 8b 45 0c 50 57 e8 ea 0c 00 00 83 c4
<3>sr0: CDROM (ioctl) reports ILLEGAL REQUEST. spurious 8259A interrupt: IRQ7.

7.2.5. Using SysRq Kernel Magic

The SysRq Kernel Magic hotkey provides the ability to possibly communicate with a panicked kernel to dump information such as stack tracebacks of running tasks, the current program counter (PC) location, memory status, and so on. Refer to the /proc/sys/kernel/sysrq section in Chapter 3 for a detailed discussion of how to make use of this feature.

7.2.6. Oops Reports

An Oops Report is basically just a dumping of information by the kernel when it encounters a serious problem. The problem can be a code related bug such as dereferencing a NULL pointer, accessing out of bounds memory, and so on. The Oops Report is generated by the kernel to help the end user debug, locate, and fix the problem. Sometimes when an oops occurs, the system may seem to continue running normally, but is likely to be in an unstable state. It is a good idea to save all your work and reboot as soon as possible.

To demonstrate a real live kernel oops, we modified the kernel source to allow a user to trap the kernel at will. We discuss how we did this in the section, “Adding a Manual Kernel Trap,” which may be skipped if you are not interested in the somewhat simple modifications we made to the kernel. In the sections that follow it, “Examining an Oops Report” and “Determining the Failing Line of Code,” we will discuss the Oops Report generated by the manual kernel trap in detail. We will also illustrate how to find the exact line of source code that caused the kernel oops solely from the Oops Report, so you may wish to read the “Adding a Manual Kernel Trap” section after reading the other two sections.

7.2.7. Adding a Manual Kernel Trap

For the purposes of easily demonstrating a kernel oops and how to examine the resulting information, we modified the kernel source code to add an interface in the /proc filesystem, which root could manipulate to force a trap in the kernel. We used the 2.6.2 kernel source downloaded directly from ftp.kernel.org on an AMD64 machine. Describing kernel source code in detail is beyond the scope of this book, but we’re including details on what we did for the curious reader who may be able to use this example as a very basic primer on how to get started with the kernel source.

First, we decided on the interface we wanted to use. We decided to create a new file in the /proc filesystem called “trap_kernel.” The most logical place for it is in /proc/sys/kernel, as entries in this directory are very kernel-specific. Next, we needed to find where in the kernel source the addition of this new file would happen. Using the /proc/sys/kernel/sysrq file as an example, we located the source in kernel/sysctl.c[1]. When editing this file, we first needed to add a global variable that would be the storage for the value, which /proc/sys/kernel/trap_kernel represents. This was simply a matter of adding the following with the default value of 0 to the global declaration scope of the file:

int trap_kernel_value = 0;

[1] When referring to kernel source files, it’s common to have the pathname start at the /usr/src/linux directory. For example /usr/src/linux/kernel/sysctl.c would be commonly referred to as kernel/sysctl.c.

Next we needed a new structure that contained information for our new file. The code that we added to the kern_table array of structures is the trap_kernel entry, the second of the two entries shown here; the printk_ratelimit_burst entry before it and the terminating entry after it already existed:

    {
        .ctl_name = KERN_PRINTK_RATELIMIT_BURST,
        .procname = "printk_ratelimit_burst",
        .data = &printk_ratelimit_burst,
        .maxlen = sizeof(int),
        .mode = 0644,
        .proc_handler = &proc_dointvec,
    },
    {
        .ctl_name = KERN_TRAP_KERNEL,
        .procname = "trap_kernel",
        .data = &trap_kernel_value,
        .maxlen = sizeof (int),
        .mode = 0644,
        .proc_handler = &proc_dointvec_trap_kernel,
    },
    { .ctl_name = 0 }
};

As is shown, we set the proc_handler to proc_dointvec_trap_kernel, which is basically a customized version of the real proc_dointvec. Without going into too much detail, proc_dointvec is used to handle user manipulation of an integer-based /proc file entry. /proc/sys/kernel/sysrq and /proc/sys/kernel/shmmni are examples of interfaces that work with integers. proc_dointvec is a “wrapper” function, which simply calls do_proc_dointvec with customized parameters:

int proc_dointvec(ctl_table *table, int write, struct file *filp,
                  void __user *buffer, size_t *lenp)
{
    return do_proc_dointvec(table,write,filp,buffer,lenp,
                            NULL,NULL);
}

The next step was to add the code for our proc_dointvec_trap_kernel customized handler, as follows:

int proc_dointvec_trap_kernel( ctl_table *table, int write,
                               struct file *filp,
                               void __user *buffer, size_t *lenp )
{
    char c;

    if ( write )
    {
        /* Copy the first character of the user's buffer into c. */
        if ( ( get_user( c, (char *)buffer ) ) == 0 )
        {
            if ( c == '9' )
            {
                printk( KERN_ERR "trap_kernel: got '9'; trapping the kernel now\n" );
                /* Deliberate NULL pointer dereference to force the oops. */
                char *trap = NULL;
                *trap = 1;
            }
            else
            {
                printk( KERN_ERR "trap_kernel: ignoring '%c'\n", c );
            }
        }
        else
        {
            printk( KERN_ERR "trap_kernel: problem getting value\n" );
        }
    }

    return do_proc_dointvec(table,write,filp,buffer,lenp,NULL,NULL);
}

The idea is that before calling do_proc_dointvec, we added some logic to determine if the user really is requesting a kernel trap. The logic is first to check the write flag, which is set to 1 if a write is being performed by the user, with, for example, the following command:

linux> echo 1 > /proc/sys/kernel/trap_kernel

The write flag is set to 0 if a command such as this is being run:

linux> cat /proc/sys/kernel/trap_kernel

If a write is not being performed, nothing extra is done, and the real do_proc_dointvec function is called. The kernel carries on normally and returns the value of the trap_kernel_value global variable to the user. If a write is being performed, get_user copies the first character from buffer, which is a void pointer into user space, into the char c so that it can be examined. If the copy is successful (that is, get_user returns 0), we check to see if the character is the number 9, which we chose as the trigger for trapping the kernel. If the char is a 9, we log a message indicating that the kernel will be trapped and then perform a simple NULL pointer dereference.

If the char is not 9, we log the fact that we’re ignoring that value in terms of trapping the system and allow the kernel to perform the write and carry on normally.

So if the user performs the following:

linux> echo 3 > /proc/sys/kernel/trap_kernel

the action will be performed without a problem, and an entry in /var/log/messages such as the following should appear:

Feb 12 11:27:40 linux kernel: trap_kernel: ignoring '3'

If the user executes this command:

linux> echo 9 > /proc/sys/kernel/trap_kernel

an Oops should immediately happen, which we will discuss in the next section.

7.2.8. Examining an Oops Report

Examining an Oops Report is not an exact science and requires a bit of ingenuity and experience. To know the basic steps to take and to understand how things generally work is a great start and is the goal of this section. The oops dumps are a little different between 2.4.x and 2.6.x kernels.

7.2.8.1. 2.6.x Kernel Oops Dumps

Oops dumps in a 2.6.x kernel can usually be examined as is without the need to process them through ksymoops, as is needed for 2.4.x oops. The /proc/sys/kernel/trap_kernel “feature” we added to manually force a kernel oops results in the oops dump shown as follows:

Oops: 0002 [1]
CPU 0
Pid: 2680, comm: bash Not tainted
RIP: 0010:[<ffffffff8013ae64>] <ffffffff8013ae64>{proc_dointvec_trap_kernel+84}
RSP: 0018:00000100111c9ea8 EFLAGS: 00010216
RAX: 0000000000000031 RBX: 00000100111c8000 RCX: ffffffff80413210
RDX: 0000010011cd9280 RSI: ffffffff80413970 RDI: 0000000000000000
RBP: 0000002a95e59000 R08: 0000000000000033 R09: 000001001f57dbc0
R10: 0000000000000000 R11: 0000000000000175 R12: 0000000000000001
R13: 00000100111c9ee8 R14: 000001001115b980 R15: ffffffff803a1390
FS: 00000000005614a0(0000) GS:ffffffff8045bc40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006a0
Process bash (pid: 2680, stackpage=10011cda280)
Stack: 0000002a95e59000 0000000000000002 ffffffff803a1390 000001001115b980
       0000000000000002 0000000000000001 0000002a95e59000 ffffffff8013a6e8
       0000000000000002 0000000000000000
Call Trace:<ffffffff8013a6e8>{do_rw_proc+168}
       <ffffffff8016baf4>{vfs_write+228} <ffffffff8016bc09>{sys_write+73}
       <ffffffff80111830>{system_call+124}

Code: c6 04 25 00 00 00 00 01 eb 23 66 90 0f be f2 48 c7 c7 c0 9f
RIP <ffffffff8013ae64>{proc_dointvec_trap_kernel+84} RSP <00000100111c9ea8>
CR2: 0000000000000000

The dump was taken directly from /var/log/messages, and we manually removed the preceding <timestamp> <hostname> kernel: markings on each line for easier reading.

If you didn’t skip the “Adding a Manual Kernel Trap” section, the problem shown by the Oops Report will probably seem pretty obvious. But let’s pretend that we have no idea where this trap came from and it’s something that needs to be fixed.

Let’s first analyze the first line:

Oops: 0002 [1]

Initially, this looks really cryptic and useless, but it actually contains a great deal of information! To begin with, we know that we’re in an oops situation, meaning that the kernel has encountered an unexpected problem. The “0002” that follows “Oops:” is a hexadecimal number that represents the page fault error code. By decoding this, we can determine exactly what the error condition was. To decode it, we first need to convert it to binary—hexadecimal 2 is binary 10. Now we need to compare this value against Table 7.1 to decode the meaning.

Table 7.1. Page Fault Error Codes.
Bit	Value 0	Value 1
0	No page found	Protection fault
1	Read	Write
2	Kernel-mode	User-mode
3[2]	Fault was not an instruction fetch	Fault was an instruction fetch

[2] Bit 3 is defined on the x86-64 architecture but not on the i386 architecture.

Using Table 7.1, we know that binary 10 means that no page was found, a write was attempted in kernel-mode, and the fault was not due to an instruction fetch (remember that this Oops Report was taken from an AMD64 machine). So from this information we know that a page was not found when doing a write operation within the kernel.

The next piece of information on the first line is the [1]. This is the die counter that basically keeps track of the number of oopses that have occurred since the last reboot. In our case, this is the first oops we’ve encountered.

The next line is CPU 0 and indicates which CPU performed the instruction that caused the fault. In this case, my system only has one CPU, so 0 is the only possible value. The next line is:

Pid: 2680, comm: bash Not tainted

The Pid indicates which user-land process ID initiated the problem, and comm: tells us that the process name was bash. This makes sense because the command I issued was redirecting the output of an echo command, which is all handled by the shell. Not tainted tells us that our kernel has not been tainted by any modules not under the GPL and/or forcefully loaded. The next line in the oops dump is:

RIP: 0010:[<ffffffff8013ae64>] <ffffffff8013ae64>{proc_dointvec_trap_kernel+84}

RIP is the name of the instruction pointer on the AMD64 architecture. On 32-bit x86 systems, it is called the EIP instead and, of course, is 32 bits long rather than 64 bits long. (Though it somewhat fits in this situation to have RIP stand for Rest In Peace, this is not the intended meaning here.) The “0010” is a dump of the CS register. The two low-order bits of CS hold the current privilege level (CPL), which here is 0. This number corresponds to the permission or ring level in which the trap occurred.

Note: Ring level is a term used to refer to what permissions code has when running on a CPU. Ring 0 has unlimited access, therefore kernel mode runs at ring 0. Ring level 3 is for user mode processes. On Linux, ring levels 1 and 2 are unused.

The [<ffffffff8013ae64>] is a dumping of the RIP register. This means that the instruction pointer was pointing to the instruction at memory address 0xffffffff8013ae64 at the time the trap occurred. Note the RIP address is printed out again by another function, which the kernel calls to perform more work.

The {proc_dointvec_trap_kernel+84} is the result of the kernel translating the RIP address into a more human readable format. proc_dointvec_trap_kernel is the name of the function in which the RIP lies. The +84 means that the RIP is at an offset of decimal 84 into the proc_dointvec_trap_kernel function. This value is extremely useful in determining exactly what caused the trap, as we shall see in the “Determining the Failing Line of Code” section.

The next several lines are a dumping of the registers and their contents:

RSP: 0018:00000100111c9ea8 EFLAGS: 00010216
RAX: 0000000000000031 RBX: 00000100111c8000 RCX: ffffffff80413210
RDX: 0000010011cd9280 RSI: ffffffff80413970 RDI: 0000000000000000
RBP: 0000002a95e59000 R08: 0000000000000033 R09: 000001001f57dbc0
R10: 0000000000000000 R11: 0000000000000175 R12: 0000000000000001
R13: 00000100111c9ee8 R14: 000001001115b980 R15: ffffffff803a1390
FS: 00000000005614a0(0000) GS:ffffffff8045bc40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006a0

Describing each register in detail and what it is used for is beyond the scope of this book[3]. It suffices to say for now that the values stored in the registers could be very useful when examining the assembly code surrounding the trapping instruction.

[3] For detailed information on the registers, I recommend reading the AMD64 Architecture Manuals found at AMD’s Web site. Similarly for other architectures, manuals describing detailed hardware information can usually be found at the vendor’s Web site.

The next line of interest is:

Process bash (pid: 2680, stackpage=10011cda280)

The “Process bash (pid: 2680” portion repeats what was dumped on the third line. The stackpage=10011cda280 value shows the kernel stack page in use by this process.

The next few lines dump out a predefined number of 64-bit words from the stack. In the case of AMD64, this number is set to 10.

Stack: 0000002a95e59000 0000000000000002 ffffffff803a1390 000001001115b980
       0000000000000002 0000000000000001 0000002a95e59000 ffffffff8013a6e8
       0000000000000002 0000000000000000

The values are not too important at first glance. Depending on the assembly instructions surrounding the trap and the context of the particular problem, the values shown here may be needed. They are basically dumped here with an “in case they’re needed” purpose. The next lines in the oops dump that are of interest are:

Call Trace:<ffffffff8013a6e8>{do_rw_proc+168}
       <ffffffff8016baf4>{vfs_write+228}
       <ffffffff8016bc09>{sys_write+73}
       <ffffffff80111830>{system_call+124}

Call Trace shows a list of the last few functions that were called before the trap occurred. Now we know that the execution path looked like what is shown in Figure 7-1.

Figure 7-1. Call trace in kernel.
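The same execution path can be recovered from the text of the Call Trace itself. The sketch below is illustrative only: parse_call_trace and execution_path are hypothetical helper names, and the regular expression assumes the 16-digit address and decimal offset format seen in this particular dump. The trace lists the most recent caller first, so it is reversed to show the order in which the functions were entered.

```python
import re

# One entry looks like "<ffffffff8013a6e8>{do_rw_proc+168}".
ENTRY_RE = re.compile(r"<(?P<addr>[0-9a-f]{16})>\{(?P<func>\w+)\+(?P<offset>\d+)\}")

# The Call Trace lines from the oops discussed in the text.
CALL_TRACE = """\
Call Trace:<ffffffff8013a6e8>{do_rw_proc+168}
<ffffffff8016baf4>{vfs_write+228}
<ffffffff8016bc09>{sys_write+73}
<ffffffff80111830>{system_call+124}
"""


def parse_call_trace(text):
    """Return (address, function, offset) tuples, most recent caller first."""
    return [
        (int(m.group("addr"), 16), m.group("func"), int(m.group("offset")))
        for m in ENTRY_RE.finditer(text)
    ]


def execution_path(text):
    """Return the function names ordered from the oldest caller to the newest."""
    return [func for _addr, func, _offset in reversed(parse_call_trace(text))]


if __name__ == "__main__":
    # The trapping function itself comes from the RIP line, not the Call Trace.
    path = execution_path(CALL_TRACE) + ["proc_dointvec_trap_kernel"]
    print(" -> ".join(path))
```

Appending the trapping function from the RIP line gives the full chain from the system call entry point down to the failing code.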

7.2.9. Determining the Failing Line of Code

Now that we know a bit more about the trap and its characteristics, we need to know what actually caused it. In almost all cases, the cause is a programming error, so the key to answering that question is first knowing where in the source code the problem occurs. We know the function name, and we know the byte offset of the trapping instruction within that function, so the first step is to find where the function proc_dointvec_trap_kernel is defined. Rather than scouring the hundreds of source files that comprise the Linux kernel, it is far easier to use a tool called cscope (see the “Setting up cscope to Index Kernel Sources” section for more information). Plugging the trapping function name into the “Find this C symbol:” field, we get the screen shown in Figure 7.2.

Figure 7.2. cscope of kernel code.

If you read the “Adding a Manual Kernel Trap” section, the results just shown will be very familiar. By typing “0,” “1,” or “2,” the respective file will be loaded and the cursor position will be pointed directly at the proc_dointvec_trap_kernel symbol. For now, we just wanted to find out what source file the function appeared in. We now know that it is /usr/src/linux-2.6.2-2/kernel/sysctl.c.

Next, we need to translate the decimal offset of 84 into the function into a source line of that function in sysctl.c.

First we have to get debug symbols built into the sysctl.o object. For the 2.6.x kernel source, do this with the following set of steps.

1.	cd /usr/src/linux-2.6.2
	Everything we do here must be run from the top level of the kernel source tree.
2.	rm kernel/sysctl.o
	Removing this file forces it to be recompiled.
3.	export CONFIG_DEBUG_INFO=1
	Searching /usr/src/linux-2.6.2/Makefile for CONFIG_DEBUG_INFO shows that if this is set, the -g compile flag is added to the compilation.
4.	make kernel/sysctl.o
	This recompiles the sysctl.c file with the -g parameter included.
5.	If you compare the size of sysctl.o before and after performing these steps, you’ll notice that it is now much larger. This is expected because it now contains the extra debug symbols.
6.	objdump -d -S kernel/sysctl.o > kernel/sysctl.dump
	This is the critical part: it converts the object file (sysctl.o) into a listing of assembly instructions mixed with the C source code.

Note: As documented in the objdump(1) man page, the -d option will disassemble the object, and the -S option will intermix high-level source code with assembly if possible.

We’re now ready to examine the assembly/source dump file and search for our decimal 84 offset. First open sysctl.dump created by the objdump command and search for the beginning of the proc_dointvec_trap_kernel function. It will look something like the following:

0000000000000fb0 <proc_dointvec_trap_kernel>:
int proc_dointvec_trap_kernel(ctl_table *table, int write, struct file *filp,
                void __user *buffer, size_t *lenp)
{
     fb0: 48 83 ec 38             sub    $0x38,%rsp
        char c;

        if ( write )
     fb4: 85 f6                   test   %esi,%esi
     fb6: 48 89 6c 24 10          mov    %rbp,0x10(%rsp,1)
     fbb: 4c 89 64 24 18          mov    %r12,0x18(%rsp,1)
     fc0: 4c 89 6c 24 20          mov    %r13,0x20(%rsp,1)
     fc5: 4c 89 74 24 28          mov    %r14,0x28(%rsp,1)

As is shown here, the start of the function is at offset hexadecimal fb0. Because offsets are referenced in hex, we need to convert the decimal 84 to hex, which is 0x54. Next we add 0x54 to 0xfb0 to find the location reported in the oops dump. The offset we need to look for in sysctl.dump is 0x1004.

Tip: A quick way of doing calculations on a Linux machine is to run “gdb” and enter commands such as print 324 * 434. To do hex calculations, use a command such as print /x 0x54 + 0xfb0. Also note that print can be shortened to p.
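The same arithmetic is easy to script when several offsets need converting. This is a small sketch rather than an established tool: dump_offset and objdump_label are hypothetical helper names, and objdump_label simply mimics the way objdump prints instruction addresses (lowercase hex followed by a colon, without a 0x prefix) so that the result can be pasted into a search.

```python
def dump_offset(function_start, oops_offset):
    """Add the decimal offset reported in the oops to the function's start
    address in the objdump listing."""
    return function_start + oops_offset


def objdump_label(function_start, oops_offset):
    """Format the target the way objdump labels an instruction, e.g. '1004:'."""
    return format(dump_offset(function_start, oops_offset), "x") + ":"


if __name__ == "__main__":
    start = 0xfb0   # start of proc_dointvec_trap_kernel in sysctl.dump
    offset = 84     # the "+84" from the oops, in decimal

    print(hex(offset))                      # 0x54
    print(hex(dump_offset(start, offset)))  # 0x1004
    print(objdump_label(start, offset))     # 1004:
```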

Offset 0x1004 in sysctl.dump is shown as follows:

        {
            if ( ( get_user( c, (char *)buffer ) ) == 0 )
     fe5: 48 89 c8                mov    %rcx,%rax
     fe8: e8 00 00 00 00          callq  fed <proc_dointvec_trap_kernel+0x3d>
     fed: 85 c0                   test   %eax,%eax
     fef: 75 32                   jne    1023 <proc_dointvec_trap_kernel+0x73>
            {
                if ( c == '9' )
     ff1: 80 fa 39                cmp    $0x39,%dl
     ff4: 75 1a                   jne    1010 <proc_dointvec_trap_kernel+0x60>
                {
                    printk( KERN_ERR "trap_kernel: got '9'; trapping the kernel now\n" );
     ff6: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
     ffd: 31 c0                   xor    %eax,%eax
     fff: e8 00 00 00 00          callq  1004 <proc_dointvec_trap_kernel+0x54>
                    char *trap = NULL;
                    *trap = 1;
    1004: c6 04 25 00 00 00 00    movb   $0x1,0x0
    100b: 01
    100c: eb 23                   jmp    1031 <proc_dointvec_trap_kernel+0x81>
    100e: 66                      data16
    100f: 90                      nop
                }
                else
                {
                    printk( KERN_ERR "trap_kernel: ignoring '%c'\n", c );
    1010: 0f be f2                movsbl %dl,%esi
    1013: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
    101a: 31 c0                   xor    %eax,%eax

Looking at offset 0x1004, we see the assembly instruction movb $0x1, 0x0, which means to store the value 1 into the memory address 0x0. On x86 and AMD64 hardware, this produces a page fault that results in the trap we observed.

Immediately above the assembly instruction is the C source code that resulted in the generation of this assembly instruction.

char *trap = NULL;

*trap = 1;

This code is a blatant example of the classic NULL pointer dereferencing programming error. We’ve found the cause of the trap! Of course, this is the code I added as discussed in the “Adding a Manual Kernel Trap” section.
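The manual search through sysctl.dump can be sketched in a few lines of Python. This is only an illustration under stated assumptions: find_instruction is a hypothetical helper, SAMPLE_DUMP is a trimmed copy of the listing shown earlier rather than real objdump output, and the test for what counts as an instruction line (hex digits followed by a colon) is a heuristic that a C label made only of hex letters could fool.

```python
import re

HEADER_RE = re.compile(r"^([0-9a-f]+) <(\w+)>:$")   # "0000000000000fb0 <func>:"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s")         # "    1004: c6 04 ..."

# Trimmed excerpt of the listing shown in the text.
SAMPLE_DUMP = """\
0000000000000fb0 <proc_dointvec_trap_kernel>:
     fb0: 48 83 ec 38          sub    $0x38,%rsp
     fff: e8 00 00 00 00       callq  1004 <proc_dointvec_trap_kernel+0x54>
        char *trap = NULL;
        *trap = 1;
    1004: c6 04 25 00 00 00 00 movb   $0x1,0x0
    100b: 01
    100c: eb 23                jmp    1031 <proc_dointvec_trap_kernel+0x81>
"""


def find_instruction(dump_text, function, offset):
    """Return (instruction_line, preceding_source_lines) for function+offset,
    or None if the function or the offset is not present in the listing."""
    target = None   # absolute address we are looking for, once in the function
    source = []     # source lines seen since the previous instruction
    for line in dump_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            in_function = header.group(2) == function
            target = int(header.group(1), 16) + offset if in_function else None
            source = []
            continue
        if target is None:
            continue
        insn = INSN_RE.match(line)
        if insn is None:
            if line.strip():
                source.append(line.strip())
            continue
        if int(insn.group(1), 16) == target:
            return line.strip(), source
        source = []
    return None


if __name__ == "__main__":
    print(find_instruction(SAMPLE_DUMP, "proc_dointvec_trap_kernel", 84))
```

A result of None corresponds to the situation described next, where the calculated offset does not land on an instruction boundary in the listing.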

Determining the line of code does not always go this smoothly. Occasionally, the calculated offset is nowhere to be found in the disassembled object output. One of the main reasons for this is that the listing was generated with a different version of the compiler than the one originally used to compile the object in which the oops occurred. It is extremely important to use the exact same compiler version. Different versions of a compiler, even minor releases, can and will change the ordering of the assembly instructions. Different optimization levels and/or options will also change the generated assembly, even when the same compiler version is used. At the assembly level, a single different or relocated instruction makes it very difficult to locate the trapping instruction with certainty. With some ingenuity, though, even if the compiler versions are slightly different, one can get close enough to the trapping area of code to discover the fault.

7.2.9.1. 2.4.x Kernel Oops Dumps

Analyzing kernel oops dumps in a Linux 2.4.x kernel is slightly different from doing so in a 2.6.x kernel. The main reason for this is the addition of the kallsyms feature/patch to the 2.6 mainline kernel source. The kallsyms feature provides a listing of all kernel symbols that the kernel itself can use to translate a hexadecimal address into a human readable symbol name.

Note that many 2.4-based distributions have backported the kallsyms feature to their customized 2.4-based kernels. This means that if an oops occurs in these distributions, the dumped data is automatically formatted. If your distribution does not have the kallsyms patch or you are running a 2.4.x kernel as downloaded from kernel.org, you will need to manually format the oops message before it is useful to anyone. To do this, run the utility ksymoops with the appropriate parameters as documented in the ksymoops(8) man page.
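To make the address-to-symbol translation concrete, the sketch below shows the basic idea in Python: find the last symbol whose start address is not greater than the address in question, then report the difference as the offset. It is a simplification, not how kallsyms or ksymoops is implemented. The symbolize helper is a hypothetical name, and the four start addresses are not taken from a real symbol table; they were derived by subtracting the offsets from the addresses in the oops shown earlier.

```python
import bisect

# (start_address, name) pairs, derived from the oops in this chapter by
# subtracting each decimal offset from the reported address. A real symbol
# table contains every kernel symbol, not just these four.
SYMBOLS = sorted([
    (0xffffffff8013a640, "do_rw_proc"),                  # 0x...8013a6e8 - 168
    (0xffffffff8013ae10, "proc_dointvec_trap_kernel"),   # 0x...8013ae64 - 84
    (0xffffffff8016ba10, "vfs_write"),                   # 0x...8016baf4 - 228
    (0xffffffff8016bbc0, "sys_write"),                   # 0x...8016bc09 - 73
])
ADDRESSES = [address for address, _name in SYMBOLS]


def symbolize(address):
    """Return 'name+offset' for the nearest symbol at or below address,
    or None if the address lies below every known symbol."""
    index = bisect.bisect_right(ADDRESSES, address) - 1
    if index < 0:
        return None
    start, name = SYMBOLS[index]
    return "%s+%d" % (name, address - start)


if __name__ == "__main__":
    print(symbolize(0xffffffff8013ae64))
    print(symbolize(0xffffffff8013a6e8))
```

Because this toy table has no symbol sizes and large gaps between entries, an address that falls in a gap is attributed to the preceding symbol; a complete table makes such misattribution far less likely.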

7.2.10. Kernel Oopses and Hardware

A kernel oops does not always mean that a software error was found in the kernel. In fact, hardware actually fails quite often, so it should never be ruled out as the possible cause of an oops. The question then becomes how does one determine if the oops is caused by faulty hardware. Here are some clues that can point to faulty hardware:

Oopses that occur in places where, after examining the source code around the trap, it seems almost impossible for them to occur.
Recurring oopses that don’t always happen in the same place. Software bugs are almost always reproducible, but faulty hardware, especially bad RAM, can cause strange things to happen in seemingly random places.
A sudden start of oopses: the operating system has been running fine with little or no change to it, and oopses suddenly start occurring.
Hard machine lockups where nothing is displayed on the screen, the SysRq magic hotkey does nothing, and only a hard reboot is possible.

If the hardware is suspected, tests should be performed immediately, starting with the RAM unless there is reason to suspect some other piece of hardware. Many servers have built-in diagnostic programs that can be accessed from the BIOS menu. These should be run first. Software testing programs such as memtest86 (available at www.memtest86.org) should also be run to examine the RAM.

Note: memtest86 is historically meant for 32-bit x86-based hardware. The need for the support of this software on AMD64 hardware was great, and this was one of the reasons for the spin-off creation of memtest86+ (available at www.memtest.org). memtest86+ fully supports all AMD Opteron chips.

The fsck (file system check) utility should also be run on all hard drive partitions to which the server has access. Sometimes corruption can occur on a filesystem, which can in turn corrupt libraries or executables. Such corruption can scramble the instructions within the code, with unpredictable and often very bad results. The fsck utility can detect and repair many kinds of filesystem corruption.

7.2.11. Setting up cscope to Index Kernel Sources

Most Linux distributions include the cscope package. If your distribution does not, you can easily locate it for download on the Internet. cscope is a utility that scans through a defined set of source code files and builds a single index file. You can then use the cscope interface to search for function, variable, and macro definitions, listings of which functions call a particular function, listings of what files include a particular header file, and more. It’s an extremely useful tool, and we highly recommend setting it up if you plan on looking through any amount of kernel source. It isn’t just limited to indexing kernel source; you can set it up to “scope” through any set of source files you wish!

Before running cscope to retrieve symbol information, a symbols database must be built. Assuming your kernel source tree is in /usr/src/linux-2.6.2, the following command will create the symbols database in the file /usr/src/linux-2.6.2/cscope.out:

find /usr/src/linux-2.6.2 \( -name "[A-Za-z]*.[CcHh]" -o -name "*.[ch]pp" \
    -o -name "*.[CH]PP" -o -name "*.skl" -o -name "*.java" \) -print |
    cscope -bku -f /usr/src/linux-2.6.2/cscope.out -i -

The cscope parameters to take note of are -b, which builds the symbol database (or cross-reference, as the man page calls it), and -k, for “kernel mode,” which keeps cscope from searching the default include directory (/usr/include) so that only the include files from the kernel source tree are indexed.
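To see which files the find expression above actually hands to cscope, the name patterns can be tried out in isolation. The following Python fragment is a rough equivalent for experimentation only, not a replacement for the find command: is_indexed and list_sources are hypothetical helper names, and unlike find -name the directory walk considers only file names, never directory names.

```python
import fnmatch
import os

# The -name patterns from the find command, in the same order.
PATTERNS = ["[A-Za-z]*.[CcHh]", "*.[ch]pp", "*.[CH]PP", "*.skl", "*.java"]


def is_indexed(filename):
    """Return True if the base file name matches any of the find patterns."""
    return any(fnmatch.fnmatchcase(filename, pattern) for pattern in PATTERNS)


def list_sources(root):
    """Yield the paths under root that would be passed to cscope."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if is_indexed(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for name in ["sysctl.c", "sched.h", "entry.S", "Makefile"]:
        print(name, is_indexed(name))
```

Trying a few names this way shows, for example, that C sources and headers are selected while assembly files ending in .S and Makefiles are not.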

Once the database is built, searching it is simply a matter of running this command:

cscope -d -P /usr/src/linux-2.6.2 -p 20 -f /usr/src/linux-2.6.2/cscope.out

The following simple Korn Shell script can be used to handle all of this for you.

#!/bin/ksh
# Simple script to handle building and querying a cscope database for
# kernel source with support for multiple databases.

build=0
dbpath="linux"
force=0

function usage {
    echo "Usage: kscope [-build [-force]] [-dbpath <path>]"
}

# Parse the command-line options.
while [ $# -gt 0 ]; do
    if [ "$1" = "-build" ]; then
        build=1
    elif [ "$1" = "-dbpath" ]; then
        shift
        dbpath=$1
    elif [ "$1" = "-force" ]; then
        force=1
    else
        usage
        exit 1
    fi
    shift
done

if [ $build -eq 1 ]; then
    # Refuse to overwrite an existing database unless -force was given.
    if [ -f "/usr/src/$dbpath/cscope.out" -a $force -ne 1 ]; then
        echo "cscope database already exists. Use '-force' to overwrite."
        exit 1
    fi
    echo "Building /usr/src/$dbpath/cscope.out ..."
    find /usr/src/$dbpath \( -name "[A-Za-z]*.[CcHh]" -o -name "*.[ch]pp" \
        -o -name "*.[CH]PP" -o -name "*.skl" -o -name "*.java" \) -print |
        cscope -bku -f /usr/src/$dbpath/cscope.out -i -
    echo "Done."
else
    # Query mode: the database must already exist.
    if [ ! -f "/usr/src/$dbpath/cscope.out" ]; then
        echo "cscope database (/usr/src/$dbpath/cscope.out) not found."
        exit 1
    fi
    cscope -d -P /usr/src/$dbpath -p 20 -f /usr/src/$dbpath/cscope.out
fi

7.3. Conclusion

Unfortunately, system crashes and hangs do happen. With the knowledge and tools presented in this chapter, you should be armed with the capability to at least know where to gather the diagnostic data needed for others to analyze and determine the cause of the problem. You can use the tips and suggestions presented in Chapter 1, “Best Practices and Initial Investigation” in conjunction with the information in this chapter to most efficiently and effectively get a solution to your problem.

For the more advanced or curious reader, this chapter presents sufficient information for one to dive deeply into the problem in an attempt to solve it without help from others. This is not always the best approach depending on your situation, so caution should be exercised. In either case, with the right tools and knowledge, dealing with system problems on Linux can be much easier than on other operating systems.
