6.6. Execution

The GDB execution commands give you full control over a process. Of course, these commands do not work when using a core file because there is no live process to control. This section of the chapter will introduce some of the most commonly used commands for controlling a live program. It will also present some of the more advanced and lesser known tips and tricks that can make your debugging session more efficient.

6.6.1. The Basic Commands

The following table summarizes the most common and most useful execution control commands:

Note: <N> represents a number. The number instructs GDB to run the command a certain number of times. For example, next 5 will perform the next command 5 times.

The commands in Table 6.2 are pretty self-explanatory, but there are some settings you can use in GDB that affect the way some of these commands work. The next sections describe some of these in more detail.

Table 6.2. Basic GDB Execution Commands.
Action	Command	Notes
Execute the next source code line and do not go into functions	next <N>	Requires debug symbols and source code. The optional argument sets the number of lines of source code to step through. Functions will be skipped over (but called under the covers).
Execute the next source code line and do go into functions	step <N>	Requires debug symbols and source code. The optional argument sets the number of lines of source code to step through.
Execute the next assembly instruction and do not go into functions	nexti <N>	The optional argument sets the number of instructions to step through. Functions will be skipped over (but called under the covers).
Execute the next assembly instruction and do go into functions	stepi <N>	The optional argument sets the number of instructions to step through.
Continue full execution	continue
Continue execution until the current stack frame returns	finish
Continue execution at a specific address	jump <address>	Use with caution. Jumping directly to an address can cause unexpected behavior given that the registers may not have been set properly to run the target instruction.
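Continue execution until a source line past the current line is reached, or until the specified location is reached	until	Takes an optional location argument. Without one, until is handy for running to the end of a loop without stepping through every iteration.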
Call a function in the program	call <function>
Manually stop execution of the program	CTRL-C

6.6.1.1. Notes on stepi

The stepi command executes a single assembly instruction; passing a number as an argument executes that many instructions. Stepping through individual instructions often reveals low-level activity that normally goes unnoticed in a program. Using a program called dyn as an example (what dyn itself actually does is not important for this demonstration), let’s use the stepi command to step into the call to printf:

(gdb) break main

Breakpoint 1 at 0x804841c: file dyn_main.c, line 6.

(gdb) run

Starting program: /home/dbehman/book/code/dyn

Breakpoint 1, main () at dyn_main.c:6

6 void *dlhandle = NULL;

(gdb) next

9 printf( "Dynamically opening libdyn.so ...\n" );

(gdb) stepi

0x08048426 9 printf( "Dynamically opening libdyn.so ...\n" );

(gdb) stepi

0x0804842b 9 printf( "Dynamically opening libdyn.so ...\n" );

(gdb) stepi

0x08048330 in ?? ()

(gdb) stepi

0x08048336 in ?? ()

(gdb) stepi

0x0804833b in ?? ()

(gdb) stepi

0x08048300 in ?? ()

(gdb) stepi

0x08048306 in ?? ()

(gdb) stepi

0x4000d280 in _dl_runtime_resolve () from /lib/ld-linux.so.2

(gdb) stepi

0x4000d281 in _dl_runtime_resolve () from /lib/ld-linux.so.2

(gdb) stepi

0x4000d282 in _dl_runtime_resolve () from /lib/ld-linux.so.2

(gdb) stepi

0x4000d283 in _dl_runtime_resolve () from /lib/ld-linux.so.2

(gdb) stepi

0x4000d287 in _dl_runtime_resolve () from /lib/ld-linux.so.2

(gdb) stepi

0x4000d28b in _dl_runtime_resolve () from /lib/ld-linux.so.2

(gdb) stepi

0x4000cf90 in fixup () from /lib/ld-linux.so.2

(gdb) stepi

0x4000cf91 in fixup () from /lib/ld-linux.so.2

(gdb) stepi

0x4000cf93 in fixup () from /lib/ld-linux.so.2

(gdb) stepi

0x4000cf94 in fixup () from /lib/ld-linux.so.2

(gdb)

As you can see, several other things happen before we even enter the printf function. The functions we see here, _dl_runtime_resolve and fixup, belong to the dynamic linker, which GDB reports as /lib/ld-linux.so.2 (on this system the underlying file is /lib/ld-2.3.2.so). The frames marked “??” are GDB’s way of saying that it could not find a function name for the corresponding address. This often happens in special cases, such as when the dynamic linker is involved.
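
The extra instructions in the trace come from lazy binding: the first call to printf is routed through the dynamic linker, which looks up the real address and records it so that later calls can go straight to the function. The following is a minimal C analogy of that idea, not the dynamic linker's actual code. The names slot, resolver_stub, real_add_one, call_via_slot, and times_resolved are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target function; it stands in for printf. */
static int real_add_one(int x)
{
    return x + 1;
}

static int resolver_stub(int x);

/* Plays the role of a table slot that initially points at the resolver. */
static int (*slot)(int) = resolver_stub;

/* Counts how often the slow "resolve" path ran. */
static int resolve_count = 0;

/* The first call lands here: patch the slot, then forward the call. */
static int resolver_stub(int x)
{
    resolve_count++;
    slot = real_add_one;
    return slot(x);
}

/* Every call goes through the slot, much like a call through a stub. */
int call_via_slot(int x)
{
    assert(slot != NULL);
    return slot(x);
}

/* Reports how many times the resolver ran. */
int times_resolved(void)
{
    return resolve_count;
}
```

Calling call_via_slot twice runs resolver_stub only on the first call. In the same way, single-stepping the first call to a library function typically shows far more activity than single-stepping a later call to the same function.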

6.6.2. Settings for Execution Control Commands

6.6.2.1. Step-mode

The step-mode allows you to control how the step command works with functions that do not have any debug symbols. Let’s use an example program called dyn. It is compiled with debug symbols; however, it calls the standard C library function printf(), which by default has no debug symbols. Using step in the default manner, the following will occur:

(gdb) break main

Breakpoint 1 at 0x804841c: file dyn_main.c, line 6.

(gdb) run

Starting program: /home/dbehman/book/code/dyn

Breakpoint 1, main () at dyn_main.c:6

6 void *dlhandle = NULL;

(gdb) next

9 printf( "Dynamically opening libdyn.so ...\n" );

(gdb) step

Dynamically opening libdyn.so ...

11 dlhandle = dlopen( "./libdyn.so", RTLD_NOW );

(gdb)

As you can see, the step command did not go into the printf function because it does not have debug symbols. To override this behavior, we can use the “step-mode” setting in GDB:

(gdb) break main

Breakpoint 1 at 0x804841c: file dyn_main.c, line 6.

(gdb) run

Starting program: /home/dbehman/book/code/dyn

Breakpoint 1, main () at dyn_main.c:6

6 void *dlhandle = NULL;

(gdb) show step-mode

Mode of the step operation is off.

(gdb) set step-mode on

(gdb) show step-mode

Mode of the step operation is on.

(gdb) next

9 printf( "Dynamically opening libdyn.so ...\n" );

(gdb) step

0x4007eab0 in printf () from /lib/i686/libc.so.6

(gdb)

As shown in this example, the step command did enter the printf function even though printf was not built with -g (that is, it has no debug information). The step-mode setting can be useful if you want to step through the assembly language of a called function regardless of whether it was built with debug information.

6.6.2.2. Following fork Calls

If the controlled program uses fork or vfork system calls, you can tell GDB that you want to handle the fork calls in a few different ways. By default, GDB will follow the parent process (the one you are controlling now) and will let the child process (the newly created process) run free. The term follow here means that GDB will continue to control a process, and the term run free means that GDB will detach from the process, letting it run unimpeded and unaffected by GDB. If you want GDB to follow the child process, use the command set follow-fork-mode child before the fork occurs. An alternative is to have GDB ask you what you want to do when a fork is encountered. This can be done with the command set follow-fork-mode ask. You can view the current setting with the command show follow-fork-mode.

Let’s assume that you are controlling a process that forks two children, and you want to eventually be in control of a “grandchild” process as shown in Figure 6.2.

Figure 6.2. Children and grandchildren of a process.

You cannot get control of process 4 (as in the diagram) using the default GDB mode for following forks. In the default mode, GDB would simply keep control of process 1, and the other processes would run freely. You can’t use set follow-fork-mode child because GDB would follow the first forked process (process 2), and the parent process (process 1) would run freely. When process 1 forked off its second child process, it would no longer be under the control of GDB. The only way to follow the forks properly to get process 4 under the control of GDB is to use set follow-fork-mode ask. In this mode, you would tell GDB to follow the parent after the first fork call and to follow the child process for the next two forks.
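
To experiment with the follow-fork-mode settings, a process tree like the one described can be built with a few fork calls. The sketch below is an assumption based on the text (process 1 forks processes 2 and 3, and process 3 forks process 4); the figure itself is not reproduced here, and the names spawn_family, reap, and grandchild_work are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical work done in process 4; a convenient breakpoint target. */
static void grandchild_work(void)
{
    /* intentionally empty */
}

/* Wait for one child; return 1 if it exited with status 0, else 0. */
static int reap(pid_t pid)
{
    int status = 0;

    assert(pid > 0);
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/*
 * Runs in process 1. Returns the number of direct children that
 * exited cleanly (2 when everything works), or -1 if fork fails.
 */
int spawn_family(void)
{
    pid_t first, second;

    first = fork();                  /* fork #1 creates process 2 */
    if (first < 0)
        return -1;
    if (first == 0)
        _exit(0);                    /* process 2 has nothing to do */

    second = fork();                 /* fork #2 creates process 3 */
    if (second < 0) {
        reap(first);
        return -1;
    }
    if (second == 0) {
        pid_t grandchild = fork();   /* fork #3 creates process 4 */

        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            grandchild_work();       /* only process 4 gets here */
            _exit(0);
        }
        _exit(reap(grandchild) ? 0 : 1);
    }

    return reap(first) + reap(second);
}
```

With a main function that calls spawn_family and a build that uses -g, one possible session sets follow-fork-mode to ask, answers parent at the first fork and child at the next two, and ends up controlling the process that calls grandchild_work.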

6.6.2.3. Handling Signals

Linux applications often receive various types of signals. Signals are a software convention similar to a very trivial message (the message being the signal number itself). Each signal has a different purpose, and GDB can be set to handle each in a different way. Consider the following simple program that sends itself a SIGALRM:

penguin> cat alarm.C
#include <stdio.h>
#include <unistd.h>
#include <sys/utsname.h>

int main()
{
   alarm(3) ;
   sleep(5) ;

   return 0 ;
}

This simple program calls the alarm system call with an argument of 3 (for three seconds). This will cause a SIGALRM to be sent to the program in three seconds. The second call is to sleep with an argument of 5 and causes the program to sleep for five seconds, two seconds longer than it will take to receive the SIGALRM. Running this program in GDB has the following effect:

(gdb) run

Starting program: /home/wilding/src/Linuxbook/alarm

Program terminated with signal SIGALRM, Alarm clock.

The program no longer exists.

The program was killed off because it did not have a handler installed for SIGALRM. Signal handling is beyond the scope of this chapter, but we can change how signals such as SIGALRM are handled through GDB. First, let’s take a look at how SIGALRM is handled in GDB by default:

(gdb) info signals SIGALRM

Signal Stop Print Pass to program Description

SIGALRM No No Yes Alarm clock

The first column is the signal (in this case SIGALRM). The second column indicates whether GDB will stop the program and hand control over to the user of GDB when the signal is encountered. The third column indicates whether GDB will print a message to the screen when a signal is encountered. The fourth column indicates whether GDB will pass the signal to the controlled program (in the preceding example, the program is the alarm program that was run under the control of GDB). The last column is the description of the signal.

According to the output, GDB will not stop when the controlled program encounters a SIGALRM, it will not print a message, and it will pass the signal to the program. We can tell GDB to not pass the signal to the process by using handle SIGALRM nopass:

(gdb) handle SIGALRM nopass

Signal Stop Print Pass to program Description

SIGALRM No No No Alarm clock

(gdb) run

Starting program: /home/wilding/src/Linuxbook/alarm

Program exited normally.

(gdb)

In the preceding example, the program slept for the full five seconds and did not receive the SIGALRM. Next, let’s tell GDB to stop when the controlled process receives SIGALRM:

(gdb) handle SIGALRM stop

Signal Stop Print Pass to program Description

SIGALRM Yes Yes Yes Alarm clock

(gdb) run

Starting program: /home/wilding/src/Linuxbook/alarm

Program received signal SIGALRM, Alarm clock.

0x4019fd01 in nanosleep () from /lib/libc.so.6

(gdb) bt

#0 0x4019fd01 in nanosleep () from /lib/libc.so.6

#1 0x4019fbd9 in sleep () from /lib/libc.so.6

#2 0x080483e3 in main ()

#3 0x4011a4a2 in __libc_start_main () from /lib/libc.so.6

In this case, GDB stopped when the controlled program received a signal and handed control back to the GDB user (that is, gave us a prompt). The command bt was run in the example here to display the stack trace and to show that the signal was received while in sleep (specifically in nanosleep) as expected.

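If the process receives a lot of signals and you only want to keep track of when each one arrives (without stopping), you can tell GDB to print a message every time the controlled process receives the signal. The handle keywords for this are nostop and print (for example, handle SIGALRM nostop print); the handle command itself is not shown in the session below: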

(gdb) run

The program being debugged has been started already.

Start it from the beginning? (y or n) y

Starting program: /home/wilding/src/Linuxbook/alarm

Program received signal SIGALRM, Alarm clock.

The program will continue to handle the signal as it was designed to, and GDB will simply report a message each time a SIGALRM is received. Sometimes this mode can be useful to see how often a process times out or when setting a breakpoint in a function that occurs after a signal is received by the process.
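
The alarm program shown earlier has no handler, so passing SIGALRM to it kills it. For a program that is designed to handle the signal, a sketch like the following could be used to try out the different handle settings. It is an illustration only; on_alarm, alarm_count, and wait_for_alarm are hypothetical names that do not appear in the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Incremented by the handler each time SIGALRM is delivered. */
static volatile sig_atomic_t alarm_count = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_count++;
}

/*
 * Install a SIGALRM handler, arm an alarm, and wait until the
 * handler has run. Returns the number of alarms seen, or -1 on error.
 */
int wait_for_alarm(unsigned int seconds)
{
    struct sigaction sa;
    sigset_t block_set, old_set, wait_set;

    assert(seconds > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block SIGALRM so it cannot arrive before sigsuspend is waiting. */
    sigemptyset(&block_set);
    sigaddset(&block_set, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block_set, &old_set) != 0)
        return -1;

    /* While waiting, use the original mask with SIGALRM unblocked. */
    wait_set = old_set;
    sigdelset(&wait_set, SIGALRM);

    alarm_count = 0;
    alarm(seconds);
    while (alarm_count == 0)
        sigsuspend(&wait_set);

    /* Restore the signal mask the caller started with. */
    sigprocmask(SIG_SETMASK, &old_set, NULL);
    return (int)alarm_count;
}
```

Under GDB's default settings for SIGALRM the signal is passed through and wait_for_alarm returns after the delay. With handle SIGALRM nopass the handler never runs, so this particular sketch would keep waiting; that makes the effect of the setting easy to observe.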

To see the full list of signals and how GDB is configured to handle them, use the info signals command:

(gdb) info signals

Signal Stop Print Pass to program Description

SIGHUP Yes Yes Yes Hangup

SIGINT Yes Yes No Interrupt

SIGQUIT Yes Yes Yes Quit

SIGILL Yes Yes Yes Illegal instruction

SIGTRAP Yes Yes No Trace/breakpoint trap

SIGABRT Yes Yes Yes Aborted

SIGEMT Yes Yes Yes Emulation trap

SIGFPE Yes Yes Yes Arithmetic exception

SIGKILL Yes Yes Yes Killed

SIGBUS Yes Yes Yes Bus error

SIGSEGV Yes Yes Yes Segmentation fault

SIGSYS Yes Yes Yes Bad system call

SIGPIPE Yes Yes Yes Broken pipe

SIGALRM No No Yes Alarm clock

...

Note: “...” in this output is not part of the output but is used to show that the output is longer than what is printed here.

You can tell GDB to handle each signal differently to match the desired functionality for the problem you are investigating.

6.6.3. Breakpoints

Breakpoints stop the execution of a program in a particular function, at a particular point in the code, or when a particular condition holds. We’ve been using breakpoints throughout this chapter, which goes to show how common they are. To see the current list of breakpoints, use the info breakpoints command.

(gdb) break main

Breakpoint 1 at 0x80483c2

(gdb) info break

Num Type Disp Enb Address What

1 breakpoint keep y 0x080483c2 <main+6>

The breakpoint in the preceding example is set to stop the controlled program in the function main. This is the most common usage of breakpoints—that is, to stop in a particular function. The incantation for this is:

break <function name>

It can also be useful to set a breakpoint in a function only when one of the function parameters is a specific value. Say you had a function in your application that got called hundreds of times, but you’re only interested in examining this function when one of the parameters is a specific value. Say the function is called common_func and takes one integer parameter called num. To set up a conditional breakpoint on this function when num equals 345 for example, you would first set the breakpoint:

(gdb) break common_func

Breakpoint 2 at 0x8048312: file common.c, line 3.

Now that the breakpoint is set, use the condition command to define a condition for this newly set breakpoint. We reference the breakpoint by number, in this case, 2.

(gdb) condition 2 num == 345

Notice the double equal signs—this is the same notation you would use for an expression in C programming.

Verify the correct setting of the breakpoint with the info breakpoint command:

(gdb) info breakpoint

Num Type Disp Enb Address What

1 breakpoint keep y 0x0804832a in main at common.c:10

2 breakpoint keep y 0x08048312 in common_func at common.c:3

stop only if num == 345

When continuing program execution, breakpoint number 2 will only be triggered when the value of num is 345:

(gdb) cont

Continuing.

Breakpoint 2, common_func (num=345) at common.c:3

3 int foo = num;
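
The source for common.c is not shown, so the following is only a guess at the kind of code the conditional breakpoint is useful for: a function with an integer parameter named num that is called many times. The driver call_many and the return values are assumptions made for this sketch.

```c
#include <assert.h>

/* A frequently called function with one integer parameter named num. */
int common_func(int num)
{
    int foo = num;

    return foo * 2;
}

/*
 * Hypothetical driver: calls common_func for 0 .. count-1 and returns
 * the sum of the results, so that every call does observable work.
 */
long call_many(int count)
{
    long total = 0;
    int i;

    assert(count >= 0);
    for (i = 0; i < count; i++)
        total += common_func(i);

    return total;
}
```

With call_many(1000) the function runs a thousand times, but a breakpoint on common_func with the condition num == 345 would stop on only one of those calls.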

If the program was compiled with -g, you can set a breakpoint on a particular line of code as in the next example:

(gdb) break hang2.C:20

Breakpoint 1 at 0x8048467: file hang2.C, line 20.

(gdb) run user

Starting program: /home/wilding/src/Linuxbook/hang2 user

Breakpoint 1, main (argc=2, argv=0xbfffefc4) at hang2.C:20

20 if ( !strcmp( argv[1], "user" ) )

(gdb)

If you don’t have the source code, you can still set a breakpoint at a specific address as follows:

(gdb) break * 0x804843c

Breakpoint 2 at 0x804843c

(gdb) cont

Continuing.

Breakpoint 2, 0x0804843c in main ()

(gdb)

Notice the * in front of the address; this is used when the argument to the break command is an address.

To delete a breakpoint, use the delete command with the breakpoint number as the argument.

6.6.4. Watchpoints

Watchpoints, as the name implies, are for watching data in your program, especially for alerting you when data at a specific address changes value. If a variable in your program is being changed for some bizarre and unknown reason, this could be a symptom of memory corruption. Memory corruption problems can be extremely difficult to track down because the corruption can happen long before the symptom appears (for example, a trap or unexpected behavior). With a watchpoint, you can tell GDB to watch the specified variable and let you know immediately when it changes. For memory corruption, GDB will show you exactly where and when the corruption occurs, which makes the problem much easier to fix.

There are two kinds of watchpoints: hardware and software. The x86 hardware, for example, provides built-in support specifically for watchpoints, and GDB will make use of it. If the support does not exist or if the conditions for using the hardware are not met, GDB falls back to software watchpoints. A software watchpoint is much slower than a hardware watchpoint because GDB must stop the program after each assembly instruction and examine every watchpoint for changes. Hardware watchpoints, by contrast, let the program run at normal speed, and GDB is notified as soon as a change occurs.

To demonstrate watchpoints, let’s use a simple program that simulates an employee record system. The source code is:

#include <stdio.h>
#include <string.h>   /* for strcpy */
struct employee
{
   char name[8];      /* room for 7 characters plus the terminating NUL */
   int serial_num;
};

void print_employee_rec( struct employee rec )
{
   printf( "Name: %s\n", rec.name );
   printf( "Number: %d\n", rec.serial_num );

   return;
}

void update_employee_name( struct employee *rec, char *name )
{
   strcpy( rec->name, name );   /* no bounds check: the bug examined below */

   return;
}

void add_employee( struct employee *rec, char *name, int num )
{
   strcpy( rec->name, name );
   rec->serial_num = num;

   return;
}

int main( void )
{
   struct employee rec;

   add_employee( &rec, "Fred", 25 );

   print_employee_rec( rec );

   printf( "\nUpdating employee's name ...\n\n" );

   update_employee_name( &rec, "Fred Smith" );

   print_employee_rec( rec );

   return 0;
}

The basic flow of the program is to create an employee record with the name “Fred” and serial number 25. Next, the program updates the employee’s name to “Fred Smith” but does not touch the serial number. Running the program produces this output:

penguin> employee

Name: Fred

Number: 25

Updating employee's name ...

Name: Fred Smith

Number: 26740

If the program isn’t supposed to update the serial number when the name is changed, then why did it change to 26740? This kind of error is indicative of memory corruption. If you’ve examined the source code, you might already know what the problem is, but let’s use GDB and watchpoints to tell us what the problem is. We know that something bad happens after printing out the employee record the first time, so let’s set a watchpoint on the serial_num member of the structure at that point:

(gdb) break main

Breakpoint 1 at 0x80483e6: file employee.c, line 36.

(gdb) run

Starting program: /home/dbehman/book/code/employee

Breakpoint 1, main () at employee.c:36

36 add_employee( &rec, "Fred", 25 );

(gdb) next

38 print_employee_rec( rec );

(gdb) next

Name: Fred

Number: 25

40 printf( "\nUpdating employee's name ...\n\n" );

(gdb) watch rec.serial_num

Hardware watchpoint 2: rec.serial_num

Note that GDB was able to use the hardware for this watchpoint, which it indicates with the message “Hardware watchpoint 2...”. If the word “Hardware” does not appear, GDB was unable to use the hardware and fell back to a software watchpoint (which is much, much slower). Let’s now continue our program execution and see what happens:

(gdb) cont

Continuing.

Updating employee's name ...

Hardware watchpoint 2: rec.serial_num

Old value = 25

New value = 116

0x400a3af9 in strcpy () from /lib/i686/libc.so.6

(gdb) backtrace

#0 0x400a3af9 in strcpy () from /lib/i686/libc.so.6

#1 0x080483af in update_employee_name (rec=0xbffff390, name=0x80485c0 "Fred Smith") at employee.c:19

#2 0x08048431 in main () at employee.c:42

Bingo! We can see that this program has the infamous buffer overrun bug. The strcpy function does not do any bounds checking or limiting and happily writes past our allotted buffer of eight bytes, which corrupts the next piece of memory occupied by the serial_num structure member.
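
The two values seen so far, 116 at the watchpoint and 26740 in the final output, can be reproduced with a little arithmetic. The sketch below assumes a little-endian machine with a 4-byte int, ASCII characters, and serial_num stored directly after name[8], which is consistent with the x86 session shown. It never writes out of bounds; serial_after_overrun only simulates the overwrite. The second function shows one possible bounded alternative to strcpy. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 8

struct employee {
    char name[NAME_LEN];
    int serial_num;
};

/*
 * Simulate the value of serial_num after strcpy has written the first
 * ncopied bytes of src (terminating NUL included in the count) into
 * name. Bytes past NAME_LEN replace the low-order bytes of the number.
 */
uint32_t serial_after_overrun(uint32_t serial, const char *src, size_t ncopied)
{
    size_t i;

    assert(ncopied <= strlen(src) + 1);
    for (i = NAME_LEN; i < ncopied && i < NAME_LEN + sizeof serial; i++) {
        unsigned int shift = 8u * (unsigned int)(i - NAME_LEN);

        serial &= ~((uint32_t)0xFF << shift);               /* clear one byte */
        serial |= (uint32_t)(unsigned char)src[i] << shift; /* overwrite it   */
    }
    return serial;
}

/* One possible fix: copy at most NAME_LEN - 1 characters plus a NUL. */
void update_employee_name_bounded(struct employee *rec, const char *name)
{
    assert(rec != NULL && name != NULL);
    snprintf(rec->name, sizeof rec->name, "%s", name);
}
```

After nine bytes of "Fred Smith" have been copied, the ninth byte, 't' (116), has replaced the low byte of 25, which matches the new value GDB reported. After all eleven bytes, the number holds 't', 'h', and the terminating NUL: 116 + 104 * 256 = 26740. The bounded version truncates the name to "Fred Sm" and leaves serial_num alone; whether truncation is acceptable depends on the application.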

If you have a reproducible problem and you can find the address that gets corrupted, a watchpoint can reduce the investigating time from days (of setting breakpoints or using print statements) to minutes.

Well, this is great, but what if you don’t have the source code, and/or the program was not built with -g? You can still set hardware watchpoints, but you need to set them directly on an address as in the following example.

The program is simple and changes the value of a global symbol. We’re using a global symbol because we can easily find the address of that regardless of whether or not the program is built with -g.

int a = 5 ;

int main()
{
   a = 6 ;

   return 0 ;
}

Now, inside of GDB we can find the address of the variable a and set a watchpoint on that address:

(gdb) print &a

$1 = (<data variable, no debug info> *) 0x80493e0

(gdb) watch (int) *0x80493e0

Hardware watchpoint 1: (int) *134517728

The notation here told GDB to watch the contents of the address 0x80493e0 for any changes and to treat those contents as an integer. Be sure to dereference the address with a “*”, or GDB will not set the watchpoint correctly. We can now run the program and see the hardware watchpoint in action:

(gdb) run

Starting program: /home/wilding/src/Linuxbook/watch

Hardware watchpoint 1: (int) *134517728

Old value = 5

New value = 6

0x08048376 in main ()

(gdb)

The watchpoint was triggered several times, but in each case the value stored at the address had not changed, so GDB did not stop the process. Only on the last occurrence did GDB stop the process, because the value changed from 5 to 6.

There are three different types of watchpoints:

	watch - Cause a break in execution for any write to an address
	rwatch - Cause a break in execution for any read to an address
	awatch - Cause a break in execution for a read or write to an address

Besides the different memory access attributes, the three types of watchpoints can be used in the same way. There are some situations where a read watchpoint can be useful. One example is called a “late read.” A late read is a situation where a code path reads memory after it has been freed. If you know which block of memory is referenced after it has been freed, a read watchpoint can catch the culprit code path that references the memory.

Note: To delete a watchpoint, use the delete command with the watchpoint number as the argument.

6.6.5. Display Expression on Stop

Throughout a debugging session, you will find that you will be checking the value of certain variables again and again. GDB provides a handy feature called displays. Displays allow you to tell GDB to display whatever expression you’ve set as a display after each execution stop. To set a display, use the display command. Here is an example:

(gdb) display a

(gdb) break main

Breakpoint 1 at 0x8048362

(gdb) run

Starting program: /home/wilding/src/Linuxbook/watch

Breakpoint 1, 0x08048362 in main ()

1: {<data variable, no debug info>} 134517792 = 5

The last line is the display line, and the display item has a number of 1. To delete this display, we use the delete display command:

(gdb) display

1: {<data variable, no debug info>} 134517792 = 5

(gdb) delete display 1

To enable or disable a preset display, use the enable display and disable display commands.

Note: GDB’s GUI brother, DDD (Data Display Debugger), is perfectly suited for using the concepts of displays. Please refer to the section on DDD for more information on displays.

6.6.6. Working with Shared Libraries

GDB has a command that shows the shared libraries a program links in and where those libraries have been mapped into the process’s address space. If you have an instruction address, you can use this information to find out which library the instruction is in (and eventually the line of code, if you wish). It is also useful for confirming that the program is loading the correct libraries.

Use the info sharedlibrary command to see this information:

(gdb) info sharedlibrary

From To Syms Read Shared Object Library

0x40040b40 0x4013b7b4 Yes /lib/i686/libc.so.6

0x40000c00 0x400139ef Yes /lib/ld-linux.so.2

(gdb)

Shared libraries are like common program extensions. They contain executable code and variables just like an executable, though the libraries can be shared by multiple executables at the same time.

6.6.6.1. Debugging Functions in Shared Libraries

GDB normally does a great job of handling shared libraries that an executable links in. For example, GDB will happily set a breakpoint in a function that exists in a shared library, just as in an executable. There are times, however, when shared libraries get dynamically loaded in an application, which makes it almost impossible for GDB to know what functions could be run before the library is loaded. To illustrate this problem, consider the following two source files:

dyn.c:

#include <stdio.h>

void func2( void )
{
   printf( "This function is not referenced in dyn_main.c\n" );

   return;
}

void func1( void )
{
   printf( "This function is in libdyn.so\n" );

   func2();

   return;
}

dyn_main.c:

#include <stdio.h>
#include <dlfcn.h>

int main( void )
{
   void *dlhandle = NULL;
   void (*func1_ref)( void );

   printf( "Dynamically opening libdyn.so ...\n" );

   dlhandle = dlopen( "./libdyn.so", RTLD_NOW );
   if ( NULL == dlhandle )
   {
      printf( "dlopen failed: %s\n", dlerror() );
      goto exit;
   }

   func1_ref = dlsym( dlhandle, "func1" );
   if ( NULL == func1_ref )
   {
      printf( "dlsym failed: %s\n", dlerror() );
      goto exit;
   }

   func1_ref();

exit:
   return 0;
}

Now compile these modules with the following commands:

penguin> gcc -shared -o libdyn.so dyn.c -g

penguin> gcc -o dyn dyn_main.c -g -ldl

Now, in a debugging session, let’s say we only want to set a breakpoint in func2(). Attempting this after starting GDB with gdb dyn produces this error:

(gdb) break func2

Function "func2" not defined.

Using the info sharedlibrary command, we list the shared libraries associated with this executable, first before the program starts and then again after it stops at main:

(gdb) info sharedlibrary
No shared libraries loaded at this time.
(gdb) break main
Note: breakpoint 1 also set at pc 0x804841c.
Breakpoint 2 at 0x804841c: file dyn_main.c, line 6.
(gdb) run
Starting program: /home/dbehman/book/code/dyn

Breakpoint 1, main () at dyn_main.c:6
6          void *dlhandle = NULL;
(gdb) info sharedlibrary
From        To          Syms Read   Shared Object Library
0x4002beb0  0x4002cde4  Yes         /lib/libdl.so.2
0x40043b40  0x4013e7b4  Yes         /lib/i686/libc.so.6
0x40000c00  0x400139ef  Yes         /lib/ld-linux.so.2
(gdb) break func2
Function "func2" not defined.
(gdb)

In the first part of the output, we see that no shared libraries are loaded, because the program has not actually started yet. To get the program running, we set a breakpoint in the main function and start the program with the GDB run command. Once the program is running, we can see information about its shared libraries, and libdyn.so is not among them. This is why the second break func2 attempt failed as well.
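The same observation can be made from inside a process. The sketch below is an illustration rather than part of the book's example: it assumes a glibc-based Linux system, uses dl_iterate_phdr() to walk the shared objects currently mapped into the process (roughly what info sharedlibrary reports), and the helper names match_library and library_is_loaded are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <link.h>

/* Search record handed to the callback through the data pointer. */
struct lib_query
{
   const char *needle;   /* substring to look for in each object name */
   int found;            /* set to 1 when a loaded object matches */
};

/* Called by dl_iterate_phdr() once per loaded object. */
static int match_library( struct dl_phdr_info *info, size_t size, void *data )
{
   struct lib_query *q = data;

   (void) size;   /* unused in this sketch */

   if ( info->dlpi_name != NULL &&
        strstr( info->dlpi_name, q->needle ) != NULL )
   {
      q->found = 1;
   }

   return 0;   /* zero tells dl_iterate_phdr() to keep iterating */
}

/* Hypothetical helper: returns 1 if a loaded shared object has a name
   containing needle, and 0 otherwise. */
int library_is_loaded( const char *needle )
{
   struct lib_query q;

   assert( needle != NULL );

   q.needle = needle;
   q.found = 0;
   dl_iterate_phdr( match_library, &q );

   return q.found;
}
```

In dyn_main.c, library_is_loaded( "libdyn.so" ) would be expected to return 0 before the dlopen() call and 1 after it succeeds, which mirrors what GDB sees at the breakpoint in main.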

From the preceding source code, we know that libdyn.so will be loaded dynamically as the program runs (using dlopen). This matters because, to set a breakpoint in a library that has not been loaded yet, we need GDB to stop execution whenever the controlled program loads a new shared library. We can tell GDB to do this with the command set stop-on-solib-events 1. The current state of this flag can be displayed with the show stop-on-solib-events command:

(gdb) show stop-on-solib-events
Stopping for shared library events is 0.
(gdb) set stop-on-solib-events 1
(gdb) show stop-on-solib-events
Stopping for shared library events is 1.
(gdb)

Now let’s tell GDB to let the program continue:

(gdb) cont
Continuing.
Dynamically opening libdyn.so ...
Stopped due to shared library event
(gdb) backtrace
#0  0x4000dd60 in _dl_debug_state_internal () from /lib/ld-linux.so.2
#1  0x4000d7fa in _dl_init_internal () from /lib/ld-linux.so.2
#2  0x4013a558 in dl_open_worker () from /lib/i686/libc.so.6
#3  0x4000d5b6 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#4  0x4013a8ff in _dl_open () from /lib/i686/libc.so.6
#5  0x4002bfdb in dlopen_doit () from /lib/libdl.so.2
#6  0x4000d5b6 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#7  0x4002c48a in _dlerror_run () from /lib/libdl.so.2
#8  0x4002c022 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#9  0x08048442 in main () at dyn_main.c:11
(gdb) break func2
Breakpoint 3 at 0x4001b73e: file dyn.c, line 5.
(gdb)

As the stack trace shows, GDB stopped deep inside the dlopen() library call (dlopen is a C library function, not a system call). By that point the library's symbol table had been loaded, so we were able to set a breakpoint in the desired function. We could now simply continue with the cont command, but we would encounter more stops due to shared library events. Because we’ve accomplished our goal of setting a breakpoint in func2(), let’s turn off the stop-on-solib-events flag and then continue:

(gdb) set stop-on-solib-events 0
(gdb) cont
Continuing.
This function is in libdyn.so

Breakpoint 3, func2 () at dyn.c:5
5          printf( "This function is not referenced in dyn_main.c\n" );
(gdb)

Mission accomplished!
