ptrace: How It Works and How to Use It

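Have you ever wondered how system calls can be intercepted? Have you ever tried to fool the kernel by changing the arguments of a system call? Have you ever wondered how a debugger stops a running process and takes control of it?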

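If you are thinking of complex kernel programming to do these things, think again. Linux provides an elegant mechanism for all of them: the ptrace system call. ptrace lets a parent process observe and control another process. It can also change the child's registers and core image, which is what makes breakpoint debugging and system call tracing possible.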

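With ptrace, you can intercept and modify system calls from user space.
In this article we will learn how to intercept a system call and change its arguments. In the second part we will look at more advanced techniques: setting breakpoints and injecting code into a running program. We will go inside the machine to peek at, and tamper with, a process's registers and data segment.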

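Basics

Operating systems offer their services through a standard mechanism called system calls, which give programs access to the underlying hardware and to low-level services such as the filesystem. When a program needs to make a system call, it puts the arguments into the registers reserved for that purpose and triggers software interrupt 0x80. This interrupt is a gate into kernel mode: the program hands the system call number and arguments to the kernel, and the kernel executes the call.
On the i386 architecture (all the code in this article targets i386), the system call number goes in %eax and the arguments go in %ebx, %ecx, %edx, %esi and %edi, in that order. For example, the call
write(2, "Hello", 5)
translates roughly into this assembly: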
movl $4, %eax
movl $2, %ebx
movl $hello, %ecx
movl $5, %edx
int $0x80
Here $hello points to the string literal "Hello".
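
To make the "number plus arguments" idea concrete from C, here is a minimal sketch that issues write by system call number through the generic syscall() wrapper instead of calling write() directly. The helper name raw_write is hypothetical and only used for illustration. syscall() chooses the entry instruction and registers for the platform it is built on, so this is not tied to int $0x80.

```c
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: issue write(2) by system call number.
   On i386 the C library ends up loading SYS_write (4) into %eax and
   the three arguments into %ebx, %ecx and %edx, as in the assembly
   shown above. Returns the number of bytes written, or -1 on error. */
static long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

For example, raw_write(2, "Hello", 5) writes the five bytes to standard error, which is the same request as the write(2, "Hello", 5) call above.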

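So where does ptrace come into the picture? Before executing a system call, the kernel checks whether the process is being traced. If it is, the kernel stops the process and gives control to the tracing process, which can then examine or modify the traced process's registers.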

Let's look at an example that shows this tracing at work:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/reg.h>      /* For constants ORIG_EAX etc.
                             (older systems: <linux/user.h>) */
int main()
{
    pid_t child;
    long orig_eax;
    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
        wait(NULL);
        orig_eax = ptrace(PTRACE_PEEKUSER, 
                          child, 4 * ORIG_EAX, 
                          NULL);
        printf("The child made a "
               "system call %ld\n", orig_eax);
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }
    return 0;
}

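When run, this program prints the following line along with the output of ls:
The child made a system call 11
Note: 11 is the system call number of execve, the first system call the child makes.
For the full list of system call numbers, see /usr/include/asm/unistd.h.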

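In the example above, the parent forks a child and then traces it. Before calling exec, the child calls ptrace with PTRACE_TRACEME as the first argument, which tells the kernel: let me be traced. When the child then calls execve(), it is stopped and control passes to the parent. The parent has been waiting in wait() for a notification from the kernel. Now that it has been notified, it can look at what the child has been doing, for example by reading its registers.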
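
As a hedged variation on the example above, the sketch below wraps the same fork / PTRACE_TRACEME / wait sequence in a function and adds basic error handling. It is also written to build on x86-64, where the saved system call number sits at index ORIG_RAX instead of ORIG_EAX. The names first_syscall_of and SYSCALL_NR_INDEX are illustrative assumptions, not part of any API.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/reg.h>       /* ORIG_EAX / ORIG_RAX */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Index of the saved system call number in the USER area.
   Assumption: only the two x86 layouts are handled in this sketch. */
#if defined(__x86_64__)
#define SYSCALL_NR_INDEX ORIG_RAX
#elif defined(__i386__)
#define SYSCALL_NR_INDEX ORIG_EAX
#else
#error "this sketch only knows the x86 register layout"
#endif

/* Hypothetical helper: run `path` under ptrace and return the system
   call number seen at the child's first stop, or -1 on failure. */
static long first_syscall_of(const char *path)
{
    int status;
    long nr;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);                  /* tracing not permitted */
        execl(path, path, (char *)NULL);
        _exit(127);                      /* exec failed */
    }
    /* The child stops once execve() has succeeded. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    nr = ptrace(PTRACE_PEEKUSER, child,
                (void *)(sizeof(long) * SYSCALL_NR_INDEX), NULL);
    ptrace(PTRACE_CONT, child, NULL, NULL);
    waitpid(child, &status, 0);          /* reap the child */
    return nr;
}
```

Calling first_syscall_of("/bin/ls") should return SYS_execve, which is the 11 printed by the program above on i386.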

ptrace parameters

ptrace takes four arguments:

long ptrace(enum __ptrace_request request,
            pid_t pid,
            void *addr,
            void *data);

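When the child stops at a system call, the kernel has saved the original contents of eax, which held the system call number. We can read that value by calling ptrace with PTRACE_PEEKUSER as the first argument.
Once we have finished examining the system call, calling ptrace with PTRACE_CONT as the first argument lets the child carry on with it.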

The first argument determines how ptrace behaves and how the other arguments are used. It should be one of the following values:
PTRACE_TRACEME
PTRACE_PEEKTEXT
PTRACE_PEEKDATA
PTRACE_PEEKUSER
PTRACE_POKETEXT
PTRACE_POKEDATA
PTRACE_POKEUSER
PTRACE_GETREGS
PTRACE_GETFPREGS
PTRACE_SETREGS
PTRACE_SETFPREGS
PTRACE_CONT
PTRACE_SYSCALL
PTRACE_SINGLESTEP
PTRACE_DETACH

The rest of this article explains how these requests are used.

Reading system call parameters
Calling ptrace with PTRACE_PEEKUSER as the first argument lets us read the child's register values.

Consider the following example:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/reg.h>       /* For EAX, ORIG_EAX etc.
                              (older systems: <linux/user.h>) */
#include <sys/syscall.h>   /* For SYS_write etc */

int main()
{   
    pid_t child;
    long orig_eax, eax;
    long params[3];
    int status;
    int insyscall = 0;
    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
       while(1) {
          wait(&status);
          if(WIFEXITED(status))
              break;
          orig_eax = ptrace(PTRACE_PEEKUSER, 
                     child, 4 * ORIG_EAX, NULL);
          if(orig_eax == SYS_write) {
             if(insyscall == 0) {    
                /* Syscall entry */
                insyscall = 1;
                params[0] = ptrace(PTRACE_PEEKUSER,
                                   child, 4 * EBX, 
                                   NULL);
                params[1] = ptrace(PTRACE_PEEKUSER,
                                   child, 4 * ECX, 
                                   NULL);
                params[2] = ptrace(PTRACE_PEEKUSER,
                                   child, 4 * EDX, 
                                   NULL);
                printf("Write called with "
                       "%ld, %ld, %ld\n",
                       params[0], params[1],
                       params[2]);
             }
             else { /* Syscall exit */
                eax = ptrace(PTRACE_PEEKUSER, 
                             child, 4 * EAX, NULL);
                printf("Write returned "
                       "with %ld\n", eax);
                insyscall = 0;
             }
          }
          ptrace(PTRACE_SYSCALL, 
                 child, NULL, NULL);
       }
    }
    return 0;
}

The program's output looks like this:

ppadala@linux:~/ptrace > ls
a.out dummy.s ptrace.txt
libgpm.html registers.c syscallparams.c
dummy ptrace.html simple.c
ppadala@linux:~/ptrace > ./a.out
Write called with 1, 1075154944, 48
a.out dummy.s ptrace.txt
Write returned with 48
Write called with 1, 1075154944, 59
libgpm.html registers.c syscallparams.c
Write returned with 59
Write called with 1, 1075154944, 30
dummy ptrace.html simple.c
Write returned with 30

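In the example above we traced the write system call, and ls made three write calls. Calling ptrace with PTRACE_SYSCALL as the first argument makes the kernel stop the child at every system call entry and exit. It is equivalent to doing a PTRACE_CONT and then stopping at the next system call entry or exit, which is why the example toggles the insyscall flag to tell the two stops apart.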

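In the previous example we used PTRACE_PEEKUSER to look at the arguments of the write system call. When the system call returns, its return value is placed in %eax, which can be read in the same way.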
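
The offsets passed to PTRACE_PEEKUSER, such as 4 * ORIG_EAX, are byte offsets into the child's struct user: the word size multiplied by the register's index. The sketch below computes the same offsets in two ways, by index and from the structure layout, so the two can be compared. The macro and function names are illustrative assumptions, and only the two x86 layouts are covered.

```c
#include <stddef.h>      /* offsetof */
#include <sys/reg.h>     /* register indices such as ORIG_EAX */
#include <sys/user.h>    /* struct user, struct user_regs_struct */

/* Assumption: only the two x86 layouts are handled in this sketch. */
#if defined(__x86_64__)
#define NR_INDEX  ORIG_RAX
#define NR_FIELD  orig_rax
#define RET_INDEX RAX
#define RET_FIELD rax
#elif defined(__i386__)
#define NR_INDEX  ORIG_EAX
#define NR_FIELD  orig_eax
#define RET_INDEX EAX
#define RET_FIELD eax
#else
#error "this sketch only knows the x86 register layout"
#endif

/* Offset computed the way the examples do it: word size times index. */
static size_t offset_by_index(int index)
{
    return sizeof(long) * (size_t)index;
}

/* The same offsets taken from the structure layout itself. */
static size_t offset_of_syscall_nr(void)
{
    return offsetof(struct user, regs) +
           offsetof(struct user_regs_struct, NR_FIELD);
}

static size_t offset_of_return_value(void)
{
    return offsetof(struct user, regs) +
           offsetof(struct user_regs_struct, RET_FIELD);
}
```

On i386 sizeof(long) is 4, so offset_by_index(ORIG_EAX) is the 4 * ORIG_EAX used in the examples.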

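The status variable filled in by wait is used to check whether the child has exited. This is the usual way to tell whether the child was stopped by ptrace or has run to completion. A set of macros, such as WIFEXITED, decode the value of status. See the wait(2) man page for details.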
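
As a small illustration of those macros, the sketch below decodes a wait status into a short description, and includes a helper that produces both a "stopped" and an "exited" status without using ptrace at all. The function names describe_status and stop_then_exit are hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a wait() status into a short description. */
static void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    else
        snprintf(buf, len, "unknown status 0x%x", (unsigned)status);
}

/* Fork a child that stops itself and, once resumed, exits with
   exit_code. Stores the status seen at the stop and at the exit.
   Returns 0 on success, -1 on error. */
static int stop_then_exit(int exit_code, int *stop_status, int *exit_status)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        raise(SIGSTOP);
        _exit(exit_code);
    }
    /* WUNTRACED makes waitpid() report stopped children as well. */
    if (waitpid(child, stop_status, WUNTRACED) < 0)
        return -1;
    kill(child, SIGCONT);
    if (waitpid(child, exit_status, 0) < 0)
        return -1;
    return 0;
}
```

A traced child reports its stops through the same WIFSTOPPED and WSTOPSIG macros, which is why the tracing loops above only leave the loop when WIFEXITED is true.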

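Reading register values

If you want to read the registers at a system call or when the process stops, the approach in the previous example works, but it is cumbersome. Calling ptrace with PTRACE_GETREGS as the first argument fetches all the registers in a single call.

Here is an example that fetches the register values: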

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
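#include <stdio.h>
#include <sys/reg.h>     /* ORIG_EAX, EAX */
#include <sys/user.h>    /* struct user_regs_struct
                            (older systems: <linux/user.h>) */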
#include <sys/syscall.h>

int main()
{   
    pid_t child;
    long orig_eax, eax;
    long params[3];
    int status;
    int insyscall = 0;
    struct user_regs_struct regs;
    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
       while(1) {
          wait(&status);
          if(WIFEXITED(status))
              break;
          orig_eax = ptrace(PTRACE_PEEKUSER, 
                            child, 4 * ORIG_EAX, 
                            NULL);
          if(orig_eax == SYS_write) {
              if(insyscall == 0) {
                 /* Syscall entry */
                 insyscall = 1;
                 ptrace(PTRACE_GETREGS, child, 
                        NULL, &regs);
                 printf("Write called with "
                        "%ld, %ld, %ld\n",
                        regs.ebx, regs.ecx, 
                        regs.edx);
             }
             else { /* Syscall exit */
                 eax = ptrace(PTRACE_PEEKUSER, 
                              child, 4 * EAX, 
                              NULL);
                 printf("Write returned "
                        "with %ld\n", eax);
                 insyscall = 0;
             }
          }
          ptrace(PTRACE_SYSCALL, child,
                 NULL, NULL);
       }
   }
   return 0;
}

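This code is similar to the previous example, except that it uses PTRACE_GETREGS. The user_regs_struct structure is defined in <linux/user.h>; on systems that no longer ship that header, <sys/user.h> provides it.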
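
The same structure can be written back with PTRACE_SETREGS, which is one way for a tracer to change a system call's arguments before the kernel acts on them. The sketch below is an illustration under stated assumptions rather than the author's example: the child writes a message to a pipe, and the parent shortens the length argument at the entry of that write. The function name truncate_child_write and the REG_* macros are hypothetical, and only the x86 layouts are handled.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only the two x86 layouts are handled in this sketch. */
#if defined(__x86_64__)
#define REG_NR(r)   ((r).orig_rax)
#define REG_ARG1(r) ((r).rdi)
#define REG_ARG3(r) ((r).rdx)
#elif defined(__i386__)
#define REG_NR(r)   ((r).orig_eax)
#define REG_ARG1(r) ((r).ebx)
#define REG_ARG3(r) ((r).edx)
#else
#error "this sketch only knows the x86 register layout"
#endif

/* The child writes msg (len bytes) to a pipe while the parent rewrites
   the length argument of that write to new_len. Returns the number of
   bytes that arrived in out, or -1 on error. */
static long truncate_child_write(const char *msg, size_t len,
                                 size_t new_len, char *out, size_t out_size)
{
    int fds[2], status, patched = 0;
    struct user_regs_struct regs;
    pid_t child;

    if (pipe(fds) < 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        close(fds[0]);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);                  /* let the parent catch up */
        if (write(fds[1], msg, len) < 0)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);
    for (;;) {
        if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
            break;                       /* child exited, or error */
        if (!patched &&
            ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0 &&
            (long)REG_NR(regs) == SYS_write &&
            (int)REG_ARG1(regs) == fds[1]) {
            /* First stop for this write is its entry: shrink the length. */
            REG_ARG3(regs) = new_len;
            ptrace(PTRACE_SETREGS, child, NULL, &regs);
            patched = 1;
        }
        /* Resume until the next system call entry or exit. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
    }
    return read(fds[0], out, out_size);
}
```

With msg "hello", len 5 and new_len 2, only "he" should reach the pipe, because the kernel sees the modified length rather than the one the child passed.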

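Single-stepping
ptrace can single-step the child. ptrace(PTRACE_SINGLESTEP, ...) tells the kernel to stop the child at each instruction and hand control to the parent. The example below shows the instruction the child is about to execute at each stop. To keep things easy to follow, I wrote the traced program in assembly, so you don't have to puzzle over which system calls the C library functions make.

Here is the traced program, dummy1.s. Compile it with gcc -o dummy1 dummy1.s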
.data
hello:
.string "hello world\n"
.text
.globl main
main:
movl $4, %eax
movl $2, %ebx
movl $hello, %ecx
movl $12, %edx
int $0x80
movl $1, %eax
xorl %ebx, %ebx
int $0x80
ret

The following program does the single-stepping:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/user.h>    /* struct user_regs_struct
                            (older systems: <linux/user.h>) */
#include <sys/syscall.h>
int main()
{
    pid_t child;
    /* An enum is a true constant, so it can size the array below */
    enum { long_size = sizeof(long) };
    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("./dummy1", "dummy1", NULL);
    }
    else {
        int status;
        union u {
            long val;
            char chars[long_size];
        }data;
        struct user_regs_struct regs;
        int start = 0;
        long ins;
        while(1) {
            wait(&status);
            if(WIFEXITED(status))
                break;
            ptrace(PTRACE_GETREGS, 
                   child, NULL, &regs);
            if(start == 1) {
                ins = ptrace(PTRACE_PEEKTEXT, 
                             child, regs.eip, 
                             NULL);
                printf("EIP: %lx Instruction "
                       "executed: %lx\n", 
                       regs.eip, ins);
            }
            if(regs.orig_eax == SYS_write) {
                start = 1;
                ptrace(PTRACE_SINGLESTEP, child, 
                       NULL, NULL);
            }
            else
                ptrace(PTRACE_SYSCALL, child, 
                       NULL, NULL);
        }
    }
    return 0;
}

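The program prints one line per step, giving the EIP and the instruction word found at that address. You may need to consult the Intel manuals to make sense of these instruction bytes.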

More elaborate stepping, such as setting breakpoints, requires careful design and more complex code.
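
For readers who want to try single-stepping without the assembly program, here is a self-contained sketch that steps a busy-looping child a fixed number of times and records the instruction pointer after each step. The function name step_child and the REG_IP macro are hypothetical, and the register names assume x86.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only the two x86 layouts are handled in this sketch. */
#if defined(__x86_64__)
#define REG_IP(r) ((r).rip)
#elif defined(__i386__)
#define REG_IP(r) ((r).eip)
#else
#error "this sketch only knows the x86 register layout"
#endif

/* Single-step a busy-looping child max_steps times, storing the
   instruction pointer seen after each step in ips[].
   Returns the number of steps completed, or -1 on error. */
static int step_child(unsigned long *ips, int max_steps)
{
    int status, steps = 0;
    struct user_regs_struct regs;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        volatile unsigned long counter = 0;
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);                  /* hand control to the parent */
        for (;;)
            counter++;                   /* something to step through */
    }
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    while (steps < max_steps) {
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) < 0)
            break;
        /* Each completed step is reported as a SIGTRAP stop. */
        if (waitpid(child, &status, 0) < 0 ||
            !WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
            break;
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) < 0)
            break;
        ips[steps++] = (unsigned long)REG_IP(regs);
    }
    kill(child, SIGKILL);                /* the child never exits itself */
    waitpid(child, &status, 0);
    return steps;
}
```

Printing the recorded addresses shows the instruction pointer advancing one instruction at a time, first through the tail of raise() and then around the counting loop.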

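In the first part we saw how ptrace can be used to trace a child's system calls and change their arguments. In this part we look at how to set breakpoints in the child and how to inject code into a running program. This is in fact how debuggers set breakpoints and run their debugging handlers. As before, all the code here targets the i386 platform.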

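Attaching to a process

In the first part we used ptrace(PTRACE_TRACEME, ...) to trace a child process. That works well if you only want to see how a process makes system calls, or to trace a program you start yourself. To debug a process that is already running, use ptrace(PTRACE_ATTACH, ...) instead.

When ptrace(PTRACE_ATTACH, ...) is called with the pid of the target process, the effect is roughly the same as if that process had called ptrace(PTRACE_TRACEME, ...). The target is sent a SIGSTOP, so we can examine and modify it, and then call ptrace(PTRACE_DETACH, ...) to let it run on.

Here is a small program to trace: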

#include <stdio.h>
#include <unistd.h>

int main()
{   
    int i;
    for(i = 0;i < 10; ++i) {
        printf("My counter: %d\n", i);
        sleep(2);
    }
    return 0;
}

Save the code above as dummy2.c, then compile and run it:

gcc -o dummy2 dummy2.c
./dummy2 &

Now we can attach to dummy2 with the following code:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>       /* exit, atoi */
#include <sys/user.h>     /* For user_regs_struct etc.
                             (older systems: <linux/user.h>) */
int main(int argc, char *argv[])
{   
    pid_t traced_process;
    struct user_regs_struct regs;
    long ins;
    if(argc != 2) {
        printf("Usage: %s <pid to be traced>\n",
               argv[0]);
        exit(1);
    }
    traced_process = atoi(argv[1]);
    ptrace(PTRACE_ATTACH, traced_process, 
           NULL, NULL);
    wait(NULL);
    ptrace(PTRACE_GETREGS, traced_process, 
           NULL, &regs);
    ins = ptrace(PTRACE_PEEKTEXT, traced_process, 
                 regs.eip, NULL);
    printf("EIP: %lx Instruction executed: %lx\n", 
           regs.eip, ins);
    ptrace(PTRACE_DETACH, traced_process, 
           NULL, NULL);
    return 0;
}

The program above simply attaches to the process, waits for it to stop, reads its eip (instruction pointer), and then detaches.
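
The attach sequence can also be written as a small function with error checks, which makes it easier to see which step failed, for example when the system does not permit attaching to the target. This is a sketch under the assumption of an x86 register layout. The name attach_and_read_ip is hypothetical.

```c
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only the two x86 layouts are handled in this sketch. */
#if defined(__x86_64__)
#define REG_IP(r) ((r).rip)
#elif defined(__i386__)
#define REG_IP(r) ((r).eip)
#else
#error "this sketch only knows the x86 register layout"
#endif

/* Attach to pid, store its instruction pointer in *ip, then detach.
   Returns 0 on success, -1 if any step fails. */
static int attach_and_read_ip(pid_t pid, unsigned long *ip)
{
    int status;
    struct user_regs_struct regs;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0)
        return -1;
    /* The attach sends SIGSTOP; wait until the target has stopped. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }
    *ip = (unsigned long)REG_IP(regs);
    /* Detaching lets the target run on as if nothing had happened. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0)
        return -1;
    return 0;
}
```

With dummy2 running in the background, passing its pid to attach_and_read_ip fills in the same EIP value that the program above prints.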

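Setting breakpoints

How do debuggers set breakpoints? Typically they replace the instruction about to be executed with a trap instruction, so that the debugged program stops there and the debugger can examine it. The debugger puts the original instruction back when it lets the program continue. Here is an example: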

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
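#include <stdio.h>
#include <stdlib.h>       /* exit, atoi */
#include <string.h>       /* memcpy */
#include <sys/user.h>     /* struct user_regs_struct
                             (older systems: <linux/user.h>) */

/* An enum is a true constant, so it can size the arrays below */
enum { long_size = sizeof(long) };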

void getdata(pid_t child, long addr, 
             char *str, int len)
{   
    char *laddr;
    int i, j;
    union u {
            long val;
            char chars[long_size];
    }data;

    i = 0;
    j = len / long_size;
    laddr = str;

    while(i < j) {
        data.val = ptrace(PTRACE_PEEKDATA, child, 
                          addr + i * 4, NULL);
        memcpy(laddr, data.chars, long_size);
        ++i;
        laddr += long_size;
    }
    j = len % long_size;
    if(j != 0) {
        data.val = ptrace(PTRACE_PEEKDATA, child, 
                          addr + i * 4, NULL);
        memcpy(laddr, data.chars, j);
    }
    str[len] = '\0';
}

void putdata(pid_t child, long addr, 
             char *str, int len)
{   
    char *laddr;
    int i, j;
    union u {
            long val;
            char chars[long_size];
    }data;

    i = 0;
    j = len / long_size;
    laddr = str;
    while(i < j) {
        memcpy(data.chars, laddr, long_size);
        ptrace(PTRACE_POKEDATA, child, 
               addr + i * 4, data.val);
        ++i;
        laddr += long_size;
    }
    j = len % long_size;
    if(j != 0) {
        memcpy(data.chars, laddr, j);
        ptrace(PTRACE_POKEDATA, child, 
               addr + i * 4, data.val);
    }
}

int main(int argc, char *argv[])
{   
    pid_t traced_process;
    struct user_regs_struct regs, newregs;
    long ins;
    /* int 0x80, int3 */
    char code[] = {(char)0xcd, (char)0x80, (char)0xcc, 0};
    char backup[4];
    if(argc != 2) {
        printf("Usage: %s <pid to be traced>\n", 
               argv[0]);
        exit(1);
    }
    traced_process = atoi(argv[1]);
    ptrace(PTRACE_ATTACH, traced_process, 
           NULL, NULL);
    wait(NULL);
    ptrace(PTRACE_GETREGS, traced_process, 
           NULL, &regs);
    /* Copy instructions into a backup variable */
    getdata(traced_process, regs.eip, backup, 3);
    /* Put the breakpoint */
    putdata(traced_process, regs.eip, code, 3);
    /* Let the process continue and execute 
       the int 3 instruction */
    ptrace(PTRACE_CONT, traced_process, NULL, NULL);
    wait(NULL);
    printf("The process stopped, putting back "
           "the original instructions\n");
    printf("Press <enter> to continue\n");
    getchar();
    putdata(traced_process, regs.eip, backup, 3);
    /* Setting the eip back to the original 
       instruction to let the process continue */
    ptrace(PTRACE_SETREGS, traced_process, 
           NULL, &regs);
    ptrace(PTRACE_DETACH, traced_process, 
           NULL, NULL);
    return 0;

}

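The program above replaces three bytes with code that executes a trap instruction. Once the traced process has stopped, we put the original instructions back and reset eip to its original value. The figures below show how the instruction stream looks at each stage.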


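[Figures: 1. After the process is stopped. 2. After the trap instruction is written in. 3. The breakpoint is hit and control passes to the debugger. 4. Execution continues: the original instructions are restored and eip is reset.]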
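
The four stages can also be tried on a child the program forks itself, which avoids having to find a pid. The sketch below is a variation on the example above rather than the same technique: it places a single int3 byte (0xcc) at the start of a known function instead of writing three bytes at the current eip. The names bp_target and run_with_breakpoint are hypothetical, and the code assumes x86, where the low byte of the word read by PTRACE_PEEKTEXT is the first byte at that address.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only the two x86 layouts are handled in this sketch. */
#if defined(__x86_64__)
#define REG_IP(r) ((r).rip)
#elif defined(__i386__)
#define REG_IP(r) ((r).eip)
#else
#error "this sketch only knows the x86 register layout"
#endif

/* Function the child calls; the breakpoint goes on its first byte. */
static __attribute__((noinline)) int bp_target(int x)
{
    return 2 * x + 1;
}

/* Returns the child's exit code, which is bp_target(arg), and sets
   *hit to 1 if the breakpoint was reached. Returns -1 on error. */
static int run_with_breakpoint(int arg, int *hit)
{
    int (*volatile fn)(int) = bp_target;     /* force a real call */
    unsigned long addr = (unsigned long)bp_target;
    struct user_regs_struct regs;
    long orig;
    int status;
    pid_t child = fork();

    *hit = 0;
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(255);
        raise(SIGSTOP);                      /* 1. process is stopped */
        _exit(fn(arg));
    }
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* 2. Save the original word and put int3 in its low byte. */
    errno = 0;
    orig = ptrace(PTRACE_PEEKTEXT, child, (void *)addr, NULL);
    if (orig == -1 && errno != 0)
        goto fail;
    if (ptrace(PTRACE_POKETEXT, child, (void *)addr,
               (void *)((orig & ~0xffL) | 0xcc)) < 0)
        goto fail;
    ptrace(PTRACE_CONT, child, NULL, NULL);

    /* 3. The trap is hit; int3 has run, so ip is one byte past addr. */
    if (waitpid(child, &status, 0) < 0 ||
        !WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        goto fail;
    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) < 0)
        goto fail;
    *hit = ((unsigned long)REG_IP(regs) == addr + 1);

    /* 4. Restore the original word, rewind ip, let the child finish. */
    ptrace(PTRACE_POKETEXT, child, (void *)addr, (void *)orig);
    REG_IP(regs) = addr;
    ptrace(PTRACE_SETREGS, child, NULL, &regs);
    ptrace(PTRACE_CONT, child, NULL, NULL);
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
fail:
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}
```

Because the result travels back as an exit code, arg should be chosen so that 2 * arg + 1 fits in the range 0 to 254. If the original word were not restored, or ip were not rewound, the child would resume in the middle of a damaged instruction.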

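Behind the scenes

So what happens inside the kernel when ptrace is used? Here is a brief explanation:

When a process calls ptrace(PTRACE_TRACEME, ...), the kernel sets a flag on the process to mark it as traced. The relevant kernel source is: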

Source: arch/i386/kernel/ptrace.c
if (request == PTRACE_TRACEME) {
    /* are we already being traced? */
    if (current->ptrace & PT_PTRACED)
        goto out;
    /* set the ptrace bit in the process flags. */
    current->ptrace |= PT_PTRACED;
    ret = 0;
    goto out;
}
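When a system call entry is done, the kernel checks this flag and, if the process is being traced, calls the trace system call. The assembly details can be found in arch/i386/kernel/entry.S.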

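Now let's look at the sys_trace() function, defined in arch/i386/kernel/ptrace.c. It stops the child and sends a signal to the parent to tell it that the child has stopped. The signal wakes the waiting parent, which then does its ptrace work. When the parent is done, it calls ptrace(PTRACE_CONT, ...) or ptrace(PTRACE_SYSCALL, ...), which wakes the child up again. What the kernel does at this point is call a scheduler function named wake_up_process(). Some other architectures may achieve the same thing by sending a SIGCHLD to the child.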

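Conclusion:
ptrace may look strange to some people, because it can examine and modify a running program. The technique is mainly used by debuggers and system call tracers. It also lets programmers do more interesting things at user level. There have been many attempts to extend the operating system at user level, for example UFO, a user-level extension to file systems, which uses ptrace to implement some security mechanisms.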

Author:
Pradeep Padala,
http://www.cise.ufl.edu/~ppadala
