Analyzing a stack corruption with gdb: a worked example

I. Pinning down the nature and scope of the bug

1. Analyze the core dump with symbols

$ gdb IMActivityServer.symbol core.32530
(gdb) bt
#0  0x0000000000a6951a in ?? ()
#1  0x00000000018c6db8 in ?? ()
#2  0x8f127f1911ab2800 in ?? ()
#3  0x00000000018c6d00 in ?? ()
#4  0x0000000400000004 in ?? ()
#5  0x00000000018c6d88 in ?? ()
#6  0x00000000006ba9bf in ?? ()
#7  0x00000000018a4400 in ?? ()
#8  0x8f127f1911ab2800 in ?? ()
#9  0x00000000018c6d00 in ?? ()
#10 0x00007f518b789010 in ?? ()
#11 0x00000000018c6d00 in ?? ()
#12 0x0000000000693166 in ?? ()
#13 0x0000000000000000 in ?? ()

The backtrace shows nothing useful, and neither do the logs. Stack corruption is suspected.

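2. Add stack protection: rebuild with the compiler flag -fstack-protector-all, which inserts guard (canary) code into every function, then examine the new crash dump with symbols again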

$ gdb IMActivityServer.symbol core.32530
(gdb) bt
#0  0x00007f08a95d7118 in ?? () from /lib64/libgcc_s.so.1
#1  0x00007f08a95d8019 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00007f08a93110c6 in backtrace () from /lib64/libc.so.6
#3  0x00007f08a927c334 in __libc_message () from /lib64/libc.so.6
#4  0x00007f08a9314a77 in __fortify_fail () from /lib64/libc.so.6
#5  0x00007f08a9314a40 in __stack_chk_fail () from /lib64/libc.so.6
#6  0x00000000006909a9 in ActivityService::cmdMsgParse (this=<optimized out>, ptNullCmd=<optimized out>, 
    nCmdLen=<optimized out>) at ActivityServer.cpp:930
#7  0x00007f08a75ae1e8 in ?? ()
#8  0x00007f0818000978 in ?? ()
#9  0x00007f0818000a08 in ?? ()
#10 0x00007f0818000970 in ?? ()
#11 0x0000000000000000 in ?? ()

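Now the stack shows a function name, cmdMsgParse. In the source, ActivityServer.cpp:930 is where the function returns, which is where the stack guard is checked. cmdMsgParse is the unified message-dispatch function and handles a very large volume of messages, so the conclusion is that one particular message is corrupting the stack, but not yet which one.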
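To see why the failure is reported at the return line rather than at the write that caused it, here is a minimal C++ simulation of the idea behind -fstack-protector-all. It is not the code the compiler emits: the 8-byte buffer, the frame layout, the canary value and the name runInstrumentedHandler are illustrative assumptions. What it shows is that the canary is only compared in the epilogue, so an overflow in the function body goes unnoticed until the function is about to return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simulated stack frame: bytes [0, 8) play the local buffer,
// bytes [8, 16) play the canary stored just above it.
constexpr std::size_t kBufSize = 8;
constexpr std::size_t kFrameSize = 16;
constexpr std::uint64_t kCanary = 0x5a3c9e17d2b4f100ULL;  // arbitrary illustrative value

// Models one call to an instrumented function (hypothetical name).
// Returns true if the canary is still intact when the "epilogue" runs.
bool runInstrumentedHandler(const unsigned char* payload, std::size_t len) {
    assert(len <= kFrameSize);  // keeps the simulation itself in bounds
    unsigned char frame[kFrameSize];
    // Prologue: store the canary above the local buffer.
    std::memcpy(frame + kBufSize, &kCanary, sizeof kCanary);
    // Body: copy the payload without checking it against kBufSize.
    // A long payload silently runs over the canary; nothing notices yet.
    std::memcpy(frame, payload, len);
    // Epilogue: only now is the canary compared. In compiled code a mismatch
    // calls __stack_chk_fail(), so the crash points at the function's return.
    std::uint64_t seen;
    std::memcpy(&seen, frame + kBufSize, sizeof seen);
    return seen == kCanary;
}
```

Copying 8 bytes leaves the canary intact; copying 9 or more overwrites at least one canary byte and the epilogue check fails, which mirrors why cmdMsgParse only died when it reached line 930.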

3. Log every message number being handled, then deploy another build
On several servers, the last log line was the same:

180531-12:43:51 ActivityServer[17701] ERROR: [ActivityInfoManager.cpp:1293] cmd(51) param(0) len(70) server(0) id(0)

So message 51 is almost certainly the trigger.
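The idea behind this step is to record the message number before dispatching it, so that the last line written before a crash names the message being processed. A sketch follows; formatCmdTrace and traceBeforeDispatch are hypothetical helpers, not the server's logging API, and the field layout just copies the log line above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the "cmd(..) param(..) len(..)" trace text.
std::string formatCmdTrace(unsigned cmd, unsigned param, unsigned len,
                           unsigned server, unsigned id) {
    char buf[128];
    int n = std::snprintf(buf, sizeof buf,
                          "cmd(%u) param(%u) len(%u) server(%u) id(%u)",
                          cmd, param, len, server, id);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);  // no truncation
    return std::string(buf, static_cast<std::size_t>(n));
}

// Write the trace *before* dispatching and flush right away: a line still
// sitting in a stdio buffer is lost if the process aborts immediately after.
void traceBeforeDispatch(std::FILE* log, unsigned cmd, unsigned param,
                         unsigned len, unsigned server, unsigned id) {
    std::fprintf(log, "%s\n", formatCmdTrace(cmd, param, len, server, id).c_str());
    std::fflush(log);
}
```

The flush is the design choice that matters here: without it, the "last line in the log" may belong to an earlier message than the one that crashed.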

II. Analyzing the bug in detail

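Message 51 is a generic forwarding (wrapper) message whose inner message has to be parsed, so the plan is to set a breakpoint in the unified handler
ActivityInfoManager::msgParseTask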

1. First find the function definition and check that it is the right one (skip this if you have the source)

(gdb) info func ActivityInfoManager::msgParseTask
All functions matching regular expression "ActivityInfoManager::msgParseTask":

File ActivityInfoManager.cpp:
bool ActivityInfoManager::msgParseTask(Cmd::t_NullCmd const*, unsigned int, void const*);
bool ActivityInfoManager::msgParseTask(unsigned int, Cmd::t_NullCmd const*, unsigned int, void const*);
bool ActivityInfoManager::msgParseTaskByWebTransfer(Cmd::t_NullCmd const*, unsigned int, void const*);
bool ActivityInfoManager::msgParseTaskGM(Cmd::t_NullCmd const*, unsigned int, unsigned int);

2. Break there and inspect the arguments

(gdb) break ActivityInfoManager.cpp:1293
(gdb) c
Continuing.
[Switching to Thread 0x7f83877fe700 (LWP 414)]

Breakpoint 1, ActivityInfoManager::msgParseTask (this=0x7f840c16e010, ptNullCmd=0x7f83877edc30, cmdLen=84, 
    server_param=0x7f83877edc20) at ActivityInfoManager.cpp:1293
1293    ActivityInfoManager.cpp: No such file or directory.
(gdb) info locals
__temp_format__ = <error reading variable __temp_format__ (can't compute CFA for this frame)>
s = 0x7f83877edc20
buffercmd = <error reading variable buffercmd (can't compute CFA for this frame)>
cmd = <optimized out>
pBase = <optimized out>
// Look up the struct definition of ptNullCmd (skip if you have the source)
(gdb) info types Cmd::t_NullCmd
All types matching regular expression "Cmd::t_NullCmd":

File ../../common/zNullCmd.h:
Cmd::t_NullCmd;
Cmd::t_NullCmd;
Cmd::t_NullCmd;
Cmd::t_NullCmd;
Cmd::t_NullCmd;

// Show the struct's layout (skip if you have the source)
(gdb) ptype Cmd::t_NullCmd  
type = class Cmd::t_NullCmd {  
  public:  
    union {  
        <no data fields>  
    };  

    void t_NullCmd(BYTE, BYTE);  
}  

// Show the argument values in detail
(gdb) p *(const Cmd::t_NullCmd *) 0x7f83877edc30
$11 = {{{byCmd = 51 '3', byParam = 0 '\000'}, {cmd = 51 '3', para = 0 '\000'}}}
(gdb) p *(Cmd::Activity::stServerParam*)0x7f83877fdc10
$6 = {asServer = 0 '\000', type = 0, serverid = 0}

// Check the message number
(gdb) p ptNullCmd->cmd
$2 = 33 '!'

3. Set a conditional breakpoint on the argument value

// Delete the old breakpoint
(gdb) clear
Deleted breakpoint 1 
// Set a conditional breakpoint
(gdb) break ActivityInfoManager.cpp:1293 if ptNullCmd->cmd == 51
Breakpoint 2 at 0x6bdd50: file ActivityInfoManager.cpp, line 1293.
(gdb) c
Continuing.

4. Analyze message 51

// Inspect the message that hit the breakpoint
(gdb) p *(const Cmd::t_NullCmd *) 0x7f83877fdc30
$11 = {{{byCmd = 51 '3', byParam = 0 '\000'}, {cmd = 51 '3', para = 0 '\000'}}}

// Message 51 is equivalent to this struct (data lands at offset 8 with 1-byte packing)
struct stActivityInCmd {
    BYTE cmd;
    BYTE para;
    DWORD ActId;
    WORD size;
    char data[0];
};
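// The inner message is carried in data, which starts 8 bytes past the start of stActivityInCmd; it also begins with a Cmd::t_NullCmd header
// Parenthesize the address arithmetic: (T *)addr+8 would advance by 8 * sizeof(T) bytes, not 8 bytes
(gdb) p *(const Cmd::t_NullCmd *)(0x7f83877fdc30+8)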
$11 = {{{byCmd = 51 '3', byParam = 12 '\f'}, {cmd = 51 '3', para = 12 '\f'}}}
// This is message 51 with sub-message number 12; the source shows it is 'Cmd::Activity::stOpMount'
// Show the full structure
(gdb) p *(Cmd::Activity::stOpMount*)0x7f83877fdc38
$10 = {<Cmd::t_NullCmd> = {{{byCmd = 51 '3', byParam = 12 '\f'}, {cmd = 51 '3', para = 12 '\f'}}}, dwReqFunction = 10, 
  szName = "領工資\000sigin_pay_map_.size:[%d]\000", byOpType = 1 '\001', dwIndex = 0, dwUserId = 22684283, wAddType = 32, 
  dwTimeStart = 1527740441, dwTimeEnd = 1528345241, bNeedDelMount = false, wDelType = 0, bExtension = true}
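The offset arithmetic used in this step can be checked in code. The sketch below assumes the protocol structs use 1-byte packing and that BYTE, WORD and DWORD are 8-, 16- and 32-bit unsigned types; NullCmdHeader is a hypothetical stand-in for the two leading bytes of Cmd::t_NullCmd, and innerHeader is an illustrative helper. The zero-length array is a GCC/Clang extension, as in the original struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef std::uint8_t  BYTE;   // assumed widths for the protocol typedefs
typedef std::uint16_t WORD;
typedef std::uint32_t DWORD;

#pragma pack(push, 1)  // assumption: 1-byte packing, which puts data at offset 8
struct NullCmdHeader {        // hypothetical stand-in for Cmd::t_NullCmd's two bytes
    BYTE cmd;
    BYTE para;
};
struct stActivityInCmd {
    BYTE  cmd;
    BYTE  para;
    DWORD ActId;
    WORD  size;
    char  data[0];            // zero-length array: occupies no storage
};
#pragma pack(pop)

static_assert(offsetof(stActivityInCmd, data) == 8, "inner message starts 8 bytes in");

// Reads the header of the message wrapped inside a type-51 packet: the same
// bytes that `p *(const Cmd::t_NullCmd *)(addr+8)` shows in gdb.
NullCmdHeader innerHeader(const unsigned char* packet, std::size_t len) {
    assert(len >= offsetof(stActivityInCmd, data) + sizeof(NullCmdHeader));
    NullCmdHeader h;
    std::memcpy(&h, packet + offsetof(stActivityInCmd, data), sizeof h);
    return h;
}
```

With these assumptions, a packet whose bytes 8 and 9 are 51 and 12 decodes to the same inner cmd/para pair that gdb printed for stOpMount.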

Walking through the handling path for Cmd::Activity::stOpMount turned up a stack overwrite:

Cmd::Activity::stActivityInCmd cmd;
...
bcopy(rev->data, cmd.data, rev->size);   // cmd has no room for data: this writes rev->size bytes past its end
...

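The struct was used without being properly initialized: the payload was copied straight into data. As the struct definition above shows, data is a zero-length array that points at the end of the struct, and cmd itself lives on the stack, so the copy wrote rev->size bytes past the end of cmd and overwrote whatever was on the stack there.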

III. Fixing the bug

The fix is simple: initialize the struct properly before using it.
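The corrected code is not shown here. One common way to initialize a message with a trailing zero-length array is to construct it inside a buffer large enough for header plus payload, and to check the payload length before copying. The sketch below assumes that approach and 1-byte packing; kMaxPacket, buildForward and the constructor's field values are hypothetical, and the actual fix in the server may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

typedef std::uint8_t  BYTE;   // assumed widths for the protocol typedefs
typedef std::uint16_t WORD;
typedef std::uint32_t DWORD;

#pragma pack(push, 1)  // assumed 1-byte packing
struct stActivityInCmd {
    BYTE  cmd;
    BYTE  para;
    DWORD ActId;
    WORD  size;
    char  data[0];
    // Assumed initialization: message 51, para 0, empty payload.
    stActivityInCmd() : cmd(51), para(0), ActId(0), size(0) {}
};
#pragma pack(pop)

// A plain local "stActivityInCmd cmd;" is only these 8 bytes: zero room for data.
static_assert(sizeof(stActivityInCmd) == 8, "header only");

// Hypothetical upper bound for one forwarded message.
constexpr std::size_t kMaxPacket = 1024;

// Constructs the wrapper inside `storage` and copies `len` payload bytes
// behind it. Returns nullptr instead of writing past the buffer.
stActivityInCmd* buildForward(unsigned char (&storage)[kMaxPacket],
                              const void* payload, std::size_t len) {
    const std::size_t capacity = kMaxPacket - sizeof(stActivityInCmd);
    if (len > capacity) return nullptr;                      // reject, don't overflow
    stActivityInCmd* cmd = new (storage) stActivityInCmd();  // header at buffer start
    std::memcpy(cmd->data, payload, len);                    // `capacity` bytes available
    cmd->size = static_cast<WORD>(len);
    return cmd;
}
```

Because the struct now sits at the start of a kMaxPacket-byte buffer, writes through data stay inside memory the function owns, and oversized payloads are refused instead of spilling onto the stack.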

IV. Logging from gdb

$ gdb attach 28644
// Load symbols
(gdb) symbol-file IMActivityServer.symbol 
Reading symbols from /home/ztgame/IMTESTVERSION/release/IMActivityServer.symbol...done.
// Turn on logging (GDB 12 and later spell this "set logging enabled on")
(gdb) set logging on
Future logs will be written to gdb.txt.
Copying output to gdb.txt.
// Set a breakpoint
(gdb) break ActivityInfoManager.cpp:1290
Breakpoint 2 at 0x6b70f0: file ActivityInfoManager.cpp, line 1290.
// Import the Python module
(gdb) python import datetime
// Attach a command script to the breakpoint
(gdb) commands 2    //start the command list; 2 is the breakpoint number
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>silent         //do not print the usual breakpoint banner when it triggers
>python gdb.execute("set $now=\"" + datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') + "\"")
>printf "%s cmd(%u) param(%u) len(%u) srvtype(%u) srvid(%u)\n",$now,ptNullCmd->cmd, ptNullCmd->para,cmdLen,((Cmd::Activity::stServerParam*)server_param)->type,((Cmd::Activity::stServerParam*)server_param)->serverid
>continue
>end    //the command list must be terminated with end
(gdb) c

Contents of gdb.txt:

2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(49) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(49) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)
2018-05-31 19:45:46 cmd(22) param(14) len(54) srvtype(150) srvid(15000)
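With every call logged, gdb.txt grows quickly. A small C++ sketch for summarizing it follows, assuming the exact line format produced by the printf in the breakpoint script above; countCmds and countCmdsInText are hypothetical helpers. Lines that do not match, such as gdb's own messages, are skipped.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using CmdKey = std::pair<unsigned, unsigned>;  // (cmd, param)

// Counts how often each (cmd, param) pair appears in lines like
// "2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)".
std::map<CmdKey, unsigned> countCmds(std::istream& in) {
    std::map<CmdKey, unsigned> counts;
    std::string line;
    while (std::getline(in, line)) {
        char date[11], timeOfDay[9];
        unsigned cmd, param, len, srvtype, srvid;
        int fields = std::sscanf(line.c_str(),
            "%10s %8s cmd(%u) param(%u) len(%u) srvtype(%u) srvid(%u)",
            date, timeOfDay, &cmd, &param, &len, &srvtype, &srvid);
        if (fields == 7) ++counts[CmdKey(cmd, param)];  // skip non-matching lines
    }
    return counts;
}

// Convenience wrapper for text already read into memory.
std::map<CmdKey, unsigned> countCmdsInText(const std::string& text) {
    std::istringstream in(text);
    return countCmds(in);
}
```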