Using GDB to debug corrupted function stacks

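I have recently been investigating some problems whose symptoms were hard to explain at first: the address passed in as a function argument was handed on to another function, and by the time it arrived the address had changed. This is what is called a corrupted function stack, and it left the function impossible to re-enter.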

The called function then accessed an illegal address, which caused a segmentation fault and produced a core dump file. The problem was a tricky one.


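After reading some documentation, I decided to start with the compiler's stack protection settings and then examine the core dump with gdb.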


1) Add compiler options at build time

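-fstack-protector and -fstack-protector-all tell the compiler (GCC) to enable stack protection: the first guards only functions with vulnerable objects such as character buffers, the second guards every function. With protection on, an overwritten stack guard is detected when the affected function returns and the program is terminated there, so the dump is taken close to the corruption instead of at some later, unrelated crash. The options can be added in the Makefile. As an aside, anyone working on open-source software really does need to know Makefiles well; when I tried to write one just for fun, my mind suddenly went blank.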
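To make the idea of a stack guard concrete, here is a small simulation in C. It is only a sketch of the principle, not the code the compiler actually emits: the struct `toy_frame`, the function `canary_survives` and the constant `CANARY_VALUE` are all made up for illustration. A known value sits right behind a local buffer, and an unchecked copy that runs past the buffer changes it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 8
#define CANARY_VALUE 0xDEADC0DEu   /* arbitrary illustrative guard value */

/* Toy stand-in for a stack frame: a local buffer followed by a guard word. */
struct toy_frame {
    char     buf[BUF_SIZE];
    uint32_t canary;
};

/*
 * Copy `len` bytes of `src` into the frame, starting at buf, without
 * checking len against BUF_SIZE (this is the bug being modelled).
 * len is capped at sizeof(struct toy_frame) so the demo itself never
 * writes outside the object.
 * Returns 1 if the guard word survived, 0 if it was overwritten.
 */
int canary_survives(const char *src, size_t len)
{
    struct toy_frame frame;

    memset(&frame, 0, sizeof frame);
    frame.canary = CANARY_VALUE;

    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, src, len);   /* buf is the first member, so the copy starts at buf */

    return frame.canary == CANARY_VALUE;
}
```

Copying 8 bytes leaves the guard intact, while copying 12 bytes overwrites it. In a real build the compiler does the equivalent check on function exit; a typical invocation would look like `gcc -g -fstack-protector-all -o prog prog.c`, where `prog` and `prog.c` are placeholder names.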


2) gdb's thread and stack-frame commands

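bt shows the call stack of the current thread

bt full shows the call stack in detail, including the local variables of each frame

info threads lists all threads

thread <num> switches to the given thread

f <num> selects the given frame in the call stack

i locals shows all local variables of the currently selected frame

i register shows the register values for the currently selected frame, mainly useful for checking addresses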

With the help of these commands we can work out a great deal from a core dump file.


Here is an example:

gdb /lab/testtools/rhel664/dallas/testRelease/R10A06_dynamic_udpport_5/mnsserv/bin/mhlif core-mhlif-18310-1384802382 

(gdb) bt
#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6
#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>, size=280,
    time=21081) at ltsosdep.c:443
#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>,
    size=280, time=21081) at ltsosdep.c:1370
#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590
#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572
#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181
#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0
#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

(gdb) bt full
#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>, size=280,
    time=21081) at ltsosdep.c:443
        row = 0x2b31682a433c
        answer = LTS_OK
#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>,
    size=280, time=21081) at ltsosdep.c:1370
No locals.
#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590
        rpsMsg = {msgId = 4501, type = 0 '\000', data = {loadReplayReq = {
              fileName = "Ú%\004\000\002\000\000\000\001\000\000\000\235ú\004\000tQ\003\000GP\003\000¸U\000\000Oû\004\000pR\000\000\206ü\004\000\bú\004\000ÅS\000\000vR\000\000\067P\003\000fP\003\000 ü\004\000Úü\004\000¢P\003\000ÿT\000\000\vý\004\000²O\003\000Z\002\002\000Nú\004\000+ú\004\000>ú\004\000\233T\000\000íÿ\001\000ÊT\000\000G\001\002\000M\001\002\000Y\003\002\000£ú\004\000\020ú\004\000\032\000\002\000ÎU\000\000x\000\002\000\035\001\002\000K\002\002\000æù\004\000\206S\000\000\071U\000\000\232ü\004\000õP\003\000ë\000\002\000\202S\003\000Ø\000\002\000xú\004\000\201\001\002\000=T\000\000oR\000\000"..., natType = 48 '0', timeStretch = 11057,
              rpsType = 2156588448}, replayConReq = {msIndex = 271834, contextIndex = 2 '\002', resend = 0 '\000', replayId = 0,
              sessionId = 1, sessionTime = 326301, destIp1 = {addr64 = {932690803249524, 1402216627852728},
                b = "tQ\003\000GP\003\000¸U\000\000Oû\004", addr16 = {20852, 3, 20551, 3, 21944, 0, 64335, 4}, ui = {i1 = 217460,
                  i2 = 217159, i3 = 21944, ipv4 = 326479}}, destIp2 = {addr64 = {1403552362680944, 92105573988872},
                b = "pR\000\000\206ü\004\000\bú\004\000ÅS\000", addr16 = {21104, 0, 64646, 4, 64008, 4, 21445, 0}, ui = {
                  i1 = 21104, i2 = 326790, i3 = 326152, ipv4 = 21445}}, reqPackets = 21110, timeStretch = 217143, type = 102 'f',
              radiotype = 80 'P', kernelMsId = 933081645382874}, msgQid = {_qId = 0x2000425da}, payloadPropReq = {
              payloadPropId = 271834, groupId = 2 '\002', msgLength = 0, userBw = 1}, connectionReq = {msIndex = 271834,
              contextIndex = 2 '\002', payloadPropId = 0, sessionId = 1, addresses = {GiIpAddr = {addr64 = {932690803249524,
                    1402216627852728}, b = "tQ\003\000GP\003\000¸U\000\000Oû\004", addr16 = {20852, 3, 20551, 3, 21944, 0, 64335,
                    4}, ui = {i1 = 217460, i2 = 217159, i3 = 21944, ipv4 = 326479}}, msPortNo = 21104, GiPortNo = 0},
              reqPackets = 326152, initiator = 197 'Å', type = 83 'S', radiotype = 0 '\000', kernelMsId = 932622083576438},
            rpsDeactReq = {msIndex = 271834, contextIndex = 2 '\002', sendMhlResponse = LTS_TRUE, sessionId = {326301, 217460,
                217159, 21944, 326479, 21104, 326790, 326152, 21445, 21110, 217143, 217190, 326688, 326874, 217250, 21759, 326923,
                217010, 131674, 326222}, pdpcontextId = 326187, sessionnum = 62 '>'}, moveUpdateDataReq = {msIndex = 271834,
              toDevice = 2 '\002', moveIndex = 1, status = 326301}, suspendResumeReq = {msIndex = 271834, sessionId = {2, 1,
                326301, 217460, 217159, 21944, 326479, 21104, 326790, 326152, 21445, 21110, 217143, 217190, 326688, 326874,
                217250, 21759, 326923, 217010}, sessionnum = 90 'Z', contextIndex = 2 '\002'}, rabCreateReleaseReq = {
              msIndex = 271834, contextId = 2 '\002'}, peMoveResp = {msIndex = 271834, toDevice = 2 '\002', moveIndex = 1,
              peIndex = 326301, status = 217460}, scalePayloadReq = {scaleFactor = 271834}, magQid = {_qId = 0x2000425da}}}
        count = <value optimized out>
#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572
        nowTime = 12394937602
        nextTime = <value optimized out>
        count = 844209533
        entry = <value optimized out>
        pEngine = 0x2b3168518270
#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181
        Rps = {mhlifQId = {_qId = 0x2b315c225000}, magifQId = {_qId = 0x2b319454d000}, initQId = {_qId = 0x2b315c225000},
          mDeviceNo = 12, mRpsState = RPS_RUNNING_STATE, sessionRepository = {rpsSessionPolymer = {buckets = 100001,
              hash_func = 0x42a5c0 <hash_algorithm(unsigned int, unsigned int)>, p_dataRepository = 0x2b316c001070}},
          log = @0x1538040, apnDev = 10, vpReplayStore = std::vector of length 0, capacity 0,
          mpAlreadyLoaded = std::map with 0 elements}
#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#7  0x00000033834e68ed in clone () from /lib64/libc.so.6


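bt full is usually not much more useful than bt, although it does show the values of some local variables. Some of those values are unreliable (the stack is corrupted, and others are reported as <value optimized out>), so we still cannot pinpoint the problem precisely.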

(gdb) info threads
  16 Thread 0x2b3151cb7100 (LWP 18310)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 0x2b315c54b700 (LWP 18428)  0x00000033834df443 in select () from /lib64/libc.so.6
  14 Thread 0x2b315c224700 (LWP 18423)  0x00000033834df443 in select () from /lib64/libc.so.6
  13 Thread 0x2b31525e5700 (LWP 18422)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x2b3151fb0700 (LWP 18313)  0x00000033834df443 in select () from /lib64/libc.so.6
  11 Thread 0x2b3194873700 (LWP 18535)  0x00000033834df443 in select () from /lib64/libc.so.6
  10 Thread 0x2b319454c700 (LWP 18534)  0x00000033834df443 in select () from /lib64/libc.so.6
  9 Thread 0x2b3194225700 (LWP 18533)  0x00000033834df443 in select () from /lib64/libc.so.6
  8 Thread 0x2b3188425700 (LWP 18531)  0x00000033834df443 in select () from /lib64/libc.so.6
  7 Thread 0x2b3188200700 (LWP 18530)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x2b3178602700 (LWP 18529)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x2b3178401700 (LWP 18435)  0x00000033834df443 in select () from /lib64/libc.so.6
  4 Thread 0x2b3178200700 (LWP 18434)  0x00000033834df443 in select () from /lib64/libc.so.6
  3 Thread 0x2b3169f6b700 (LWP 18433)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x2b3169d6a700 (LWP 18432)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2b3168200700 (LWP 18429)  0x0000003383488611 in memcpy () from /lib64/libc.so.6

(gdb) thread 1
[Switching to thread 1 (Thread 0x2b3168200700 (LWP 18429))]#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6
#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>, size=280,
    time=21081) at ltsosdep.c:443
#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>,
    size=280, time=21081) at ltsosdep.c:1370
#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590
#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572
#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181
#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0
#7  0x00000033834e68ed in clone () from /lib64/libc.so.6


(gdb) f 1
#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 <Address 0x4fc780004fc71 out of bounds>, size=280,
    time=21081) at ltsosdep.c:443
443     ltsosdep.c: No such file or directory.
        in ltsosdep.c
(gdb) i locals
row = 0x2b31682a433c
answer = LTS_OK

(gdb) i register
rax            0x2b0000001197   47278999998871
rbx            0x4fc780004fc71  1403492233444465
rcx            0x7      7
rdx            0x118    280
rsi            0x2b31682a4340   47491200992064
rdi            0x4fc780004fc71  1403492233444465
rbp            0x2b31681ffb80   0x2b31681ffb80
rsp            0x2b31681ffb20   0x2b31681ffb20
r8             0x1c0002000ce527 7881307938678055
r9             0x2b310003517c   47489453609340
r10            0x0      0
r11            0x202    514
r12            0x525a00005259   90546500555353
r13            0x2b3100005330   47489453413168
r14            0x20c49ba5e353f7cf       2361183241434822607
r15            0x2b316c106d70   47491266407792
rip            0x41a9aa 0x41a9aa <ReadFromQueue+518>
eflags         0x10203  [ CF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
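One way to read the dump above is to split suspicious 64-bit values into their two 32-bit halves. The sketch below does that; `split_u64` and `struct halves` are names invented here, not part of the debugged program.

```c
#include <stdint.h>

/* The two 32-bit halves of a 64-bit value. */
struct halves {
    uint32_t low;
    uint32_t high;
};

/* Split a 64-bit register or pointer value into its low and high 32 bits. */
struct halves split_u64(uint64_t value)
{
    struct halves h;

    h.low  = (uint32_t)(value & 0xFFFFFFFFu);
    h.high = (uint32_t)(value >> 32);
    return h;
}
```

For the bad `msg` pointer 0x4fc780004fc71 (also in rbx and rdi) this gives 326769 and 326776, the same range as integers in the rpsMsg dump such as 326301 and 326479. For r12 = 0x525a00005259 it gives 21081 and 21082, and 21081 is exactly the `delay`/`time` argument. This suggests, though it does not prove, that the pointer slot was overwritten by ordinary 32-bit integer data rather than by random garbage.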

The above only demonstrates some ways of inspecting a core dump file. When the process is still alive, we can instead attach to it directly and analyse the code.


(gdb) attach 2467
Attaching to process 2467
Reading symbols from /root/algorithm/testBh...done.
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/i686/cmov/libm.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libm-2.11.1.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/i686/cmov/libc.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libc-2.11.1.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.11.1.so...done.
done.
Loaded symbols for /lib/ld-linux.so.2
0x005f7422 in __kernel_vsyscall ()
(gdb) break testBh.cc:38
Breakpoint 1 at 0x80488ff: file testBh.cc, line 38.
(gdb) c
Continuing.


With these methods you can stop the process and then single-step through it or print local variables.
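To practise attaching without touching a production process, a small long-running target is enough. The following is a minimal sketch for a POSIX system; `announce_pid` and `run_ticks` are hypothetical helpers and have nothing to do with the testBh program in the session above.

```c
#include <stdio.h>
#include <unistd.h>

/* Print this process's PID so it can be passed to gdb's attach command. */
void announce_pid(void)
{
    printf("pid %ld\n", (long)getpid());
    fflush(stdout);
}

/*
 * Run `iterations` loop passes, sleeping `delay_sec` seconds in each
 * (no sleep when delay_sec is 0). Returns the final counter value.
 */
long run_ticks(long iterations, unsigned delay_sec)
{
    long counter = 0;

    for (long i = 0; i < iterations; ++i) {
        counter += 1;               /* convenient line for a breakpoint */
        if (delay_sec > 0)
            sleep(delay_sec);
    }
    return counter;
}
```

A `main` that calls `announce_pid()` followed by `run_ticks(600, 1)`, built with `-g`, stays alive for about ten minutes. That is long enough to attach with the printed PID, set a breakpoint on the `counter += 1` line, continue, and then `print counter` when the breakpoint is hit.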
