Analyzing dump files with WinDbg

Chapter 1 Common WinDbg commands

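!analyze -v                                      Runs the automatic crash analysis
kP                                               Shows the call stack including each function's parameters
!for_each_frame dv /t                            Shows the local variables (with types) of every frame
dc, db                                           Display the memory at an address. A variable name can be passed directly, but you may first need to switch to the frame that owns it
~                                                Lists all threads (!threads is the SOS extension command for managed .NET threads)
~0s, ~1s                                         Switch to thread 0, thread 1, ...
!frame ProcessA!FunctionA                        Moves to the frame of the given function; sometimes needed before a variable can be viewed
!uniqstack                                       Shows the call stacks of all threads in the current process, omitting duplicate stacks
!teb                                             Shows the thread environment block (TEB) in formatted form
s -sa, s -su                                     Search memory for any ASCII / Unicode strings; useful for checking whether a region contains printable text
dds, dps, dqs                                    Dump a memory range, treating each element as an address and printing the matching symbol when there is one: dds uses 4-byte elements, dqs 8-byte elements, and dps the pointer size of the current processor architecture
.kframes                                         Sets the default number of frames shown by stack backtraces (default 20)
k, kb, kd, kp, kP, kv                            Display the call stack of a thread plus related information. Usually combined with .kframes, otherwise only a few frames are shown
.reload /i xxx.dll                               Reloads symbols, ignoring .pdb version mismatches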

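Chapter 2 Setting up symbols
2.1 Set the symbol path (File -> Symbol File Path, or the .sympath command) so that the PDB files of Windows system modules are downloaded from Microsoft's symbol server and cached in the local directory D:\mysymbol:
    SRV*D:\mysymbol*http://msdl.microsoft.com/download/symbols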

2.2 Load the configured symbol files
    .reload
    Use Debug -> Modules in the menu (or the lm command) to check whether they were loaded.


Chapter 3 Case studies

Case 1: Investigating heap corruption

    Error code: 0xc0000374 (STATUS_HEAP_CORRUPTION)
    Meaning: ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun


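Step 1. Use "!analyze -v" to find where the failure happened and why the program crashed.
       It usually comes down to one of a few causes: a memory overrun, access to an invalid address, and the like.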


0:009> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/ProcessA_exe/1_0_0_1/5134aefd/ntdll_dll/6_1_7601_18229/51fb164a/c0000374/000c4102.htm?Retriage=1

FAULTING_IP: 
ntdll!RtlReportCriticalFailure+62
00000000`777b4102 eb00            jmp     ntdll!RtlReportCriticalFailure+0x64 (00000000`777b4104)

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000777b4102 (ntdll!RtlReportCriticalFailure+0x0000000000000062)
   ExceptionCode: c0000374
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 000000007782b4b0

PROCESS_NAME:  ProcessA.exe

ERROR_CODE: (NTSTATUS) 0xc0000374 - <Unable to get error code text>

EXCEPTION_CODE: (NTSTATUS) 0xc0000374 - <Unable to get error code text>

EXCEPTION_PARAMETER1:  000000007782b4b0

MOD_LIST: <ANALYSIS/>

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

FAULTING_THREAD:  0000000000002f8c

DEFAULT_BUCKET_ID:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

PRIMARY_PROBLEM_CLASS:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

BUGCHECK_STR:  APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

LAST_CONTROL_TRANSFER:  from 00000000777b4746 to 00000000777b4102

STACK_TEXT:  
00000000`0548e170 00000000`777b4746 : 00000000`00000002 00000000`00000023 00000000`00000000 00000000`00000003 : ntdll!RtlReportCriticalFailure+0x62
00000000`0548e240 00000000`777b5952 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`1c01001d : ntdll!RtlpReportHeapFailure+0x26
00000000`0548e270 00000000`777b7604 : 00000000`00c50000 00000000`00c50000 00000000`0000000a 00000000`00000000 : ntdll!RtlpHeapHandleError+0x12
00000000`0548e2a0 00000000`777b79e8 : 00000000`00c50000 00000000`00000000 00000000`00100000 00000000`00000000 : ntdll!RtlpLogHeapFailure+0xa4
00000000`0548e2d0 00000000`7774fad6 : 00000000`00c50000 00000000`00c59e50 00000000`00c50000 00000000`00000000 : ntdll!RtlpAnalyzeHeapFailure+0x3a8
00000000`0548e330 00000000`777434d8 : 00000000`00c50000 00000000`00000003 00000000`000006cc 00000000`000006e0 : ntdll!RtlpAllocateHeap+0x1d2a
00000000`0548e8d0 00000000`777247ea : 00000000`00000003 00000000`00c5ee80 00000000`00c50278 00000000`000006cc : ntdll!RtlAllocateHeap+0x16c
00000000`0548e9e0 00000000`77723ff2 : 00000000`00c50000 00000000`00000003 00000000`00c5ee90 00000000`000006cc : ntdll!RtlpReAllocateHeap+0x648
00000000`0548eca0 00000000`750c712f : 00000000`0548fbe8 00000000`00c5ee90 00000000`00000000 00000000`000005ac : ntdll!RtlReAllocateHeap+0xa2
00000000`0548edb0 00000001`40010f6f : 00000000`00000000 00000000`0548fbe8 00000000`00000000 00000000`00000661 : msvcr80!realloc+0x6f [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\realloc.c @ 332]
00000000`0548ede0 00000001`4000f63c : ffffffff`ffffffff 00000000`0548ff10 00000000`00c97fd0 00000000`0548fe48 : ProcessA!FunctionA_AnalyzeEventData+0xfcf [e:\ProcessA\FunctionA_sockserv.cpp @ 1666]
00000000`0548f8a0 00000000`774e652d : 00000000`000002a0 00000000`00000000 00000000`00000000 00000000`00000000 : ProcessA!FunctionA_SockWork+0xe1c [e:\ProcessA\FunctionA_sockserv.cpp @ 1102]
00000000`0548ff60 00000000`7771c541 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`0548ff90 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

STACK_COMMAND:  !heap ; ~9s; .ecxr ; kb

FOLLOWUP_IP: 
msvcr80!realloc+6f [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\realloc.c @ 332]
00000000`750c712f 4885c0          test    rax,rax

SYMBOL_STACK_INDEX:  9

SYMBOL_NAME:  msvcr80!realloc+6f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: msvcr80

IMAGE_NAME:  msvcr80.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  4ec3407e

FAILURE_BUCKET_ID:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun_c0000374_msvcr80.dll!realloc

BUCKET_ID:  X64_APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun_msvcr80!realloc+6f

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/ProcessA_exe/1_0_0_1/5134aefd/ntdll_dll/6_1_7601_18229/51fb164a/c0000374/000c4102.htm?Retriage=1

Followup: MachineOwner
---------

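Step 2. Use "!heap" to find the corrupted heap block and work out the cause.

       0000000000c59c80   (last known valid block before it)
       0000000000c59e50   ← address of the corrupted block
       0000000000c59fd0   (last known valid block after it)

As you probably know, every block allocated with malloc() or realloc() is preceded by heap
management data that records the block's size and the information needed to free it.

This naturally suggests that data written into the block at 0000000000c59c80 ran past its end
into the block at 0000000000c59e50 and overwrote that block's management data, which made the program crash.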
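The guess above can be checked with address arithmetic. This is a minimal sketch, assuming the x64 NT heap layout in which every block begins with a 16-byte header and the pointer returned by malloc()/realloc() points just past it. That assumption is consistent with this dump: the pData_n value printed in Step 3, 0x00c59c90, is exactly 0x10 past the block at 0x00c59c80. The sketch ignores allocation granularity and other heap internals; HEAP_HEADER_SIZE, user_ptr and max_user_bytes are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x64 NT heap, 16-byte block header in front of the user data. */
#define HEAP_HEADER_SIZE 0x10u

/* Pointer that malloc/realloc would hand out for a block starting at `block`. */
static uint64_t user_ptr(uint64_t block)
{
    return block + HEAP_HEADER_SIZE;
}

/* Upper bound on the bytes that can be written through user_ptr(block)
   before the header of the following block (`next_block`) is overwritten. */
static uint64_t max_user_bytes(uint64_t block, uint64_t next_block)
{
    assert(next_block > block + HEAP_HEADER_SIZE);
    return next_block - block - HEAP_HEADER_SIZE;
}
```

With the addresses from !heap, max_user_bytes(0xc59c80, 0xc59e50) is 0x1c0 (448). Under these assumptions, any write of more than 448 bytes starting at 0x00c59c90 reaches the header of the block that !heap reports as corrupted.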

0:009> !heap
**************************************************************
*                                                            *
*                  HEAP ERROR DETECTED                       *
*                                                            *
**************************************************************
Details:

Error address: 0000000000c59e50
Heap handle: 0000000000c50000
Error type heap_failure_buffer_overrun (6)
Parameter 1: 000000000000000a
Last known valid blocks: before - 0000000000c59c80, after -0000000000c59fd0
Stack trace:
                00000000777b79e8: ntdll!RtlpAnalyzeHeapFailure+0x00000000000003a8
                000000007774fad6: ntdll!RtlpAllocateHeap+0x0000000000001d2a
                00000000777434d8: ntdll!RtlAllocateHeap+0x000000000000016c
                00000000777247ea: ntdll!RtlpReAllocateHeap+0x0000000000000648
                0000000077723ff2: ntdll!RtlReAllocateHeap+0x00000000000000a2
                00000000750c712f: msvcr80!realloc+0x000000000000006f
                0000000140010f6f: ProcessA!FunctionA_AnalyzeEventData+0x0000000000000fcf
                000000014000f63c: ProcessA!FunctionA_SockWork+0x0000000000000e1c
                00000000774e652d: kernel32!BaseThreadInitThunk+0x000000000000000d
                000000007771c541: ntdll!RtlUserThreadStart+0x000000000000001d
Index   Address  Name      Debugging options enabled
  1:   001f0000                
  2:   00010000                
  3:   00020000                
  4:   00670000                
  5:   00950000                
  6:   00c50000                
  7:   00910000                
  8:   00bc0000                
  9:   010e0000                
 10:   01220000                
 11:   01420000                
 12:   00c30000                
 13:   03660000                
 14:   00ba0000                
 15:   037b0000                
 16:   01340000                
 17:   039a0000                

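Step 3. Use "!for_each_frame dv /t" to print the local variables of the failing functions and find the culprit.

       Among the variables below, find the one whose value is closest to 0000000000c59c80. That's the one:

       char * pData_n = 0x00000000`00c59c90 "SE:Security: ???"

       ※Note: if a variable is a pointer to a pointer, first use dc to see the address it points to.

       Reading the code then showed that, while processing the data at pData_n, the program adds a 0x0D (CR)
       to every 0x0A (LF) so that the text uses Windows line breaks. Each insertion makes the data one byte
       longer, and the extra bytes pushed the writes past the end of the buffer pData_n points to.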
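A conversion like this is only safe if the destination is sized for the expanded text. The sketch below is a hypothetical helper, not the program's actual code: it counts the LF bytes first, allocates len + count + 1 bytes, and then writes each LF as the usual Windows pair CR LF (0x0D 0x0A).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `src` (length `len`)
   in which every LF (0x0A) is written as CR LF (0x0D 0x0A).
   The caller frees the result; returns NULL if allocation fails.
   Note: an existing CR LF pair would become CR CR LF. */
static char *lf_to_crlf(const char *src, size_t len, size_t *out_len)
{
    size_t lf_count = 0;
    for (size_t i = 0; i < len; i++)
        if (src[i] == '\n')
            lf_count++;

    /* Each LF grows by one byte; +1 for the terminating NUL. */
    size_t new_len = len + lf_count;
    char *dst = malloc(new_len + 1);
    if (dst == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[j++] = '\r';
        dst[j++] = src[i];
    }
    assert(j == new_len); /* never writes past the size computed up front */
    dst[j] = '\0';
    if (out_len != NULL)
        *out_len = new_len;
    return dst;
}
```

The design choice is a two-pass conversion: the first pass only measures, so the buffer can never be smaller than the output the second pass produces.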

0:009> !for_each_frame dv /t
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
12 00000000`0548edb0 00000001`40010f6f msvcr80!realloc+0x6f [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\realloc.c @ 332]
void * pBlock = 0x00000000`00000000
unsigned int64 newsize = 0x548fbe8
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
13 00000000`0548ede0 00000001`4000f63c ProcessA!FunctionA_AnalyzeEventData+0xfcf [e:\ProcessA\FunctionA_sockserv.cpp @ 1666]
void * cd = 0xffffffff`ffffffff
struct _MpEvsHead * Head = 0x00000000`0548ff10
char * pEventData = 0x00000000`00c97fd0 "???"
char ** pNewData = 0x00000000`0548fe48
char * SiteName = 0x00000000`0548fe18 ""
int oval_check = 0n0
char * pszHostIp = 0x00000000`0548fbf0 "192.168.1.1"
int j = 0n469
int NodeName_check = 0n0
char [2068] eventtext = char [2068] "SE:Security: ???"
unsigned long err = 0
int NL_henkan = 0n1
int Evttxt_check = 0n1
char [129] nameWork = char [129] "`_???"
int ret = 0n0
struct NameObject_t * pNameObj_n = 0x00000000`00c5eee8
char * pData_n = 0x00000000`00c59c90 "SE:Security: ???"
long lWork = 0n9
char [257] szTrcBuff = char [257] "safely divided text.([453]bytes --> [469]bytes)"
long nNameNum = 0n44
long nNewLen = 0n1740
struct NameObject_t * pNameObj_o = 0x00000000`00c98028
char * pData_o = 0x00000000`00c984c6 "SE:Security: ???"
char * pt = 0x00000000`00c59e55 "[???"
long i = 0n20
int IpAddr_check = 0n0
int res = 0n1
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
14 00000000`0548f8a0 00000000`774e652d ProcessA!FunctionA_SockWork+0xe1c [e:\ProcessA\FunctionA_sockserv.cpp @ 1102]
void * ns = 0x00000000`000002a0
char * pRead_str = 0x00000000`00c562f0 ","
int bTableRegisterd = 0n0
unsigned long err = 0
char [3] traceflg = char [3] ""
int ret = 0n0
short sWork = 0n2
int oval_check = 0n0
char * pNewData = 0x00000000`00c5ee90 "???"
char * wk = 0x00000000`0548f930 "192.168.1.1"
char [33] SiteName = char [33] ""
long lWork = 0n2032
char [257] szTrcBuff = char [257] "recv event OK"
int iLastSerchedIndex = 0n0
char [256] HostIp = char [256] "192.168.1.1"
int ret2 = 0n0
struct _MpEvsHead Head = struct _MpEvsHead
long nDataLen = 0n3
char [257] szTrcBuff2 = char [257] ""
char [20] szSendData = char [20] "OK"
struct addrinfo hinst = struct addrinfo
int conv_disc_set = 0n1
long lRc = 0n0
void * conv_disc = 0xffffffff`ffffffff
int res = 0n1
char * pData = 0x00000000`00c97fd0 "???"
long nRead = 0n3726
char [16] evttype = char [16] "Alarm.sys"
char * lpszEventid = 0x00000000`00c5f180 ""
long nSend = 0n12
char [256] ipTmp = char [256] "192.168.1.1"
char [20] szToCode = char [20] "sjis"
char [20] szFromCode = char [20] "sjis"
int bWriteEvent = 0n1
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
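Matching the pointer locals against the heap addresses can also be done mechanically. The sketch below uses a few pointer values copied from the frame 13 dv output above and the three block addresses reported by !heap. It picks the local closest above the last valid block and reports which block a pointer falls into. With these numbers, pData_n (0x00c59c90) lies in the block just before the corrupted one, and pt (0x00c59e55) already lies 5 bytes past the start of the corrupted block, which supports the overrun theory. The table and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block addresses reported by !heap for this dump. */
#define BLOCK_BEFORE 0x0000000000c59c80ull /* last valid block before the error */
#define BLOCK_ERROR  0x0000000000c59e50ull /* corrupted block */
#define BLOCK_AFTER  0x0000000000c59fd0ull /* last valid block after the error */

enum region { OUTSIDE, IN_BEFORE_BLOCK, IN_ERROR_BLOCK };

/* Classify a pointer value relative to the three blocks. */
static enum region classify(uint64_t p)
{
    if (p >= BLOCK_BEFORE && p < BLOCK_ERROR)
        return IN_BEFORE_BLOCK;
    if (p >= BLOCK_ERROR && p < BLOCK_AFTER)
        return IN_ERROR_BLOCK;
    return OUTSIDE;
}

struct local { const char *name; uint64_t value; };

/* Some pointer locals copied from the dv /t output of frame 13. */
static const struct local locals[] = {
    { "pData_n",    0x0000000000c59c90ull },
    { "pt",         0x0000000000c59e55ull },
    { "pData_o",    0x0000000000c984c6ull },
    { "pEventData", 0x0000000000c97fd0ull },
    { "pNameObj_n", 0x0000000000c5eee8ull },
};

/* Index of the local with the smallest value at or above `base`, or -1. */
static int closest_above(uint64_t base)
{
    int best = -1;
    for (size_t i = 0; i < sizeof locals / sizeof locals[0]; i++) {
        if (locals[i].value < base)
            continue;
        if (best < 0 || locals[i].value < locals[best].value)
            best = (int)i;
    }
    return best;
}
```

closest_above(BLOCK_BEFORE) returns the index of pData_n, the same variable found by hand above.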

Case 2: Invalid parameter (STATUS_INVALID_PARAMETER)

    Error code: 0xc000000d
    Meaning: STATUS_INVALID_PARAMETER

Step 1. Use "!analyze -v" to find where the failure happened and why the program crashed.
 
0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************


*** ERROR: Symbol file could not be found.  Defaulted to export symbols for user32.dll - 
Unable to load image C:\Windows\Odsv.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for Odsv.dll
*** ERROR: Module load completed but symbols could not be loaded for Odsv.dll
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/ProcessB_exe/1_0_0_1/4e362265/msvcr80_dll/8_0_50727_6195/4dcdd833/c000000d/0001d5fa.htm?Retriage=1


FAULTING_IP: 
msvcr80!strncpy_s+10a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl @ 62]
00000000`74e6d5fa b822000000      mov     eax,22h


EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 0000000074e6d5fa (msvcr80!strncpy_s+0x000000000000010a)
   ExceptionCode: c000000d
  ExceptionFlags: 00000000
NumberParameters: 0


PROCESS_NAME:  ProcessB.exe


ERROR_CODE: (NTSTATUS) 0xc000000d - <Unable to get error code text>


EXCEPTION_CODE: (NTSTATUS) 0xc000000d - <Unable to get error code text>


MOD_LIST: <ANALYSIS/>


NTGLOBALFLAG:  0


APPLICATION_VERIFIER_FLAGS:  0


LAST_CONTROL_TRANSFER:  from 0000000000124250 to 0000000074e5b0ec


FAULTING_THREAD:  ffffffffffffffff


DEFAULT_BUCKET_ID:  STATUS_INVALID_PARAMETER


PRIMARY_PROBLEM_CLASS:  STATUS_INVALID_PARAMETER


BUGCHECK_STR:  APPLICATION_FAULT_STATUS_INVALID_PARAMETER


IP_ON_STACK: 
+2e32faf01dedf58
00000000`00124250 60              ???


FRAME_ONE_INVALID: 1


STACK_TEXT:  
00000000`00124220 00000000`00124250 : 00000000`00000006 00000000`00000000 00000000`00000001 00000000`00000000 : msvcr80!_invalid_parameter+0x6c [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\invarg.c @ 88]
00000000`00124228 00000000`00000006 : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : 0x124250
00000000`00124230 00000000`00000000 : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`00124260 : 0x6




STACK_COMMAND:  ~0s; .ecxr ; kb


FOLLOWUP_IP: 
msvcr80!strncpy_s+10a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl @ 62]
00000000`74e6d5fa b822000000      mov     eax,22h


FAULTING_SOURCE_CODE:  
No source found for 'f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl'




SYMBOL_STACK_INDEX:  0


SYMBOL_NAME:  msvcr80!strncpy_s+10a


FOLLOWUP_NAME:  MachineOwner


MODULE_NAME: msvcr80


IMAGE_NAME:  msvcr80.dll


DEBUG_FLR_IMAGE_TIMESTAMP:  4dcdd833


FAILURE_BUCKET_ID:  STATUS_INVALID_PARAMETER_c000000d_msvcr80.dll!strncpy_s


BUCKET_ID:  X64_APPLICATION_FAULT_STATUS_INVALID_PARAMETER_msvcr80!strncpy_s+10a


WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/ProcessB_exe/1_0_0_1/4e362265/msvcr80_dll/8_0_50727_6195/4dcdd833/c000000d/0001d5fa.htm?Retriage=1


Followup: MachineOwner
---------
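This time we were out of luck. The "!analyze -v" output shows little beyond the fact that the process
crashed while calling strncpy_s (note FRAME_ONE_INVALID and the bogus frames 0x124250 and 0x6 in STACK_TEXT),
so the calling function cannot be identified. There are many possible reasons for this; for example,
the customer may not have captured a full dump, or the stack in the dump may be corrupted.

This was a real headache: I spent three weeks going through the stack contents line by line before I solved it.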


Step 2. Use "!teb" to find the range of the current thread's stack: StackBase is its upper end and StackLimit its lower end.

0:000>!teb
TEB at 000007ffffeee000
    ExceptionList:        0000000000000000
    StackBase:            0000000008d50000
    StackLimit:           0000000008d4d000
    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 000007ffffeee000
    EnvironmentPointer:   0000000000000000
    ClientId:             0000000000001bdc . 0000000000001868
    RpcHandle:            0000000000000000
    Tls Storage:          000007ffffeee058
    PEB Address:          000007fffffd6000
    LastErrorValue:       87
    LastStatusValue:      c000000d
    Count Owned Locks:    0
    HardErrorMode:        0
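Because the stack grows toward lower addresses, the committed stack occupies [StackLimit, StackBase). The sketch below turns the two TEB values into a dps command covering that range and counts the pointer-sized slots it will print (8 bytes each on x64). The helper name dps_stack_command is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write "dps <StackLimit> <StackBase>" into buf and return how many
   8-byte slots (one dps output line each on x64) the range covers. */
static uint64_t dps_stack_command(uint64_t stack_limit, uint64_t stack_base,
                                  char *buf, size_t buf_size)
{
    assert(stack_base > stack_limit); /* stack grows toward lower addresses */
    snprintf(buf, buf_size, "dps %016" PRIx64 " %016" PRIx64,
             stack_limit, stack_base);
    return (stack_base - stack_limit) / 8;
}
```

For the TEB above this yields "dps 0000000008d4d000 0000000008d50000", a 0x3000-byte range of 0x600 (1536) slots.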


Step 3. Use "dps" to dump the stack memory. The most important part is excerpted below.

-------------------------------------------------------------------------------------------------------------------------------
00000000`001247d8  00000000`74e6d5fa msvcr80!strncpy_s+0x10a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl @ 62]
00000000`001247e0  00000000`009c01e0
00000000`001247e8  00000000`030f5810
00000000`001247f0  00000000`0057e310 ProcessB2!work   
★ ProcessB2!work should contain data like "DNxxxxxxxx_150_109",
but it actually contains "VIP_rtcrx00184-004a/b-y3b-d".

00000000`001247f8  00000000`005782c0 ProcessB2!trcData 
▲ ProcessB2!trcData contains "Function:testB call",
 written by the call trace("testB", __FILE__, __LINE__, TRCLV_3); in List::testB.

00000000`00124800  00000000`00000000
00000000`00124808  00000000`00000000
00000000`00124810  00000000`004a3150 ProcessB2!`string' 
▲ ProcessB2!`string' contains "e:\ProcessB\FunctionB.cpp", the __FILE__ string.

00000000`00124818  00000000`00455b65 ProcessB2!List::testB+0x55 [e:\ProcessB\Listset.cpp @ 719]
00000000`00124820  00000000`009c01e0
00000000`00124828  00000000`030f5810
00000000`00124830  00000000`0057e310 ProcessB2!work
00000000`00124838  00000000`001249e0
00000000`00124840  32322e35`322e3000
00000000`00124848  30614031`33312e34
00000000`00124850  7097fb8e`bc923730
00000000`00124858  5049565f`5753334c
00000000`00124860  00000000`0000125f
00000000`00124868  000082bd`b1200d5e
00000000`00124870  00000000`009c01e0
00000000`00124878  00000000`00467bda ProcessB2!FunctionB+0x73a [e:\ProcessB\FunctionB.cpp @ 181]   
-------------------------------------------------------------------------------------------------------------------------------
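dps only annotates values that happen to match a symbol; the other slots are raw data. Several unannotated qwords in the excerpt (00000000`00124840 to 00000000`00124858) are actually text. This becomes visible when each qword is read as 8 little-endian bytes, which is how `db 00124840 L20` would display them. The sketch below shows that decoding; qwords_to_bytes and bytes_to_text are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split each qword into its 8 bytes, lowest byte first (x64 is
   little-endian), i.e. the order in which the bytes sit in memory. */
static void qwords_to_bytes(const uint64_t *q, size_t count, unsigned char *out)
{
    for (size_t i = 0; i < count; i++)
        for (unsigned b = 0; b < 8; b++)
            out[i * 8 + b] = (unsigned char)(q[i] >> (8 * b));
}

/* Render bytes like the text column of db: printable ASCII as-is,
   anything else as '.'. `out` must have room for n + 1 chars. */
static void bytes_to_text(const unsigned char *in, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] >= 0x20 && in[i] < 0x7f) ? (char)in[i] : '.';
    out[n] = '\0';
}
```

Fed the four qwords starting at 00000000`00124840, this produces `.0.25.224.131@a007.....pL3SW_VIP`. Note the leading 0x00 byte and the "VIP" fragment.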
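This finally pinpoints the function at fault. After working out what these functions do and printing everything that could be printed,
I found that the function had been passed invalid data. Here is why invalid data makes the program crash.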

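First, strncpy: in C++ code compiled with the secure template overloads enabled (_CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES
and _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES_COUNT defined as 1; check the project's preprocessor definitions, as they are off by default),
a call to strncpy whose destination is a fixed-size array is compiled into a call to the secure function strncpy_s.
For details, see Microsoft's documentation:
http://msdn.microsoft.com/zh-cn/LIBRARY/ms175759(v=vs.80)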

Second, why the program crashes. When strncpy is turned into strncpy_s, the call takes this form:
char dst[5];
strncpy(dst, "a long string", 5);    ---->  strncpy_s(dst, 5, "a long string", 5);

Called this way, strncpy_s ends in a STATUS_INVALID_PARAMETER error; this is how strncpy_s is designed to behave. For usage details, see:
http://msdn.microsoft.com/zh-cn/library/5dae5d43(v=vs.90).aspx

Excerpt:
char dst[5];
strncpy_s(dst, 5, "a long string", 5);
means that we are asking strncpy_s to copy five characters into a buffer five bytes long; this would leave no space for the null terminator, hence strncpy_s zeroes out the string and calls the invalid parameter handler.
If truncation behavior is needed, use _TRUNCATE or (size – 1):
strncpy_s(dst, 5, "a long string", _TRUNCATE);
strncpy_s(dst, 5, "a long string", 4);
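strncpy_s belongs to the Microsoft CRT (and the optional Annex K of C11), so it is not available everywhere. The sketch below is a portable model of the rules quoted above, written for illustration; strncpy_s_model, MODEL_TRUNCATE and MODEL_STRUNCATE are hypothetical names. It returns the same error codes as the documented function but, unlike the real CRT, does not call the invalid parameter handler. The faulting instruction in the !analyze output, mov eax,22h, loads 0x22 = 34, the value of ERANGE in the Microsoft CRT. That is consistent with strncpy_s having taken the "does not fit" path modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TRUNCATE  ((size_t)-1) /* plays the role of MSVC's _TRUNCATE */
#define MODEL_STRUNCATE 80           /* value of STRUNCATE in MSVC's <errno.h> */

/* Simplified model of strncpy_s(dst, dst_size, src, count).
   Returns 0 on success, EINVAL for bad pointers/sizes, ERANGE when the copy
   would not fit (dst[0] is set to 0), and MODEL_STRUNCATE when
   MODEL_TRUNCATE was passed and the result had to be cut short. */
static int strncpy_s_model(char *dst, size_t dst_size, const char *src, size_t count)
{
    if (dst == NULL || dst_size == 0)
        return EINVAL;
    if (src == NULL) {
        dst[0] = '\0';
        return EINVAL;
    }

    /* Examine at most `limit` characters of src. */
    size_t limit = (count == MODEL_TRUNCATE) ? dst_size : count;
    size_t n = 0;
    while (n < limit && src[n] != '\0')
        n++;

    if (n >= dst_size) { /* no room left for the terminating NUL */
        if (count == MODEL_TRUNCATE) {
            memcpy(dst, src, dst_size - 1);
            dst[dst_size - 1] = '\0';
            return MODEL_STRUNCATE;
        }
        dst[0] = '\0';   /* real CRT: also calls the invalid parameter handler */
        return ERANGE;
    }

    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}
```

The first test case below is exactly the failing call from the excerpt; the second and third are the two fixes it recommends.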

For more on diagnosing ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun, see the following example:

http://blogs.msdn.com/b/jiangyue/archive/2010/03/16/windows-heap-overrun-monitoring.aspx


