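Introduction to WinDbg

1. Introduction

1.1. Related websites

Microsoft website

Community website

Debugging write-ups

1.2. Download

On Win10, the recommended route is to install the WinDbg Preview version from the Microsoft Store. On other OS versions, download the Windows SDK and select WinDbg among the installation options.

The Preview version can debug both 32-bit and 64-bit processes as well as the kernel. The traditional version ships as separate 32-bit and 64-bit builds, but it has the advantage of being small and portable.

1.3. Symbol configuration

For its DLLs, EXEs, and similar binaries, Microsoft generally publishes the pdb files for download. Once configured, WinDbg downloads them automatically, which makes it much easier to interpret a program's data structures.

The simple approach is to set the environment variable _NT_SYMBOL_PATH to: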

srv*c:\mss*http://msdl.microsoft.com/download/symbols

With this setting, WinDbg first looks for the pdb in c:\mss and downloads it only if it is not found there.
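To make the structure of that value explicit, here is a minimal Python sketch that splits an _NT_SYMBOL_PATH string into a local cache and an upstream server. The helper name parse_symbol_path is hypothetical and not part of WinDbg, and the parsing is deliberately simplified: it only covers plain directories and srv*...*... elements.

```python
def parse_symbol_path(value):
    """Split an _NT_SYMBOL_PATH value into its search entries.

    Simplified: handles plain directories and srv*...*... elements only.
    """
    entries = []
    for element in value.split(";"):
        element = element.strip()
        if not element:
            continue
        parts = element.split("*")
        if parts[0].lower() == "srv":
            stores = [p for p in parts[1:] if p]
            entries.append({
                "kind": "srv",
                # Every store except the last acts as a local cache.
                "caches": stores[:-1],
                # The last store is where missing files are fetched from.
                "server": stores[-1] if stores else None,
            })
        else:
            entries.append({"kind": "dir", "caches": [element], "server": None})
    return entries


if __name__ == "__main__":
    path = r"srv*c:\mss*http://msdl.microsoft.com/download/symbols"
    for entry in parse_symbol_path(path):
        print(entry["caches"], "->", entry["server"])
```

For the value shown above, the sketch reports c:\mss as the cache and the Microsoft URL as the server, which matches the lookup order described in the text.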

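In practice, however, this site has recently been unreachable from mainland China. A mirror site can be tried (although it too seems to have stopped working lately):

http://sym.ax2401.com:9999/symbols

Alternatively, find an HTTP proxy that can reach the outside internet (a SOCKS5 proxy will not work) and set the environment variable _NT_SYMBOL_PROXY to the proxy address, for example 127.0.0.1:10809.

A short demonstration:

a. Start a new notepad process.

b. In WinDbg, attach to the notepad process.

c. Enter !sym noisy so that WinDbg shows the details of symbol loading: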

0:000> !sym noisy
noisy mode - symbol prompts on

d. Enter .reload /f ntdll.dll to make WinDbg download ntdll.pdb. If you see output such as

SYMSRV:  HTTPGET: /download/symbols/ntdll.pdb/54A4ABD98E75ECCB93B912D2747051301/ntdll.pdb
SYMSRV:  HttpSendRequest: 800C2EFD - ERROR_INTERNET_CANNOT_CONNECT

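then the network is usually unreachable and the download cannot proceed.

2. Common commands

WinDbg commands fall into three groups: standard commands, . commands, and ! commands. The ! commands are actually provided by WinDbg extension DLLs; the extension name is simply often omitted.

For example, !kdexts.locks and !ntsdexts.locks are different commands. Searching the WinDbg directory for kdexts.dll and ntsdexts.dll finds the corresponding files directly.

Pressing F1 in WinDbg opens the help, where every command can be looked up with details and examples.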

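2.1. Common standard commands

Command|Purpose
-|-
d|View memory contents
k|View the stack backtrace
u|Disassemble the code at an address
r|View register values
~|Switch threads (user mode)
dt|View the definition of a structure
bp|Set a breakpoint (related: bu, bm, ba)
lmf|List each module's file path and symbol loading status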

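2.2. Common . commands

Command|Purpose
-|-
.reload|Load pdb symbols
.load|Load an extension DLL
.process|Switch to a process (kernel mode)
.thread|Switch to a thread (kernel mode)
.detach|Detach from the debug target
.kill|Terminate the process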

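2.3. Common ! commands

Command|Purpose
-|-
!process|List processes (kernel mode)
!running|Show the thread currently running on each CPU (kernel mode)
!analyze|Analyze a dmp automatically
!handle|Show information about a handle
!fileobj|Show information about a file object
!locks|Analyze locks, for example to find deadlocks
!poolused|Show memory pool usage (kernel mode)
!memusage|Show physical memory statistics (kernel mode)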

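3. Setting up a debugging environment

3.1. Live-debugging a process

File->attach to process

With the traditional WinDbg, use the 32-bit WinDbg for a 32-bit process and the 64-bit WinDbg for a 64-bit process.

3.2. Debugging a process dump

Process dumps come in two kinds. One is captured by hand, for example by right-clicking the process entry in Task Manager and creating a dump. The other is the crash scene that Windows captures for us after a crash. A simple way to enable the latter is to run the following with administrator privileges: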

procdump -i -ma c:\windows\temp

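After a process crashes, two dmp files will then appear in this directory.

In WinDbg, choose File->Open dump file and select the dmp file.

3.3. Debugging a kernel dump

Kernel dumps come in three kinds: mini, core, and full. They are generated when Windows blue-screens. A minidump is usually located in C:\windows\minidump\, while the other two usually end up as C:\windows\memory.dmp. A core dump contains only kernel-mode data.

Core dumps and full dumps require enough Windows virtual memory (page file) to be configured in advance, otherwise generation may fail; see the crash dump analysis chapter of 《深入解析Windows操作系統》 (Windows Internals). When the disk partition is nearly full, the dump may also be deleted automatically at boot; this can be prevented through a registry setting.

When a hang in the Windows kernel can be anticipated, configure the registry in advance. When the hang occurs, hold the right Ctrl key and press ScrollLock twice to trigger a blue screen manually. The kernel state at the time of the hang can then be analyzed from the dump.

For the settings in this subsection, see 藍屏設置.zip (blue-screen settings).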

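3.4. Configuring a debuggable kernel

For complicated kernel hangs, a live kernel debugging environment can also be set up. Only the setup for winxp, win7, and win10 guests under idv and vdi is described here.

The machine being debugged is usually called the server machine, and the machine running WinDbg is called the client machine.

The general method is debugging over a serial port. The host provides the Windows VM with an emulated serial device, and Windows is configured to accept debugging over that serial port. The host then connects the serial device to a listening socket port. The debugging machine connects to that socket port over TCP. On the debugging machine, two emulated serial ports are created: one is connected to the socket port and the other is opened by WinDbg. See the figure below:

[Figure: tbBVy9.jpg]

The steps are as follows:

a. For idv, edit

/etc/vmmanger/vm.template.oa.xml

and add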

<qemu:commandline>
    ……….
    <qemu:arg value="-serial"/>
    <qemu:arg value="tcp::9090,server,nowait"/>
  </qemu:commandline>

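b. For vdi, before starting the VM, edit

/etc/libvirt/rco_spice.xml

and add the same fragment. The TCP port usually cannot be 9090 again, because it has typically already been taken.

After the VM has started, restore the original rco_spice.xml; otherwise the startup of other vdi VMs will be affected.

c. In the VM, run in cmd: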

bcdedit /set debug on
bcdedit /set debugtype serial
bcdedit /set debugport 1
bcdedit /set baudrate 115200
bcdedit /set {bootmgr} displaybootmenu yes
bcdedit /timeout 2

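From now on the VM supports debugging over the serial port on every boot.

d. On the debugging client, install a virtual serial port tool such as Virtual Serial Port Driver. Run it, add two serial ports comX and comY, and connect the two to each other.

e. On the client, install combytcp. On the left, enter the host machine's IP and port and click client. On the right, select comX and set the baud rate to 115200.

f. Open WinDbg, choose File->attach to kernel, fill in comY and the baud rate, and click OK.

g. When WinDbg reports that it is starting to load symbols, the connection has generally succeeded. If this takes too long, try pressing Ctrl+Break to see whether there is any response.

For win10 installed on bare metal, USB debugging or network debugging can be used. USB debugging requires support from the physical port.

For network debugging, see https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/setting-up-a-network-debugging-connection. It is fairly simple: the client and server are connected directly over the network. Note that this changes the name of the network adapter in Windows on the client machine.

Also, once kernel debugging is enabled on the server machine, the whole kernel is paused even when user-mode code merely executes the int 3 instruction. Windows may be designed this way on purpose, but it is very inconvenient for users and can even lead customers to report false failures.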

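3.5. Debugging a process remotely

The old WinDbg is small and portable and can be sent to a customer site. For hard-to-diagnose user-mode problems, remote debugging is an option. Roughly, a WinDbg server side is started on the customer's Windows and attached to the target program, and the WinDbg client on our Windows connects to the customer's WinDbg.

The network connection between the two WinDbg instances may require NAT traversal.

a. On the customer's Windows, run dbgsrv.exe -t tcp:port=5002

b. On our Windows, run WinDbg and choose File->Connect to Remote Stub Server. Enter tcp:port=5002,server=XXX, then use attach to process to debug processes on the server machine. See https://blog.csdn.net/xiaohua_de/article/details/78779434.

c. The firewall may need to allow the relevant port; it is best to test this beforehand.

There are many ways to debug a process remotely. This one is chosen because the pdb files are downloaded over the client's network; after all, the customer site usually cannot be given a proxy for downloading pdb files.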

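3.6. Debugging a .NET process or dump

First, many .NET program files can easily be decompiled into source code with dotPeek, which can also generate the code files and a pdb.

Analyzing assembly is of course rather obscure. If WinDbg can combine the pdb with the source code directly, the analysis becomes much clearer.

Analyzing .NET threads also requires the relevant .NET DLLs, so that .NET code rather than assembly is shown directly.

The following uses debugging a dump as an example:

a. Copy the relevant .NET DLLs from %windir%\microsoft.net\framework\<.NET version> on the machine where the crash occurred to your own machine. sos.dll and mscordacwks.dll are needed. mscordacwks.dll must be renamed to mscordacwks_AAA_AAA_x.x.x.xxxx.dll, where AAA matches the bitness of the dump (x86 or AMD64) and x.x.x.xxxx is the version number of mscordacwks.dll (shown in the file's properties dialog), for example 2.0.50272.8763 or 4.7.2114.0. Alternatively, install the same version of .NET yourself; the version numbers must match exactly.

b. Decompile the program's exe and dll files with dotPeek to obtain the source code and pdb.

c. The old WinDbg is recommended. Open the dmp in WinDbg. Set the source path to the directory containing the source code. Set the symbol file path and add the pdb obtained in b. Set the image file path and add the paths of the exe and dll files from a and b.

d. Enter

.load X:\Y\Z\sos.dll

.load X:\Y\Z\clr.dll

Once these succeed, enter !clrstack to see the CLR stack: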

0:000> !clrstack
OS Thread Id: 0x1b54 (0)
Child SP       IP Call Site
01b7dfb8 77802bec [InlinedCallFrame: 01b7dfb8] 
01b7dfb4 687f1c3c *** WARNING: Unable to verify checksum for System.Windows.Forms.ni.dll
DomainBoundILStubClass.IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef, System.String, System.String, Int32)
01b7dfb8 68903382 [InlinedCallFrame: 01b7dfb8] System.Windows.Forms.SafeNativeMethods.MessageBox(System.Runtime.InteropServices.HandleRef, System.String, System.String, Int32)
01b7dffc 68903382 System.Windows.Forms.MessageBox.ShowCore(System.Windows.Forms.IWin32Window, System.String, System.String, System.Windows.Forms.MessageBoxButtons, System.Windows.Forms.MessageBoxIcon, System.Windows.Forms.MessageBoxDefaultButton, System.Windows.Forms.MessageBoxOptions, Boolean)
01b7e000 68902fd3 [InlinedCallFrame: 01b7e000] 
01b7e088 68902fd3 System.Windows.Forms.MessageBox.Show(System.String, System.String, System.Windows.Forms.MessageBoxButtons, System.Windows.Forms.MessageBoxIcon)
01b7e098 08e2a46c AT.SoaFace.Core.Tools.Utility.MsgBox.ShowError(System.String)
01b7e0ac 08e2a257 AT.SoaFace.AppEngine.Program.ApplicationThreadException(System.Object, System.Threading.ThreadExceptionEventArgs)
01b7e0bc 68848c25 System.Windows.Forms.Application+ThreadContext.OnThreadException(System.Exception)
01b7e0f8 68852da7 System.Windows.Forms.Control.WndProcException(System.Exception)
01b7e104 68aa215f System.Windows.Forms.Control+ControlNativeWindow.OnThreadException(System.Exception)
01b7e108 6824764e System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
01b7e3ec 01fad0ce [InlinedCallFrame: 01b7e3ec] 
01b7e3e8 682a302c DomainBoundILStubClass.IL_STUB_PInvoke(MSG ByRef)
01b7e3ec 68256ce1 [InlinedCallFrame: 01b7e3ec] System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
01b7e420 68256ce1 

The native stack shown by kb, however, looks completely different:

0:000> kb
ChildEBP RetAddr  Args to Child              
01b7dc58 7644870d 00070860 00000001 00000008 win32u!NtUserWaitMessage+0xc
01b7dc98 764485fd 00000008 00000001 00000000 user32!DialogBox2+0x103
01b7dcc8 764a2a08 00070860 764a0f30 01b7df08 user32!InternalDialogBox+0xe2
01b7dd8c 77803b3c 01b7def0 764a1897 01b7df08 user32!SoftModalMessageBox+0x728
01b7dd94 764a1897 01b7df08 21ff2084 00000000 win32u!NtUserModifyUserStartupInfoFlags+0xc
01b7df9c 687f1c3c 00070860 21ff2084 10bb7278 user32!MessageBoxWorker+0x2ca
01b7dfe8 68903382 00000010 00000000 00070860 System_Windows_Forms_ni+0x761c3c
01b7e068 68902fd3 00000000 00000000 00000000 System_Windows_Forms_ni+0x873382
01b7e0b4 68848c25 00000000 21ff1f78 03dda280 System_Windows_Forms_ni+0x872fd3
01b7e0f0 68852da7 21e1542c 01b7e218 68aa215f System_Windows_Forms_ni+0x7b8c25
01b7e0fc 68aa215f 6824764e 01b7e234 6b74c240 System_Windows_Forms_ni+0x7c2da7
01b7e24c 7646626b 00010880 00000202 00000000 System_Windows_Forms_ni+0xa1215f
01b7e278 764571bc 0680b23e 00010880 00000202 user32!_InternalCallWinProc+0x2b
01b7e35c 764562eb 0680b23e 00000000 00000202 user32!UserCallWinProcCheckWow+0x3ac
01b7e3d0 764560c0 01b7e458 01b7e418 682a302c user32!DispatchMessageWorker+0x21b
01b7e3dc 682a302c 01b7e458 fa8a32e1 6b66faf0 user32!DispatchMessageW+0x10
01b7e418 68256ce1 fa8a32e1 6b66faf0 01b7e540 System_Windows_Forms_ni+0x21302c
01b7e49c 682568f3 00000000 00000004 00000000 System_Windows_Forms_ni+0x1c6ce1
01b7e4f0 68256760 21e1c314 00000000 00000000 System_Windows_Forms_ni+0x1c68f3
01b7e51c 68847f87 21e1c314 21e695e4 000d059a System_Windows_Forms_ni+0x1c6760
01b7e534 6886dd92 fa8a32e1 6b66faf0 01b7e930 System_Windows_Forms_ni+0x7b7f87
01b7e604 6b66ebb6 21b75258 01d40008 01b7e668 System_Windows_Forms_ni+0x7ddd92
01b7e614 6b671e10 01b7e984 01b7e658 6b749b20 clr!CallDescrWorkerInternal+0x34
01b7e668 6b7fff7b fba4f215 00000002 00000000 clr!CallDescrWorkerWithHandler+0x6b
01b7e6a8 6b800055 21cac6a4 21b853d8 01b7e9c8 clr!CallDescrWorkerReflectionWrapper+0x55
01b7e9bc 6a5afbb1 00000000 1116acdc 6a5b1530 clr!RuntimeMethodHandle::InvokeMethod+0x84e

For more, see chapters 22 and 23 of 《格蠹彙編》 and https://www.cnblogs.com/kissdodog/p/3731743.html.
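The renaming rule for mscordacwks.dll in step a can be written down as a small Python sketch. The function name dac_file_name is hypothetical, and the pattern is taken only from the description in step a. If the debugger's own error message asks for a slightly different spelling of the file name, that message should be treated as authoritative.

```python
def dac_file_name(arch, version):
    """Build the renamed mscordacwks file name for a dump.

    arch: "x86" or "AMD64", matching the bitness of the dump.
    version: file version of mscordacwks.dll, e.g. "4.7.2114.0".
    """
    if arch not in ("x86", "AMD64"):
        raise ValueError("arch must be 'x86' or 'AMD64'")
    if len(version.split(".")) != 4:
        raise ValueError("version must have four dot-separated fields")
    # Pattern from step a: mscordacwks_AAA_AAA_x.x.x.xxxx.dll
    return "mscordacwks_{0}_{0}_{1}.dll".format(arch, version)


if __name__ == "__main__":
    print(dac_file_name("x86", "4.7.2114.0"))
```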

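3.7. Time travel debugging

This feature is available only in WinDbg Preview. Time Travel Debugging (TTD) is a tool that records the execution of a running process so that it can later be replayed both forward and backward. By "rewinding" the debugger session, TTD makes problems easier to debug, because the issue does not have to be reproduced again until the bug is found.

See https://docs.microsoft.com/zh-cn/windows-hardware/drivers/debugger/time-travel-debugging-overview

4. Debugging examples: crashes, hangs, and more

4.1. Application hang caused by a deadlock

Symptom

After repeatedly closing and reopening powerpnt.exe for 24 hours, there is a chance that the powerpnt window has disappeared while the process still remains.

Analysis

At that point the process's CPU usage is not high, so the threads involved are not stuck in a simple busy loop. The process still does not exit after a long time, so it is generally not mere slowness either. The likely causes are a deadlock or a call into the kernel that never returns.

Live debugging would also work, of course; this time a dump is debugged.

Enter !locks to analyze the locks, which gives: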

CritSec ntdll!LdrpLoaderLock+0 at 77a17340
WaiterWoken        No
LockCount          2
RecursionCount     1
OwningThread       b70
EntryCount         0
ContentionCount    8
*** Locked

CritSec GDI32!semLocal+0 at 774f9140
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       fc8
EntryCount         0
ContentionCount    1
*** Locked

Two critical sections are currently locked. One is 77a17340, owned by thread b70. The other is 774f9140, owned by thread fc8.

Enter !threads to get an overview of the threads:

0:000> !thread
No export thread found
0:000> !threads
Index TID       TEB         StackBase	StackLimit  DeAlloc     StackSize ThreadProc
0	00000b70	0x7ffdf000	0x001b0000	0x001a2000	0x000b0000	0x0000e000	0x0
1	00000abc	0x7ffde000	0x02bd0000	0x02bcf000	0x02ad0000	0x00001000	0x0
2	00000784	0x7ffdd000	0x02cf0000	0x02cee000	0x02bf0000	0x00002000	0x0
3	00000fc8	0x7ffdc000	0x02fb0000	0x02fac000	0x02eb0000	0x00004000	0x0
4	00000218	0x7ffda000	0x03120000	0x0311f000	0x03020000	0x00001000	0x0
5	000004d0	0x7ffd9000	0x03790000	0x0378c000	0x03690000	0x00004000	0x0
Total VM consumed by thread stacks 0x0001a000

Switch to b70, i.e. thread 0, by entering ~0s.

To see what thread 0 is doing and why it does not release cs 77a17340, enter kv:

0:000> kv
ChildEBP RetAddr  Args to Child              
001a6a40 77985e6c 7796fc72 00000268 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
001a6a44 7796fc72 00000268 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
001a6aa8 7796fb56 00000000 00000000 00000001 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
001a6ad0 774b7acf 774f9140 37010818 01570ae0 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
001a6ae4 774b7a3d 37010818 001a6af4 00000010 GDI32!GetFontRealizationInfo+0x4e (FPO: [Non-Fpo])
001a6b0c 77021af7 37010818 001a6b20 37010818 GDI32!GdiRealizationInfo+0x1b (FPO: [Non-Fpo])
001a6b38 77021c26 37010818 000004e4 001a6c8c LPK!FontHasWesternScript+0x21 (FPO: [Non-Fpo])
001a6b48 774bd81c 37010818 72959818 00000001 LPK!LpkUseGDIWidthCache+0x91 (FPO: [Non-Fpo])
001a6c8c 774bd8d0 37010818 72959818 00000001 GDI32!GetTextExtentPointAInternal+0x134 (FPO: [Non-Fpo])
001a6ca8 729433c9 37010818 72959818 00000001 GDI32!GetTextExtentPointA+0x18 (FPO: [Non-Fpo])
WARNING: Stack unwind information not available. Following frames may be wrong.
001a6ccc 7799316f 7799316f 72943218 72940000 MSVBVM60!EbLibraryLoad+0x170
001a6dd4 7799fd0d 00000000 00000000 6e7a981a ntdll!RtlpFreeHeap+0xb7a (FPO: [Non-Fpo])
001a6ef0 7799349f 779934ca 000001de 000001e8 ntdll!LdrpRunInitializeRoutines+0x24b (FPO: [Non-Fpo])
001a6fc4 75bfb8a4 003a7cd4 001a7004 001a6ff0 ntdll!RtlpAllocateHeap+0xe73 (FPO: [Non-Fpo])

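Thread 0 appears to be waiting on some CriticalSection without returning. The first question is which one it is waiting on. ntdll!NtWaitForSingleObject, ntdll!RtlpWaitOnCriticalSection, and ntdll!RtlEnterCriticalSection are not public APIs, and the arguments WinDbg displays for a function are often inaccurate, so this is not easy to judge.

Although these APIs are not public, ntdll can be disassembled with IDA, which gives a rough idea of their parameters.

The cs being waited on is presumably 774f9140, because it is the argument of ntdll!RtlEnterCriticalSection.

This shows that thread 0 is blocked by thread 3.

Next, look at why thread 3, which owns cs 774f9140, does not release it.

Enter ~3s to switch to thread 3 and view the stack backtrace: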

0:003> kb
ChildEBP RetAddr  Args to Child              
02fae634 77985e6c 7796fc72 00000204 00000000 ntdll!KiFastSystemCallRet
02fae638 7796fc72 00000204 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
02fae69c 7796fb56 00000000 00000000 00000001 ntdll!RtlpWaitOnCriticalSection+0x13e
02fae6c4 7799f6d0 77a17340 75397f2e 779870da ntdll!RtlEnterCriticalSection+0x150
02fae830 7799f5f9 02fae890 02fae85c 00000000 ntdll!LdrpLoadDll+0x287
02fae864 75bfb8a4 00340924 02fae8a4 02fae890 ntdll!LdrLoadDll+0x92
02fae89c 778b28c3 00000000 00000000 00000002 KERNELBASE!LoadLibraryExW+0x15a
02fae8b0 774c2cf5 774c31e0 00000000 774c2cbd kernel32!LoadLibraryW+0x11
02fae8bc 774c2cbd 00000000 015e83b8 00000000 GDI32!bLoadSpooler+0x24

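In the same way, RtlEnterCriticalSection here is waiting on cs 77a17340 without returning.

In other words, threads 0 and 3 are deadlocked on each other.

Going further, one could analyze why thread 3 still calls LoadLibrary while the process is exiting, and which dll it is loading. This may help in finding the root cause.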
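The reasoning above amounts to finding a cycle in a wait-for graph: each lock has an owner, each blocked thread waits on one lock, and a deadlock is a chain of waits that returns to its starting thread. Below is a minimal Python sketch of that check. The function find_deadlock and the two dictionaries are illustrative assumptions; the owner data is copied by hand from the !locks output and the waits from the RtlEnterCriticalSection arguments in the two backtraces, not produced by any WinDbg API.

```python
def find_deadlock(owners, waits):
    """Return the threads forming a wait cycle, or [] if there is none.

    owners: lock address -> thread that currently owns the lock
    waits:  thread -> lock address the thread is blocked on
    """
    for start in waits:
        path = []
        thread = start
        # Follow "thread waits on lock owned by next thread" links.
        while thread in waits and thread not in path:
            path.append(thread)
            thread = owners.get(waits[thread])
        if thread in path:
            # We came back to a thread already on the path: a cycle.
            return path[path.index(thread):]
    return []


if __name__ == "__main__":
    # Owners from !locks; waits from the kv/kb backtraces of threads 0 and 3.
    owners = {"77a17340": "b70", "774f9140": "fc8"}
    waits = {"b70": "774f9140", "fc8": "77a17340"}
    print(find_deadlock(owners, waits))
```

With only two locks the cycle is obvious by eye, but the same walk applies unchanged when a dump contains a longer chain of threads.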

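4.2. Application hang caused by a kernel call that never returns

See chapter 8 of 《格蠹彙編》.

4.3. Debugging a crash dump

A process crash is generally triggered by a Windows exception. C++ compilers implement C++ exceptions on top of Windows exceptions. In this section, "exception" is short for "Windows exception".

For the list of Windows exception codes, see https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55. For the underlying principles, see chapter 11 of 《軟件調試》.

Although a C program is written starting from main, the VC compiler actually wraps main in two layers of exception-catching code. If an exception reaches them, the debugger registered under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug is invoked to take over. procdump catches other programs' crashes automatically by registering itself as the AeDebug debugger, which is also why procdump ends up capturing two dumps.

4.3.1. Simple exceptions

4.3.1.1. demo

Write a program that crashes by accessing a null pointer: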

#include <stddef.h>

/* Writes through a null pointer to trigger an access violation. */
void func1(int i) {
	char* p = NULL;
	*p = (char)i;
}
void func2(int i) {	func1(i); }
void func3(int i) {	func2(i); }

int main() {
	func3(5);
	return 0;
}

In WinDbg, use !analyze -v to analyze the cause of the crash:

PROCESS_NAME:  throwexcept.exe

WRITE_ADDRESS:  00000000 

ERROR_CODE: (NTSTATUS) 0xc0000005 - 0x%p            0x%p                    %s

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  00000001

EXCEPTION_PARAMETER2:  00000000

STACK_TEXT:  
00cff7e4 0051102c 00000005 00cff7fc 0051104c throwexcept!func1+0x11
00cff7f0 0051104c 00000005 00cff808 0051106a throwexcept!func2+0xc
00cff7fc 0051106a 00000005 00cff850 00511241 throwexcept!func3+0xc
00cff808 00511241 00000001 00fff678 01000058 throwexcept!main+0xa
00cff850 76166359 00b3f000 76166340 00cff8bc throwexcept!__scrt_common_main_seh+0xfa
WARNING: Stack unwind information not available. Following frames may be wrong.
00cff860 77417c24 00b3f000 56a6e476 00000000 kernel32!BaseThreadInitThunk+0x19
00cff8bc 77417bf4 ffffffff 77438fea 00000000 ntdll!__RtlUserThreadStart+0x2f
00cff8cc 00000000 005112c9 00b3f000 00000000 ntdll!_RtlUserThreadStart+0x1b


SYMBOL_NAME:  throwexcept!func1+11

MODULE_NAME: throwexcept

IMAGE_NAME:  throwexcept.exe

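The cause of the crash is exception c0000005, i.e. STATUS_ACCESS_VIOLATION. The module executing the crashing code is throwexcept.exe, at throwexcept!func1+11 in thread 0.

So look at what code executes there to cause the access violation. Switch to thread 0 with ~0s and use the u command to view this code: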

0:000> u throwexcept!func1
throwexcept!func1:
00511000 55              push    ebp
00511001 8bec            mov     ebp,esp
00511003 51              push    ecx
00511004 c745fc00000000  mov     dword ptr [ebp-4],0
0051100b 8b45fc          mov     eax,dword ptr [ebp-4]
0051100e 8a4d08          mov     cl,byte ptr [ebp+8]
00511011 8808            mov     byte ptr [eax],cl
00511013 8be5            mov     esp,ebp

The code at throwexcept!func1+11 is 00511011 8808 mov byte ptr [eax],cl. Entering r eax shows that eax is 0, so the instruction writes to address 0.
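The EXCEPTION_PARAMETER1 and EXCEPTION_PARAMETER2 fields in the !analyze -v output above carry the same information in compact form. For an access violation (c0000005), the first parameter is the kind of access (0 for a read, 1 for a write, 8 for a DEP execute violation) and the second is the address that was accessed. A minimal Python sketch of that decoding follows; the function name describe_access_violation is hypothetical.

```python
# Meaning of the first exception parameter for an access violation.
ACCESS_KINDS = {0: "read", 1: "write", 8: "execute (DEP)"}


def describe_access_violation(param1, param2):
    """Turn the two c0000005 parameters into a readable sentence."""
    kind = ACCESS_KINDS.get(param1, "unknown access")
    return "{} of address 0x{:08x}".format(kind, param2)


if __name__ == "__main__":
    # Values from the demo: EXCEPTION_PARAMETER1=00000001, PARAMETER2=00000000
    print(describe_access_violation(0x00000001, 0x00000000))
```

For the demo this yields a write to address 0, which agrees with the WRITE_ADDRESS field and with the disassembly.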

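4.3.1.2. An example of explorer crashing endlessly

Symptom

The old image works normally. After editing the image and upgrading Guesttool, the screen is black after reboot. Pressing Ctrl+Alt+Del gets a response. Safe mode works normally. With Guesttool disabled, everything is normal.

Analysis

The response to Ctrl+Alt+Del shows that this is neither a hang nor a display problem. The guess is that explorer crashing causes the black screen. Explorer is normally restarted automatically after a crash, so it is presumably crashing and restarting endlessly.

Since the problem appeared after upgrading Guesttool, some Guesttool dll that hooks explorer is presumably causing the crash.

Use procdump to collect the crash dump.

Open the dump in WinDbg and enter !analyze -v for automatic analysis: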

PROCESS_NAME:  explorer.exe
WRITE_ADDRESS:  0000000009910d48
ERROR_CODE: (NTSTATUS) 0xc0000005 - 0x%p            0x%p                    %s
EXCEPTION_CODE_STR:  c0000005
EXCEPTION_PARAMETER1:  0000000000000001
EXCEPTION_PARAMETER2:  0000000009910d48
IP_ON_HEAP:  0074007500420074
The fault address in not in any loaded module, please check your build's rebase
log at <releasedir>\bin\build_logs\timebuild\ntrebase.log for module which may
contain the address if it were loaded.

STACK_TEXT:  
00000000`09910d50 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x1d
00000000`09911430 000007fe`fd05bded : 00000000`00000000 00000000`09912238 00000000`0ba9557e 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`09911de0 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`09911eb0 000007fe`fd08ea06 : 00000000`099121a0 00000000`00000000 00000000`09912238 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d
00000000`09912180 000007fe`fce1208e : 00000000`0bbdfdc0 00000000`099123e0 00000000`09912238 00000000`00000208 : KERNELBASE!OutputDebugStringW+0x66
00000000`099121d0 000007fe`fce12193 : 000007fe`fce45da0 00000000`c0000034 00000000`09912228 00000000`099123e0 : RCD_AppInit+0x208e
00000000`09912230 000007fe`fce12358 : 00000000`00000000 00000000`09912730 00000000`00000104 00000000`09912b00 : RCD_AppInit+0x2193
00000000`099122e0 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`09912e88 : RCD_AppInit+0x2358
00000000`09912970 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`09912df0 000007fe`f4db1851 : 00000000`774a7ff0 00000000`00000000 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`099130c0 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`099130f0 00000000`773c0002 : 00000000`772fe670 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`09913160 00000000`773eb61e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`09913840 00000000`773a186a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!KiUserExceptionDispatch+0x2e
00000000`09913f40 00000000`773a009a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlpDosPathNameToRelativeNtPathName_Ustr+0x2a
00000000`09914210 000007fe`fd055ee7 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x6a
00000000`09914260 00000000`772b6c55 : 00000000`00000000 00000000`00000000 00000000`00000003 00000000`00000000 : KERNELBASE!CreateFileW+0xa7
00000000`099143c0 000007fe`fce12240 : 00000000`00000000 00000000`09914540 00000000`00000000 00000000`09914d10 : kernel32!FindFirstVolumeW+0x45
00000000`09914440 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`09914fe8 : RCD_AppInit+0x2240
00000000`09914ad0 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`09914f50 000007fe`f4db1851 : 00000000`774a7ff0 00000000`7748aa00 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`09915220 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`09915250 00000000`773c0002 : 00000000`0000000e 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`099152c0 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`099159a0 000007fe`fd05bded : 00000000`00000000 00000000`099167a8 00000000`0ba9548e 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`09916350 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`09916420 000007fe`fd08ea06 : 00000000`09916710 00000000`00000000 00000000`099167a8 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d
00000000`099166f0 000007fe`fce1208e : 00000000`0bbdfd60 00000000`09916950 00000000`099167a8 00000000`00000208 : KERNELBASE!OutputDebugStringW+0x66
00000000`09916740 000007fe`fce12193 : 000007fe`fce45da0 00000000`c0000034 00000000`09916798 00000000`09916950 : RCD_AppInit+0x208e
00000000`099167a0 000007fe`fce12358 : 00000000`00000000 00000000`09916ca0 00000000`00000104 00000000`09917100 : RCD_AppInit+0x2193
00000000`09916850 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`099173f8 : RCD_AppInit+0x2358
00000000`09916ee0 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`09917360 000007fe`f4db1851 : 00000000`774a7ff0 00000000`7748aa00 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`09917630 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`09917660 00000000`773c0002 : 00000000`0000000e 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`099176d0 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`09917db0 000007fe`fd05bded : 00000000`00000000 00000000`09918bb8 00000000`0ba954be 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`09918760 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`09918830 000007fe`fd08ea06 : 00000000`09918b20 00000000`00000000 00000000`09918bb8 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d
00000000`09918b00 000007fe`fce1208e : 00000000`0bbdfd00 00000000`09918d60 00000000`09918bb8 00000000`00000208 : KERNELBASE!OutputDebugStringW+0x66
00000000`09918b50 000007fe`fce12193 : 000007fe`fce45da0 00000000`c0000034 00000000`09918ba8 00000000`09918d60 : RCD_AppInit+0x208e
00000000`09918bb0 000007fe`fce12358 : 00000000`00000000 00000000`099190b0 00000000`00000104 00000000`09919500 : RCD_AppInit+0x2193
00000000`09918c60 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`09919808 : RCD_AppInit+0x2358
00000000`099192f0 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`09919770 000007fe`f4db1851 : 00000000`774a7ff0 00000000`7748aa00 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`09919a40 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`09919a70 00000000`773c0002 : 00000000`0000000e 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`09919ae0 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`0991a1c0 000007fe`fd05bded : 00000000`00000000 00000000`0991afc8 00000000`0ba9542e 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`0991ab70 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`0991ac40 000007fe`fd08ea06 : 00000000`0991af30 00000000`00000000 00000000`0991afc8 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d
00000000`0991af10 000007fe`fce1208e : 00000000`0bbdfca0 00000000`0991b170 00000000`0991afc8 00000000`00000208 : KERNELBASE!OutputDebugStringW+0x66
00000000`0991af60 000007fe`fce12193 : 000007fe`fce45da0 00000000`c0000034 00000000`0991afb8 00000000`0991b170 : RCD_AppInit+0x208e
00000000`0991afc0 000007fe`fce12358 : 00000000`00000000 00000000`0991b4c0 00000000`00000104 00000000`0991b900 : RCD_AppInit+0x2193
00000000`0991b070 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`0991bc18 : RCD_AppInit+0x2358
00000000`0991b700 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`0991bb80 000007fe`f4db1851 : 00000000`774a7ff0 00000000`7748aa00 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`0991be50 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`0991be80 00000000`773c0002 : 00000000`0000000e 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`0991bef0 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`0991c5d0 000007fe`fd05bded : 00000000`00000000 00000000`0991d3d8 00000000`0ba9545e 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`0991cf80 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`0991d050 000007fe`fd08ea06 : 00000000`0991d340 00000000`00000000 00000000`0991d3d8 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d
00000000`0991d320 000007fe`fce1208e : 00000000`0bbdfc40 00000000`0991d580 00000000`0991d3d8 00000000`00000208 : KERNELBASE!OutputDebugStringW+0x66
00000000`0991d370 000007fe`fce12193 : 000007fe`fce45da0 00000000`c0000034 00000000`0991d3c8 00000000`0991d580 : RCD_AppInit+0x208e
00000000`0991d3d0 000007fe`fce12358 : 00000000`00000000 00000000`0991d8d0 00000000`00000104 00000000`0991dd00 : RCD_AppInit+0x2193
00000000`0991d480 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`0991e028 : RCD_AppInit+0x2358
00000000`0991db10 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`0991df90 000007fe`f4db1851 : 00000000`774a7ff0 00000000`7748aa00 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`0991e260 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`0991e290 00000000`773c0002 : 00000000`0000000e 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`0991e300 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`0991e9e0 000007fe`fd05bded : 00000000`00000000 00000000`0991f7e8 00000000`0ba953ce 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`0991f390 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`0991f460 000007fe`fd08ea06 : 00000000`0991f750 00000000`00000000 00000000`0991f7e8 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d
00000000`0991f730 000007fe`fce1208e : 00000000`0bad1280 00000000`0991f990 00000000`0991f7e8 00000000`00000208 : KERNELBASE!OutputDebugStringW+0x66
00000000`0991f780 000007fe`fce12193 : 000007fe`fce45da0 00000000`c0000034 00000000`0991f7d8 00000000`0991f990 : RCD_AppInit+0x208e
00000000`0991f7e0 000007fe`fce12358 : 00000000`00000000 00000000`0991fce0 00000000`00000104 00000000`09920100 : RCD_AppInit+0x2193
00000000`0991f890 000007fe`fce12744 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`09920438 : RCD_AppInit+0x2358
00000000`0991ff20 00000000`772790e9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000030 : RCD_AppInit+0x2744
00000000`099203a0 000007fe`f4db1851 : 00000000`774a7ff0 00000000`7748aa00 00000000`09077b30 00000000`00000000 : kernel32!SetUnhandledExceptionFilter+0xc9
00000000`09920670 00000000`773c4918 : 00000000`00000000 00000000`f3ae8b8a 00000000`774a7ff0 00000000`00000000 : baiducn+0x1851
00000000`099206a0 00000000`773c0002 : 00000000`0000000e 00000000`00000000 00000000`00000000 00000000`09077b20 : ntdll!RtlpCallVectoredHandlers+0xa8
00000000`09920710 00000000`773c3c3f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlDispatchException+0x22
00000000`09920df0 000007fe`fd05bded : 00000000`00000000 00000000`09921bf8 00000000`0ba9536e 00000000`00000000 : ntdll!RtlRaiseException+0x22f
00000000`099217a0 000007fe`fd06802d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!RaiseException+0x39
00000000`09921870 000007fe`fd08ea06 : 00000000`09921b60 00000000`00000000 00000000`09921bf8 00000000`0000001d : KERNELBASE!OutputDebugStringA+0x6d

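This is again a c0000005 exception: an access violation while writing to 0000000009910d48. The faulting code is at ntdll!RtlDispatchException+0x1d. Windows itself has relatively few bugs, though, and this does not look like a bug in ntdll, so look for other suspicious points.

Judging from the stack backtrace at the crash, a group of stack frames appears to repeat over and over. This looks like a stack overflow: the stack keeps growing, and when it finally tries to grow to 9910d48, it crashes because that address is not writable or does not belong to the stack region. Use !address to look at the process's memory layout: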

+        0`098d0000        0`09910000        0`00040000             MEM_FREE    PAGE_NOACCESS                      Free       
+        0`09910000        0`09911000        0`00001000 MEM_PRIVATE MEM_RESERVE                                    Stack      [~35; 73c.d2c]
         0`09911000        0`09990000        0`0007f000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     Stack      [~35; 73c.d2c]

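9910d48 does lie in the stack region, but the page it falls in is only reserved (MEM_RESERVE) and has no read/write access. On the surface, the cause has been found.

Next, why do the stack frames repeat? KERNELBASE!RaiseException shows that an exception is raised here, and kernel32!SetUnhandledExceptionFilter shows that the Windows program framework goes to SetUnhandledExceptionFilter to look for an exception handler. Internally, KERNELBASE!OutputDebugStringA is implemented by raising exception 0x40010006, but this exception is normally ignored. The guess is that baiducn.dll took over this exception but did not handle it properly.

Why appinit (RCD_AppInit in the stack above) calls OutputDebugString at all is another question. In a release build, after all, nobody is there to receive the log output.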
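The repetition in the STACK_TEXT above can also be checked mechanically instead of by eye: extract the symbol from each frame and count how often it recurs. The Python sketch below does this. The helper names symbol_of and recurring_frames are hypothetical, the parsing assumes the 64-bit kb layout shown above (fields separated by " : "), and the sample list is a shortened hand-made excerpt rather than the full trace.

```python
from collections import Counter


def symbol_of(line):
    """Return the symbol+offset field of one 64-bit kb output line."""
    return line.rsplit(" : ", 1)[-1].strip()


def recurring_frames(symbols, threshold=3):
    """List (symbol, count) for symbols seen at least `threshold` times,
    in order of first appearance."""
    counts = Counter(symbols)
    seen = []
    for symbol in symbols:
        if counts[symbol] >= threshold and symbol not in seen:
            seen.append(symbol)
    return [(symbol, counts[symbol]) for symbol in seen]


if __name__ == "__main__":
    # Shortened, hand-made excerpt of one cycle, repeated three times.
    cycle = [
        "ntdll!RtlRaiseException+0x22f",
        "KERNELBASE!RaiseException+0x39",
        "KERNELBASE!OutputDebugStringA+0x6d",
        "RCD_AppInit+0x208e",
        "kernel32!SetUnhandledExceptionFilter+0xc9",
        "baiducn+0x1851",
        "ntdll!RtlpCallVectoredHandlers+0xa8",
    ]
    for symbol, count in recurring_frames(cycle * 3):
        print(count, symbol)
```

A frame that recurs many times with a fixed set of neighbours, as here, is a strong hint of unbounded recursion rather than a single deep call chain.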

4.3.2. C++ exceptions

4.3.2.1. demo

Write a demo that crashes because nobody catches a C++ exception:

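#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <process.h>
// Throws an int that no frame catches, so the process crashes.
void func1(int i) {	throw i; }
void func2(int i) {	func1(i); }
unsigned __stdcall func3(void *) {
	int i = 9;
	func2(i);
	return 0;
}
int main() {
	// Last argument: optional out-pointer for the thread id, not needed here.
	_beginthreadex(nullptr, 0, func3, nullptr, 0, nullptr);
	Sleep(10000);	// keep the main thread alive while the worker thread throws
	return 0;
}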

Open the dmp in WinDbg and enter !analyze -v for automatic analysis:

CONTEXT:  013df940 -- (.cxr 0x13df940)
eax=013dfe20 ebx=19930520 ecx=00000003 edx=00000000 esi=006510c0 edi=006525d4
eip=75864402 esp=013dfe20 ebp=013dfe7c iopl=0         nv up ei pl nz ac po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212
KERNELBASE!RaiseException+0x62:
75864402 8b4c2454        mov     ecx,dword ptr [esp+54h] ss:002b:013dfe74=bcc88d5b
Resetting default scope

EXCEPTION_RECORD:  013df8f0 -- (.exr 0x13df8f0)
ExceptionAddress: 75864402 (KERNELBASE!RaiseException+0x00000062)
   ExceptionCode: e06d7363 (C++ EH exception)
  ExceptionFlags: 00000001
NumberParameters: 3
   Parameter[0]: 19930520
   Parameter[1]: 013dfebc
   Parameter[2]: 006525d4
  pExceptionObject: 013dfebc
  _s_ThrowInfo    : 006525d4
  Type            : int

PROCESS_NAME:  throwexcept.exe

ERROR_CODE: (NTSTATUS) 0xc0000409 - <Unable to get error code text>

EXCEPTION_CODE_STR:  c0000409

EXCEPTION_PARAMETER1:  00000007

STACK_TEXT:  
013dfe7c 748d7a46 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x62
WARNING: Stack unwind information not available. Following frames may be wrong.
013dfeac 00651098 013dfebc 006525d4 00000009 VCRUNTIME140!CxxThrowException+0x66
013dfec0 006510ac 00000009 013dfedc 006510e1 throwexcept!func1+0x18
013dfecc 006510e1 00000009 00000009 013dff14 throwexcept!func2+0xc
013dfedc 7556248f 00000000 8f07697f 75562450 throwexcept!func3+0x21
013dff14 76166359 010e3a80 76166340 013dff80 ucrtbase!thread_start<unsigned int (__stdcall*)(void *),1>+0x3f
013dff24 77417c24 010e3a80 57ab392c 00000000 kernel32!BaseThreadInitThunk+0x19
013dff80 77417bf4 ffffffff 77438fd1 00000000 ntdll!__RtlUserThreadStart+0x2f
013dff90 00000000 75562450 010e3a80 00000000 ntdll!_RtlUserThreadStart+0x1b


SYMBOL_NAME:  vcruntime140!CxxThrowException+66

MODULE_NAME: VCRUNTIME140

IMAGE_NAME:  VCRUNTIME140.dll

STACK_COMMAND:  .cxr 0x13df940 ; kb

FAILURE_BUCKET_ID:  FAIL_FAST_FATAL_APP_EXIT_c0000409_VCRUNTIME140.dll!CxxThrowException

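4.4. Windows hang caused by a deadlock

Symptom

The VM hangs while the desktop is being displayed right after boot, and Ctrl+Alt+Del gets no response. It generally recovers after 30 to 60 minutes. The problem is not always reproducible; with repeated reboots it recurs roughly once in 30 attempts.

Analysis

Task Manager cannot be brought up, so treat this as a Windows hang. Either trigger a blue screen with Ctrl+ScrollLock, or observe directly with Windows kernel debugging.

During kernel debugging, enter !running to see the thread running on each CPU and check whether any thread occupies a CPU for a long time. None does.

Enter !locks to analyze the locks. After about half an hour the analysis produces: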

2: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks...

Resource @ 0xfffff8800128efa8    Exclusively owned
    Contention Count = 2
     Threads: fffffa800c776660-01<*> 
KD: Scanning for held locks.........................

Resource @ 0xfffffa800e52c880    Exclusively owned
    Contention Count = 67
    NumberOfSharedWaiters = 13
     Threads: fffffa800f09e540-02<*> fffffa800c89e340-01    fffffa800d6b2b50-01    fffffa800cc76b50-01    
              fffffa800c8a5060-01    fffffa800cbd0060-01    fffffa800ee2b060-01    fffffa800ec13650-01    
              fffffa800f241930-01    fffffa800efa3640-01    fffffa800f1fd060-01    fffffa800f71e060-01    
              fffffa800f515a20-01    fffffa800c776660-01    
KD: Scanning for held locks...........................................................................................................

Resource @ 0xfffffa800f175050    Exclusively owned
    Contention Count = 8
     Threads: fffffa800f09e540-01<*> 
KD: Scanning for held locks..........

Resource @ 0xfffffa800f22d2a8    Exclusively owned
     Threads: fffffa800f09e540-01<*> 
KD: Scanning for held locks.....................................................................................................................
8342 total locks, 4 locks currently held
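
When the !locks output is long, the held resources can be ranked by waiter count from a saved log. The following is a minimal sketch that assumes the text format shown above; the function name rank_resources and the abbreviated SAMPLE are illustrative only.

```python
import re

# Abbreviated copy of the !locks output shown above.
SAMPLE = """\
Resource @ 0xfffff8800128efa8    Exclusively owned
    Contention Count = 2
     Threads: fffffa800c776660-01<*>
Resource @ 0xfffffa800e52c880    Exclusively owned
    Contention Count = 67
    NumberOfSharedWaiters = 13
     Threads: fffffa800f09e540-02<*> fffffa800c89e340-01
Resource @ 0xfffffa800f175050    Exclusively owned
    Contention Count = 8
     Threads: fffffa800f09e540-01<*>
"""


def rank_resources(log):
    """Return one dict per resource, sorted by waiters, then by contention count."""
    resources = []
    for line in log.splitlines():
        m = re.search(r"Resource @ (0x[0-9a-fA-F]+)", line)
        if m:
            # A new resource block starts here.
            resources.append({"addr": m.group(1), "contention": 0, "waiters": 0})
            continue
        if not resources:
            continue
        m = re.search(r"Contention Count = (\d+)", line)
        if m:
            resources[-1]["contention"] = int(m.group(1))
        m = re.search(r"NumberOfSharedWaiters = (\d+)", line)
        if m:
            resources[-1]["waiters"] = int(m.group(1))
    return sorted(resources, key=lambda r: (r["waiters"], r["contention"]), reverse=True)


for res in rank_resources(SAMPLE):
    print(res["addr"], res["contention"], res["waiters"])
```

The first entry printed is the resource most worth passing to !locks -v.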

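In the !locks output above, focus on the exclusively owned resources. Resource 0xfffffa800e52c880 stands out: one thread owns it and 13 threads are waiting on it. Use !locks -v [addr] to see the details of this resource and its 14 related threads.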

2: kd> !locks -v 0xfffffa800e52c880    

Resource @ 0xfffffa800e52c880    Exclusively owned
    Contention Count = 67
    NumberOfSharedWaiters = 13
     Threads: fffffa800f09e540-02<*> 

     THREAD fffffa800f09e540  Cid 0cd8.0cf0  Teb: 000007fffffd9000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
         fffffa800f703fd8  NotificationEvent
         fffffa800f703f98  Semaphore Limit 0x7fffffff
     IRP List:
         fffffa800ecf1c60: (0006,03a0) Flags: 00060043  Mdl: fffffa800cc7f440
         fffffa800cbbea20: (0006,03a0) Flags: 00060901  Mdl: 00000000
     Not impersonating
     DeviceMap                 fffff8a000008b30
     Owning Process            fffffa800f5ef060       Image:         vds.exe
     Attached Process          N/A            Image:         N/A
     Wait Start TickCount      167828         Ticks: 3611 (0:00:00:56.421)
     Context Switch Count      234            IdealProcessor: 0             
     UserTime                  00:00:00.000
     KernelTime                00:00:00.015
     Win32 Start Address 0x000000007700f6f0
     Stack Init fffff8800526bfb0 Current fffff8800526b510
     Base fffff8800526c000 Limit fffff88005266000 Call 0000000000000000
     Priority 14 BasePriority 8 PriorityDecrement 80 IoPriority 2 PagePriority 5
     Child-SP          RetAddr           Call Site
     fffff880`0526b550 fffff800`04ec1772 nt!KiSwapContext+0x7a
     fffff880`0526b690 fffff800`04ec0c8a nt!KiCommitThreadWait+0x1d2
     fffff880`0526b720 fffff800`051583ec nt!KeWaitForMultipleObjects+0x272
     fffff880`0526b9e0 fffff880`00c0dde5 nt!FsRtlCancellableWaitForMultipleObjects+0xac
     fffff880`0526ba40 00000000`00000000 0xfffff880`00c0dde5

fffffa800c89e340-01    

     THREAD fffffa800c89e340  Cid 05c8.053c  Teb: 000000007eee9000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
         fffffa800e499840  Semaphore Limit 0x7fffffff
     IRP List:
         fffffa800c8ab010: (0006,03a0) Flags: 00060000  Mdl: 00000000
         fffffa800c8aaa30: (0006,03a0) Flags: 00060000  Mdl: 00000000
         fffffa800c8a8010: (0006,03a0) Flags: 00060000  Mdl: 00000000
         fffffa800c8a4340: (0006,03a0) Flags: 00060000  Mdl: 00000000
     Not impersonating
     DeviceMap                 fffff8a000d516e0
     Owning Process            fffffa800f04cb10       Image:         360EntClient.exe
...
...

The owning thread fffffa800f09e540 belongs to the process vds.exe. It has two IRPs. Inspect one of them with !irp [addr]:

2: kd> !irp fffffa800cbbea20
Irp is active with 9 stacks 9 is current (= 0xfffffa800cbbed30)
 No Mdl: No System Buffer: Thread fffffa800f09e540:  Irp stack trace.  
     cmd  flg cl Device   File     Completion-Context
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_READ(3), N/A(0)]
            0  0 fffffa800e52c030 fffffa800f59c070 00000000-00000000    
	       \FileSystem\Ntfs
			Args: 00000200 00000000 00000000 00000000

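This yielded nothing conclusive.

Terminating vds.exe with .kill fffffa800f5ef060 restores the system immediately. It remains unclear what vds was waiting for: it could be a bug in vds itself, or another program failing to release a resource and leaving vds stuck.

vds.exe is the Windows Virtual Disk service. It was started by a diskpart call made by Guesttool.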

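4.5. Windows hang suspected to be caused by high memory usage

How Windows manages memory still needs further study. One example follows.

Symptom

A Windows 7 virtual machine hangs at irregular intervals. It is hard to reproduce by hand, but a case always shows up every 3 to 5 days.

Analysis

During the fault, the qemu process on the host did not show high CPU usage, so this does not look like an infinite loop in the kernel. Analysing the dump file in WinDbg, !locks reported no exclusively owned resources either. Memory usage, however, turned out to be extremely high.

First use !memusage to view physical memory usage and the consumption of each process.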

1: kd> !memusage
 loading PFN database
loading (100% complete)
Compiling memory usage data (99% Complete).
             Zeroed:        9 (      36 kb)
               Free:        2 (       8 kb)
            Standby:     4437 (   17748 kb)
           Modified:      839 (    3356 kb)
    ModifiedNoWrite:      135 (     540 kb)
       Active/Valid:   752253 ( 3009012 kb)
         Transition:    17266 (   69064 kb)
         SLIST/Temp:        0 (       0 kb)
                Bad:        0 (       0 kb)
            Unknown:        0 (       0 kb)
              TOTAL:   774941 ( 3099764 kb)

Dangling Yes Commit:    17246 (   68984 kb)
 Dangling No Commit:   638353 ( 2553412 kb)
  Building kernel map
  Finished building kernel map
Scanning PFN database - (100% complete) 

  Usage Summary (in Kb):
Control       Valid Standby Dirty Shared Locked PageTables  name
ffffffffffffd  1644      0     0     0  1644     0    AWE
fffffa80023d9b80     0     52     0     0     0     0  mapped_file( Can't read file name buffer at fffff8a007375ec0 )
...
   6762c     4      0     0     0     4     0   Page File Section
--------   204    128     0 ----- -----   128  session 0 fffff88005745000
--------  3696      0     0 ----- -----   260  session 1 fffff8800576d000
--------    24      0     0 ----- -----    24  process ( System ) fffffa8002445040
--------    32     36     0 ----- -----    68  process ( csrss.exe ) fffffa800244f060
--------    28     28     0 ----- -----    48  process ( wininit.exe ) fffffa80024545c0
--------     4      0     0 ----- -----     4  process ( iexplore.exe ) fffffa8002625aa0
--------     4      0     0 ----- -----     4  process ( iexplore.exe ) fffffa8002a4f060
--------    20      0     0 ----- -----    20  process ( LogonUI.exe ) fffffa8002c207b0
--------     4      0     0 ----- -----     4  process ( QQExternal.exe ) fffffa8002ff5060
--------     4      0     0 ----- -----     4  process ( QQExternal.exe ) fffffa80030a7b00
--------    32      0     0 ----- -----    32  process ( LogonUI.exe ) fffffa80031691f0
...
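
The page counts in the first part of the !memusage output can be turned into a quick summary. This is a rough sketch assuming 4 KB pages (consistent with "9 pages = 36 kb" above) and treating Zeroed + Free + Standby as the pages available without disk writes; the function name summarize is illustrative.

```python
PAGE_KB = 4  # assumption: 4 KB pages, matching the kb figures in the output

# Page counts copied from the !memusage output above.
pages = {
    "Zeroed": 9,
    "Free": 2,
    "Standby": 4437,
    "Modified": 839,
    "ModifiedNoWrite": 135,
    "Active/Valid": 752253,
    "Transition": 17266,
}


def summarize(counts):
    """Return total and available memory in KB plus the share of active pages."""
    total = sum(counts.values())
    # Pages that can be handed out without first writing anything to disk.
    available = counts["Zeroed"] + counts["Free"] + counts["Standby"]
    return {
        "total_kb": total * PAGE_KB,
        "available_kb": available * PAGE_KB,
        "active_pct": round(100 * counts["Active/Valid"] / total, 1),
    }


print(summarize(pages))
```

The computed total matches the TOTAL line of the debugger output, which is a useful sanity check on the copied numbers.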

Only 8 KB of physical memory is on the Free list here (plus 36 KB of zeroed pages). Next, use !poolused to look at pool usage:

1: kd> !poolused 4
....
 Sorting by Paged Pool Consumed

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 fpRD         0            0      21339   3233968128	Compressed file large read buffer , Binary: wof.sys
 fprd         0            0      65114   2133655552	Compressed file small read buffer , Binary: wof.sys
 CM31         0            0      20994    119910400	Internal Configuration manager allocations , Binary: nt!cm
 CM25         0            0       3369     15839232	Internal Configuration manager allocations , Binary: nt!cm
...
1: kd> !poolused 2
....
 Sorting by NonPaged Pool Consumed

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 FMic     34737     53344128          0            0	IRP_CTRL structure , Binary: fltmgr.sys
 Irp      35838     33702832          0            0	Io, IRP packets 
 Cont      2397     10666272          0            0	Contiguous physical memory allocations for device drivers 
 Mdl      18614      6753040          0            0	Io, Mdls 
 MmIn     17255      6625920          0            0	Mm inpaged io structures , Binary: nt!mm
 EtwB       101      4259840          2       131072	Etw Buffer , Binary: nt!etw
 fpcx     17248      4139520          0            0	Compressed file IO context , Binary: wof.sys
 Pool         5      3315280          0            0	Pool tables, etc. 
 ...

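Tags owned by wof.sys rank high in both pools: fpRD and fprd top the paged list at roughly 5 GiB combined, and fpcx appears in the nonpaged list.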
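
To compare drivers rather than individual tags, the Paged "Used" column can be summed per binary. The sketch below assumes the column layout of the !poolused output above and uses a few rows copied from it; paged_bytes_by_binary is an illustrative name.

```python
import re
from collections import defaultdict

# Rows copied from the "!poolused 4" output above.
PAGED_ROWS = """\
 fpRD         0            0      21339   3233968128 Compressed file large read buffer , Binary: wof.sys
 fprd         0            0      65114   2133655552 Compressed file small read buffer , Binary: wof.sys
 CM31         0            0      20994    119910400 Internal Configuration manager allocations , Binary: nt!cm
 CM25         0            0       3369     15839232 Internal Configuration manager allocations , Binary: nt!cm
"""

# tag, nonpaged allocs, nonpaged used, paged allocs, paged used, description
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)$")


def paged_bytes_by_binary(text):
    """Sum the paged-pool bytes of every tag, grouped by the 'Binary:' owner."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        description = m.group(6)
        if "Binary:" in description:
            owner = description.split("Binary:")[-1].strip()
        else:
            owner = "(unknown)"
        totals[owner] += int(m.group(5))
    return dict(totals)


for owner, used in paged_bytes_by_binary(PAGED_ROWS).items():
    print(owner, used)
```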

After the image was rebuilt without wof.sys, the problem no longer occurred.

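4.6. Blue screen

The key information in a blue screen is usually a bug check code plus four parameters; we troubleshoot by following the Windows documentation for that code.

One example follows.

Symptom

The image being edited was fine until the radmin software was installed; after the next reboot it blue-screens. This happens about 90% of the time.

Analysis

The problem is related to radmin. The likely cause is a conflict between a driver installed by guesttool and radmin. After opening the dmp in WinDbg, run !analyze -v to analyse it; alternatively, .bugcheck directly reports the bug check code and its four parameters.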

2: kd> .bugcheck
Bugcheck code 0000001E
Arguments ffffffff`c0000096 fffff804`307c9bbe 00000000`00000000 00000000`00000000

In WinDbg, press F1 to open the help documentation and search the index for Bug Check 0x1E:

Bug Check 0x1E: KMODE_EXCEPTION_NOT_HANDLED

The KMODE_EXCEPTION_NOT_HANDLED bug check has a value of 0x0000001E. This indicates that a kernel-mode program generated an exception which the error handler did not catch.

Important This topic is for programmers. If you are a customer who has received a blue screen error code while using your computer, see Troubleshoot blue screen errors.

KMODE_EXCEPTION_NOT_HANDLED Parameters

The following parameters are displayed on the blue screen.

Parameter	Description
1			The exception code that was not handled
2			The address at which the exception occurred
3			Parameter 0 of the exception
4			Parameter 1 of the exception
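
Applying this table to the .bugcheck output above can be sketched in a few lines of Python. The lookup table is deliberately partial and the function names are illustrative; the two NTSTATUS names listed are the standard ones for those values.

```python
BUGCHECK_LINE = "Arguments ffffffff`c0000096 fffff804`307c9bbe 00000000`00000000 00000000`00000000"

# Partial lookup table of NTSTATUS values.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000096: "STATUS_PRIVILEGED_INSTRUCTION",
}


def parse_arguments(line):
    """Turn the 'Arguments ...' line printed by .bugcheck into four integers."""
    fields = line.split()[1:]
    return [int(field.replace("`", ""), 16) for field in fields]


def describe_0x1e(args):
    """Interpret the parameters of bug check 0x1E according to the table above."""
    status = args[0] & 0xFFFFFFFF  # the code is sign-extended to 64 bits; keep the low 32
    return {
        "exception_code": NTSTATUS_NAMES.get(status, hex(status)),
        "fault_address": hex(args[1]),
    }


print(describe_0x1e(parse_arguments(BUGCHECK_LINE)))
```

For this dump, parameter 1 decodes to a privileged-instruction exception, and parameter 2 is the address to disassemble.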

The second parameter, the address at which the exception occurred, matters most. Disassemble that address:

2: kd> u fffff804`307c9bbe
nt!KiSaveDebugRegisterState+0x8e:
fffff804`307c9bbe 0f32	rdmsr
...

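rdmsr is a simple instruction: it reads the model-specific register (MSR) selected by ecx into edx:eax. The blue screen occurs because the virtualization layer does not support the MSR being read.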

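4.7. Recovering the actual arguments of an API call

In a 64-bit program, integer and pointer arguments are passed left to right in rcx, rdx, r8, r9, and then on the stack at rsp+20h, rsp+28h, and so on, as seen by the caller just before the call instruction. 32-bit programs are more complicated because several conventions exist (stdcall, cdecl, fastcall, etc.). When a breakpoint hits in WinDbg, we can work out the actual arguments of the call. One example follows.

By monitoring the registry with Process Monitor and similar means, we guessed that a certain feature ultimately calls SetDisplayConfig. The concepts involved are complex, however, and the API documentation is hard to follow.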

LONG SetDisplayConfig(
  UINT32                  numPathArrayElements,
  DISPLAYCONFIG_PATH_INFO *pathArray,
  UINT32                  numModeInfoArrayElements,
  DISPLAYCONFIG_MODE_INFO *modeInfoArray,
  UINT32                  flags
);

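Moreover, the structures in the arrays that the pointer parameters refer to are hard to reverse by inspection. API Monitor does not record this API out of the box (although a definition for it can be added to API Monitor).

In the WinDbg menu, click File->Attach to process, select the target program, and set a breakpoint on SetDisplayConfig. It is best to qualify it with the name of the DLL it belongs to:

bp User32!SetDisplayConfig

The function name is case-sensitive; the module name is not.

bl lists breakpoints, where e means the breakpoint is enabled. Use bd to disable a breakpoint.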

0:016> bl
0 e Disable Clear  00007ff8`c7413400     0001 (0001)  0:**** USER32!SetDisplayConfig

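Then run the target program until the breakpoint hits. Since this is a 64-bit program, the r command shows the registers and therefore the values of the first four parameters directly.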

0:000> r
rax=0000000000000000 rbx=0000005a7ebfd450 rcx=0000000000000001
rdx=000001cd7eec1e60 rsi=000001cd7a16de10 rdi=000001cd7a16de20
rip=00007ff8c7413400 rsp=0000005a7ebfba18 rbp=0000005a7ebfbb20
 r8=0000000000000003  r9=000001cd78f6add0 r10=0000000000000000
r11=0000005a7ebfb7c0 r12=0000005a7ebfcfa4 r13=0000000000000001
r14=0000005a7ebfd450 r15=000001cd7a16dda0
...

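Parameters 2 and 4 point to arrays of structures. Take parameter 4 as an example: the number of elements is r8, i.e. 3, and the array starts at r9 = 1cd78f6add0. The elements are DISPLAYCONFIG_MODE_INFO structures stored back to back, not pointers, so they are sizeof(DISPLAYCONFIG_MODE_INFO) apart rather than 8 bytes apart. With a 64-byte structure, the three elements sit at 1cd78f6add0, 1cd78f6ae10 and 1cd78f6ae50. Note that the dt example below was captured at 1cd78f6ade0, which falls inside the first element; this is probably why its infoType shows no matching name.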
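
Because modeInfoArray is declared as DISPLAYCONFIG_MODE_INFO *, element i of the array lives at base + i * sizeof(element). The sketch below computes those addresses; the element size of 64 bytes is an assumption that should be confirmed in the debugger (for example with ?? sizeof(wintypes!DISPLAYCONFIG_MODE_INFO)), and element_addresses is an illustrative name.

```python
def element_addresses(base, count, element_size):
    """Addresses of `count` back-to-back structures starting at `base`."""
    return [base + i * element_size for i in range(count)]


# base comes from r9 and count from r8 in the register dump;
# 64 is the assumed sizeof(DISPLAYCONFIG_MODE_INFO).
for addr in element_addresses(0x1CD78F6ADD0, 3, 64):
    print(f"{addr:011x}")
```

Each printed address can then be passed to dt together with the structure type.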

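Once the structure address is known, use the dt command to view the structure members. Because we do not know in advance which PDB holds the symbol for DISPLAYCONFIG_MODE_INFO, force-load all PDBs with .reload /f first, then run dt.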

0:000> dt wintypes!DISPLAYCONFIG_MODE_INFO 000001cd`78f6ade0
   +0x000 infoType         : 0x8d7aaa0 (No matching name)
   +0x004 id               : 0
   +0x008 adapterId        : _LUID
   +0x010 targetMode       : DISPLAYCONFIG_TARGET_MODE
   +0x010 sourceMode       : DISPLAYCONFIG_SOURCE_MODE
   +0x010 desktopImageInfo : DISPLAYCONFIG_DESKTOP_IMAGE_INFO

In WinDbg Preview the members are clickable. For example, clicking adapterId shows the members of the adapterId field:

0:000> dx -r1 (*((wintypes!_LUID *)0x1cd78f6ade8))
(*((wintypes!_LUID *)0x1cd78f6ade8))                 [Type: _LUID]
    [+0x000] LowPart          : 0x107ac [Type: unsigned long]
    [+0x004] HighPart         : 1 [Type: long]

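Parameter 5 is on the stack. With the breakpoint on the first instruction of the function, rsp points at the return address and parameter 5 sits at rsp+28h. Use dps rsp to view the stack: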

0:000> dps rsp
0000005a`7ebfba18  00007ff8`9e66c313 igfxDI!DllUnregisterServer+0x17af3
0000005a`7ebfba20  0000005a`7ebfd450
0000005a`7ebfba28  0000005a`7ebfbb20
0000005a`7ebfba30  000001cd`7a16de10
0000005a`7ebfba38  000001cd`7a16dda0
0000005a`7ebfba40  0000005a`000082a0
0000005a`7ebfba48  0000005a`7ebfba01
...
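
The rsp+28h offset follows from the x64 Windows stack layout at function entry: 8 bytes of return address, then 20h bytes of home space for the four register parameters. A small sketch (stack_arg_address is an illustrative helper, valid only while rsp still has its value at the first instruction):

```python
def stack_arg_address(rsp_at_entry, arg_index):
    """Address of the arg_index-th parameter (1-based, 5 or higher) at function entry."""
    if arg_index < 5:
        raise ValueError("parameters 1-4 are passed in rcx, rdx, r8, r9")
    # [rsp] holds the return address, followed by 0x20 bytes of home space
    # for the four register parameters; stack parameters start after that.
    return rsp_at_entry + 0x08 + 0x20 + (arg_index - 5) * 8


# rsp is taken from the register dump at the breakpoint.
print(hex(stack_arg_address(0x5A7EBFBA18, 5)))  # 0x5a7ebfba40
```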

Parameter 5 is therefore stored at 0000005a`7ebfba40. Because it is a UINT32, use dd to view it:

0:000> dd 0000005a`7ebfba40
0000005a`7ebfba40  000082a0 0000005a 7ebfba01 0000005a
...

Parameter 5 is 0x82a0.
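
The flags value is a bit mask, so it can be broken into named SDC_* flags. The table below lists the flag values as I recall them from wingdi.h; treat it as an assumption and verify it against your SDK header. decode_flags is an illustrative helper.

```python
# SetDisplayConfig flag values (subset; verify against wingdi.h in your SDK).
SDC_FLAGS = {
    0x00000001: "SDC_TOPOLOGY_INTERNAL",
    0x00000002: "SDC_TOPOLOGY_CLONE",
    0x00000004: "SDC_TOPOLOGY_EXTEND",
    0x00000008: "SDC_TOPOLOGY_EXTERNAL",
    0x00000010: "SDC_TOPOLOGY_SUPPLIED",
    0x00000020: "SDC_USE_SUPPLIED_DISPLAY_CONFIG",
    0x00000040: "SDC_VALIDATE",
    0x00000080: "SDC_APPLY",
    0x00000100: "SDC_NO_OPTIMIZATION",
    0x00000200: "SDC_SAVE_TO_DATABASE",
    0x00000400: "SDC_ALLOW_CHANGES",
    0x00000800: "SDC_PATH_PERSIST_IF_REQUIRED",
    0x00001000: "SDC_FORCE_MODE_ENUMERATION",
    0x00002000: "SDC_ALLOW_PATH_ORDER_CHANGES",
    0x00008000: "SDC_VIRTUAL_MODE_AWARE",
}


def decode_flags(value):
    """Return the names of the set bits, lowest bit first, plus any unknown remainder."""
    names = [name for bit, name in sorted(SDC_FLAGS.items()) if value & bit]
    known = sum(bit for bit in SDC_FLAGS if value & bit)
    if value & ~known:
        names.append(hex(value & ~known))  # bits that are not in the table
    return names


print(decode_flags(0x82A0))
```

Under these assumed values, 0x82a0 corresponds to applying a caller-supplied configuration and saving it to the database.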
