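A watchdog is exactly what the name suggests. At the company where I interned, the watchdog was a process monitor: if a process died, it restarted that process.
The Android Watchdog works on a similar principle. It posts messages to the threads it monitors inside system_server and measures how long they take to respond. On a timeout it kills the system_server process; zygote exits along with it, and init then restarts zygote. So only the Android framework restarts, the kernel is unaffected, and recovery is fairly fast.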
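The "send a message, time the reply" idea can be sketched in plain Java. This is only an illustration of the principle, not Android code: PingWatchdog and respondsWithin are hypothetical names, and an ExecutorService stands in for the monitored thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not Android code: pings a worker thread and checks the reply time.
final class PingWatchdog {
    // Returns true if the worker ran an empty "ping" task within timeoutMs.
    static boolean respondsWithin(ExecutorService worker, long timeoutMs) {
        Future<?> ping = worker.submit(() -> { });
        try {
            ping.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The ping never ran in time: the worker is stuck behind earlier work.
            ping.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A real watchdog would react to a false result by killing and restarting the monitored process; the sketch stops at detection.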
A borrowed diagram:
Now into the code:
1. Startup, in SystemServer:
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog.getInstance().start();
2. getInstance() is a singleton accessor; the first call invokes the Watchdog constructor:
private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check.  Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker.  It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());

    mOpenFdMonitor = OpenFdMonitor.create();

    // See the notes on DEFAULT_TIMEOUT.
    assert DB ||
            DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}
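The constructor adds HandlerCheckers for the foreground thread, main thread, UI thread, I/O thread and display thread to the mHandlerCheckers list. The foreground-thread checker is also kept as mMonitorChecker, the checker that runs the registered monitors. Finally, a BinderThreadMonitor is registered through addMonitor(), and an OpenFdMonitor is created to watch file descriptors.
3. What Watchdog monitors
Watchdog offers two kinds of monitoring. One uses the monitor() callback to detect a deadlock or stall in a service's critical section; the other posts a message to detect whether a service's handler thread is blocked. For example, AMS is checked through monitor(), while the system_server threads it runs on are checked with posted messages. The two registration entry points are: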
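The getInstance() pattern from step 2, together with a constructor that registers one checker per thread, can be sketched as below. DemoWatchdog is a hypothetical stand-in: it stores only the checker names quoted in the constructor above, and the synchronized keyword on getInstance() is my own addition for safety, not something taken from the source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the singleton shape; not the real Watchdog class.
final class DemoWatchdog extends Thread {
    private static DemoWatchdog sInstance;

    // Names of the threads that get a checker, in registration order.
    private final List<String> mCheckerNames = new ArrayList<>();

    // Private constructor: callers must go through getInstance().
    private DemoWatchdog() {
        super("watchdog");
        mCheckerNames.add("foreground thread");
        mCheckerNames.add("main thread");
        mCheckerNames.add("ui thread");
        mCheckerNames.add("i/o thread");
        mCheckerNames.add("display thread");
    }

    // Lazily creates the single instance on first use.
    static synchronized DemoWatchdog getInstance() {
        if (sInstance == null) {
            sInstance = new DemoWatchdog();
        }
        return sInstance;
    }

    // Read-only view of the registered checker names.
    List<String> getCheckerNames() {
        return Collections.unmodifiableList(mCheckerNames);
    }
}
```

Because the constructor is private, every caller shares the same thread object and the same checker list.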
addMonitor()
addThread()
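A service opts into monitor-based checking itself, by implementing the Watchdog.Monitor interface.
When the watchdog fires, the log carries one of two messages, "Blocked in handler on ..." or "Blocked in monitor ...", corresponding to the two kinds of blocking.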
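To make the monitor() path concrete, here is a minimal plain-Java sketch. Monitor mirrors the shape of the callback interface, while DemoService and MonitorRegistry are hypothetical names invented for illustration. The point is that monitor() only tries to take the service's lock, so it returns immediately when the lock is free and hangs when another thread is stuck holding it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the monitor callback interface.
interface Monitor {
    void monitor();
}

// Hypothetical service that guards its state with a lock.
final class DemoService implements Monitor {
    private final Object mLock = new Object();
    private int mCounter;

    void doWork() {
        synchronized (mLock) {
            mCounter++;
        }
    }

    int getCounter() {
        synchronized (mLock) {
            return mCounter;
        }
    }

    // Returns only once the lock can be taken; blocks if another thread holds it.
    @Override
    public void monitor() {
        synchronized (mLock) { }
    }
}

// Hypothetical registry playing the role of addMonitor() plus the checking loop.
final class MonitorRegistry {
    private final List<Monitor> mMonitors = new ArrayList<>();

    synchronized void addMonitor(Monitor m) {
        mMonitors.add(m);
    }

    // Calls every monitor in turn and returns how many completed.
    synchronized int runAll() {
        int done = 0;
        for (Monitor m : mMonitors) {
            m.monitor();
            done++;
        }
        return done;
    }
}
```

If runAll() never returns, the monitor it is stuck in identifies the service whose lock is held, which is the "Blocked in monitor" case.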
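4. How Watchdog runs
Watchdog is a Thread, so start() ends up in run(). The run() function is fairly long.
It first enters an infinite loop and calls
scheduleCheckLocked() on each HandlerChecker to kick off a check.
Inside that function:
1. If the checker has no monitors and the target thread's looper is polling (idle, waiting for messages), it sets mCompleted to true and returns; the thread cannot be blocked in that case.
2. If mCompleted is false, a check is already in flight, so it returns without posting another one.
3. Otherwise it sets mCompleted to false, records the start time, and calls postAtFrontOfQueue(this). The posted check is the checker's own run() method: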
@Override
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
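The three branches of the scheduling step can be simulated without Android. In this sketch FakeHandler is a hypothetical stand-in for a Handler and its Looper (a queue plus a polling flag), and DemoChecker is a simplified checker that only tracks a monitor count. It shows the control flow under those assumptions and is not the framework code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a Handler/Looper pair: a message queue plus an "idle" flag.
final class FakeHandler {
    final Deque<Runnable> queue = new ArrayDeque<>();
    boolean polling; // true when the thread is idle, waiting for messages

    void postAtFrontOfQueue(Runnable r) {
        queue.addFirst(r);
    }

    // Runs the message at the head of the queue, if there is one.
    void runNext() {
        Runnable r = queue.pollFirst();
        if (r != null) {
            r.run();
        }
    }
}

// Simplified checker: posts itself to the target thread and waits to be run.
final class DemoChecker implements Runnable {
    private final FakeHandler mHandler;
    private final int mMonitorCount;
    private boolean mCompleted = true;
    private long mStartTime;

    DemoChecker(FakeHandler handler, int monitorCount) {
        mHandler = handler;
        mMonitorCount = monitorCount;
    }

    void scheduleCheck() {
        // Branch 1: nothing to monitor and the thread is idle, so it cannot be blocked.
        if (mMonitorCount == 0 && mHandler.polling) {
            mCompleted = true;
            return;
        }
        // Branch 2: a check is already in flight.
        if (!mCompleted) {
            return;
        }
        // Branch 3: start a new check at the front of the target thread's queue.
        mCompleted = false;
        mStartTime = System.currentTimeMillis();
        mHandler.postAtFrontOfQueue(this);
    }

    // Runs on the target thread; reaching this point proves the thread is alive.
    @Override
    public void run() {
        mCompleted = true;
    }

    boolean isCompleted() {
        return mCompleted;
    }

    long getStartTime() {
        return mStartTime;
    }
}
```

Posting at the front of the queue means the check measures whether the thread is stuck on its current message, not how long its backlog is.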
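4. Reporting the failure
On each pass, evaluateCheckerCompletionLocked() works out how long the checkers have been waiting:
COMPLETED means nothing is blocked.
WAITING means 0 to 30 seconds have elapsed; keep waiting.
WAITED_HALF means 30 to 60 seconds have elapsed, more than half the timeout; stack traces start being dumped through AMS.
At 60 seconds, something is blocked.
Watchdog then collects the blocked services and threads, and writes the log and a dropbox entry.
Finally, it kills the process: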
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
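The waiting states from the evaluation step map onto elapsed time roughly as in the sketch below. CompletionStates, stateOf and evaluate are hypothetical names, the 60-second timeout is taken from the text above, and taking the worst state across all checkers is my reading of how the overall result is formed.

```java
// Hypothetical sketch of the state evaluation; constants follow the states in the text.
final class CompletionStates {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // 60-second timeout, as described in the text.
    static final long TIMEOUT_MS = 60_000;

    // State of a single checker given whether it finished and how long it has waited.
    static int stateOf(boolean completed, long latencyMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < TIMEOUT_MS / 2) {
            return WAITING;
        }
        if (latencyMs < TIMEOUT_MS) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Overall state: the worst state among all checkers (assumed aggregation rule).
    static int evaluate(boolean[] completed, long[] latenciesMs) {
        int worst = COMPLETED;
        for (int i = 0; i < completed.length; i++) {
            worst = Math.max(worst, stateOf(completed[i], latenciesMs[i]));
        }
        return worst;
    }
}
```

Under these assumptions a single overdue checker is enough to push the overall state to OVERDUE, which is what triggers the kill sequence.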
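5. Rebooting on a broadcast
The init() function also calls registerReceiver() to register a BroadcastReceiver for system reboot requests. When the reboot broadcast arrives, RebootRequestReceiver's onReceive() runs and in turn calls rebootSystem() to reboot the system. This lets other modules (such as CTS) reboot the system by sending a broadcast. So Watchdog has one more important job: receiving that broadcast and rebooting the system.
Another borrowed diagram: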