Sharing my experience of debugging a deadlock

Here is a deadlock problem I solved recently.

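When you run into a crash like the one below, how do you go about handling it? I suspect that many of us are often at a loss. So what can you do?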

01-01 17:05:10.278 2022 3722 D PumpWatchDog: Run
01-01 17:05:10.279 2022 3722 D PumpWatchDog: BOG BITE:tid 3723
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #00 pc 00017857 /system/lib/libc.so (__futex_wait_ex+42)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #01 pc 00017bef /system/lib/libc.so (pthread_mutex_lock+310)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #02 pc 0012ba9f /system/lib/libtvserver.so (TvOADControl::unregisterListenerCallback(android::sp)+20)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #03 pc 0012af33 /system/lib/libtvserver.so (BnTvOADControl::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+682)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #04 pc 0001a6cd /system/lib/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+60)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #05 pc 0001f7a3 /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+582)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #06 pc 0001fa95 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+252)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #07 pc 0001fb65 /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #08 pc 0001ad23 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #09 pc 0012a4dd /system/lib/libtvserver.so (BpTvOADControl::BponOADControlChangeListener::OnDownloadStateChanged(int)+64)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #10 pc 0012b0ad /system/lib/libtvserver.so (TvOADControl::OnDownloadStateChanged()+38)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #11 pc 001efd5b /system/lib/libtvserver.so (CCbmhgOverAirDownload_m_Priv::callistoN_OnDownloadStateChanged()+14)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #12 pc 00156a39 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #13 pc 001ed8db /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::evtN_OnEvent(int)+106)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #14 pc 001ee1b3 /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::mMhegCallistoPumpHandler(int, unsigned int)+622)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #15 pc 0013afd9 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #16 pc 00038303 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #17 pc 00012623 /system/lib/libutils.so (android::Looper::pollInner(int)+410)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #18 pc 00012715 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+92)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #19 pc 00038887 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #20 pc 00016feb /system/lib/libc.so (__pthread_start(void*)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #21 pc 00014f33 /system/lib/libc.so (__start_thread+6)

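First look at the second line, which reports a DOG BITE (spelled "BOG BITE" in this log). This comes from tvserver's internal watchdog mechanism: when a monitored thread (the log names tid 3723) goes too long without responding, that is, without feeding the dog, the watchdog bites.
01-01 17:05:10.278 2022 3722 D PumpWatchDog: Run
01-01 17:05:10.279 2022 3722 D PumpWatchDog: BOG BITE: tid 3723
In other words, the watchdog raises fatal signal 6 (SIGABRT, the signal that abort() sends). Once the signal is delivered, the process is killed and the system restarts.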

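(Note: there is nothing wrong with this mechanism itself. Never comment out the abort() call just because the crash is reported from inside the WatchDog; doing so disables the watchdog entirely.)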
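
To make the mechanism concrete, here is a minimal C++ sketch of a feed-or-bite watchdog. It is not the tvserver implementation, which the post does not show: the class name PumpWatchdogSketch, its methods, and the polling approach are all assumptions made for illustration. The only parts taken from the post are that a starved watchdog ends in abort().

```cpp
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical watchdog (not the real PumpWatchDog): the monitored
// thread calls feed() regularly; a separate thread runs run().
class PumpWatchdogSketch {
public:
    explicit PumpWatchdogSketch(std::chrono::milliseconds timeout)
        : timeout_(timeout),
          lastFed_(Clock::now().time_since_epoch().count()) {}

    // Called by the monitored thread each time it finishes a unit of work.
    void feed() { lastFed_.store(Clock::now().time_since_epoch().count()); }

    // True when the dog has not been fed within the timeout as of `now`.
    bool starved(Clock::time_point now) const {
        Clock::time_point last{Clock::duration{lastFed_.load()}};
        return now - last > timeout_;
    }

    // Watchdog thread body: poll until starved, then bite.
    void run(std::chrono::milliseconds pollInterval) {
        while (!starved(Clock::now())) {
            std::this_thread::sleep_for(pollInterval);
        }
        std::abort();  // raises SIGABRT (signal 6) and kills the process
    }

private:
    std::chrono::milliseconds timeout_;
    std::atomic<Clock::rep> lastFed_;  // last feed time, in clock ticks
};
```

In this sketch, a thread stuck in pthread_mutex_lock simply stops calling feed(), so starved() eventually becomes true and run() aborts the process. Removing the abort() would hide the symptom while leaving the thread stuck.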

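Now look at the crash stack, starting with the three lines below. Frames #00 and #01 show that the thread is blocked trying to acquire a lock it cannot get. Do not suspect these libc functions themselves.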

So for practical purposes, the top of this crash stack is unregisterListenerCallback.
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #00 pc 00017857 /system/lib/libc.so (__futex_wait_ex+42)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #01 pc 00017bef /system/lib/libc.so (pthread_mutex_lock+310)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #02 pc 0012ba9f /system/lib/libtvserver.so (TvOADControl::unregisterListenerCallback(android::sp)+20)

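Next, look at the rest of the stack. Functions in libraries such as libbinder.so, libutils.so and libc.so are very unlikely to be the culprit, so concentrate on analyzing our own code.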

01-01 17:05:10.300 2022 3722 E PumpWatchDog: #03 pc 0012af33 /system/lib/libtvserver.so (BnTvOADControl::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+682)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #04 pc 0001a6cd /system/lib/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+60)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #05 pc 0001f7a3 /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+582)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #06 pc 0001fa95 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+252)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #07 pc 0001fb65 /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #08 pc 0001ad23 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #09 pc 0012a4dd /system/lib/libtvserver.so (BpTvOADControl::BponOADControlChangeListener::OnDownloadStateChanged(int)+64)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #10 pc 0012b0ad /system/lib/libtvserver.so (TvOADControl::OnDownloadStateChanged()+38)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #11 pc 001efd5b /system/lib/libtvserver.so (CCbmhgOverAirDownload_m_Priv::callistoN_OnDownloadStateChanged()+14)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #12 pc 00156a39 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #13 pc 001ed8db /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::evtN_OnEvent(int)+106)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #14 pc 001ee1b3 /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::mMhegCallistoPumpHandler(int, unsigned int)+622)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #15 pc 0013afd9 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #16 pc 00038303 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #17 pc 00012623 /system/lib/libutils.so (android::Looper::pollInner(int)+410)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #18 pc 00012715 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+92)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #19 pc 00038887 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #20 pc 00016feb /system/lib/libc.so (__pthread_start(void*)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #21 pc 00014f33 /system/lib/libc.so (__start_thread+6)
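Reading the crash stack together with the code, the rough flow turns out to be this: TvServer notifies the upper layer that the download state has changed, through TvOADControl::OnDownloadStateChanged() (frame #10), and that notification goes out over binder (frames #09 to #07). While the thread is still waiting for the reply in waitForResponse (frame #06), a call comes back in on the same thread (frames #05 to #03) and ends up in unregisterListenerCallback (frame #02), which blocks on a mutex, presumably one that the notification path is still holding. This narrows the problem down considerably.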
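
The flow above can be modelled in a few lines of C++. This is only a sketch under assumptions: the post does not show the real TvOADControl code, so OadControlSketch and OwnerTrackingMutex are hypothetical names, the binder round trip is replaced by a direct call, and the idea that OnDownloadStateChanged holds the lock across the notification is inferred from the stack rather than confirmed. The wrapper reports re-entry instead of blocking, so the example finishes rather than hanging.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that reports same-thread re-entry
// instead of blocking forever.
class OwnerTrackingMutex {
public:
    // Returns false if the calling thread already holds the lock.
    // A plain non-recursive mutex would block at this point, which is
    // what frames #00 and #01 in the backtrace show.
    bool lock() {
        if (owner_.load() == std::this_thread::get_id()) return false;
        m_.lock();
        owner_.store(std::this_thread::get_id());
        return true;
    }

    void unlock() {
        owner_.store(std::thread::id());
        m_.unlock();
    }

private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical stand-in for TvOADControl.
struct OadControlSketch {
    OwnerTrackingMutex lock_;
    bool reentryDetected = false;

    // Models frame #02: needs the lock to touch the listener list.
    void unregisterListenerCallback() {
        if (!lock_.lock()) {
            reentryDetected = true;  // the real code would hang here
            return;
        }
        lock_.unlock();
    }

    // Models frame #10: ASSUMED to hold the lock while notifying.
    // The listener calls straight back in on the same thread.
    void onDownloadStateChanged() {
        lock_.lock();
        unregisterListenerCallback();  // stands in for the binder round trip
        lock_.unlock();
    }
};
```

Calling onDownloadStateChanged() on an OadControlSketch sets reentryDetected to true, while calling unregisterListenerCallback() on its own does not. The second case is why the function looks innocent when tested in isolation.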

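To sum up: combine the scenario in which the crash happened with the stack, follow the code, and the problem shows up quickly.
Although the crash is reported in TvServer, the root cause lies in the upper layer.
This example is a typical deadlock scenario: inside a callback, you call another function, and that function depends on the same lock. One more lesson: do not do too much work inside a callback.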
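
One common way to defuse this pattern on the notifying side is to copy the listener list while holding the lock, release the lock, and only then invoke the callbacks. The post does not say how the issue was actually fixed, so the following is a sketch of that general technique, not the author's fix. ListenerRegistry and all of its members are hypothetical names, not the tvserver API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical listener registry (illustrative, not the tvserver API).
class ListenerRegistry {
public:
    using Listener = std::function<void(int)>;
    using Handle = std::shared_ptr<Listener>;

    Handle registerListener(Listener cb) {
        auto handle = std::make_shared<Listener>(std::move(cb));
        std::lock_guard<std::mutex> guard(lock_);
        listeners_.push_back(handle);
        return handle;
    }

    void unregisterListener(const Handle& handle) {
        std::lock_guard<std::mutex> guard(lock_);
        listeners_.erase(
            std::remove(listeners_.begin(), listeners_.end(), handle),
            listeners_.end());
    }

    // Copy the list under the lock, then call listeners without it,
    // so a listener may safely call back into the registry.
    void notify(int state) {
        std::vector<Handle> snapshot;
        {
            std::lock_guard<std::mutex> guard(lock_);
            snapshot = listeners_;
        }
        for (const auto& handle : snapshot) {
            (*handle)(state);  // lock_ is not held here
        }
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(lock_);
        return listeners_.size();
    }

private:
    std::mutex lock_;
    std::vector<Handle> listeners_;
};
```

With this layout, a listener that unregisters itself from inside its own callback completes normally. The trade-off is that a listener removed while a notification is in flight may still receive that one last callback, so listeners have to tolerate it.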
