Sharing an experience of debugging a deadlock

Here is a deadlock problem I solved recently.

What do you do when you see a crash like the one below? Quite often people have no idea where to start. So how should you approach it?

01-01 17:05:10.278 2022 3722 D PumpWatchDog: Run
01-01 17:05:10.279 2022 3722 D PumpWatchDog: BOG BITE:tid 3723
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #00 pc 00017857 /system/lib/libc.so (__futex_wait_ex+42)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #01 pc 00017bef /system/lib/libc.so (pthread_mutex_lock+310)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #02 pc 0012ba9f /system/lib/libtvserver.so (TvOADControl::unregisterListenerCallback(android::sp)+20)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #03 pc 0012af33 /system/lib/libtvserver.so (BnTvOADControl::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+682)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #04 pc 0001a6cd /system/lib/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+60)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #05 pc 0001f7a3 /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+582)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #06 pc 0001fa95 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+252)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #07 pc 0001fb65 /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #08 pc 0001ad23 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #09 pc 0012a4dd /system/lib/libtvserver.so (BpTvOADControl::BponOADControlChangeListener::OnDownloadStateChanged(int)+64)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #10 pc 0012b0ad /system/lib/libtvserver.so (TvOADControl::OnDownloadStateChanged()+38)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #11 pc 001efd5b /system/lib/libtvserver.so (CCbmhgOverAirDownload_m_Priv::callistoN_OnDownloadStateChanged()+14)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #12 pc 00156a39 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #13 pc 001ed8db /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::evtN_OnEvent(int)+106)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #14 pc 001ee1b3 /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::mMhegCallistoPumpHandler(int, unsigned int)+622)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #15 pc 0013afd9 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #16 pc 00038303 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #17 pc 00012623 /system/lib/libutils.so (android::Looper::pollInner(int)+410)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #18 pc 00012715 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+92)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #19 pc 00038887 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #20 pc 00016feb /system/lib/libc.so (__pthread_start(void*)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #21 pc 00014f33 /system/lib/libc.so (__start_thread+6)

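  Start with the second line: a DOG BITE occurred (it is printed as "BOG BITE" in this log). This is tvserver's internal watchdog mechanism. When the monitored thread has not responded for a long time, nobody feeds the dog and the watchdog bites.
  01-01 17:05:10.278  2022  3722 D PumpWatchDog: Run

01-01 17:05:10.279 2022 3722 D PumpWatchDog: BOG BITE: tid 3723
In other words, the watchdog raises fatal signal 6 (SIGABRT). Once Linux delivers this signal, the process is terminated and a restart follows.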

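  (Note: there is nothing wrong with this mechanism itself. Never comment out the abort() call just because the crash is reported from inside the WatchDog; doing so disables the watchdog entirely.)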
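
To make the feed/bite idea concrete, here is a minimal sketch of a pump watchdog. The real PumpWatchDog source is not shown in this post, so the class name `PumpWatchdog`, the methods `feed()` and `check()`, and the injectable bite handler are all assumptions made for illustration. The default handler calls `std::abort()`, which raises SIGABRT (signal 6).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <utility>

// Hypothetical watchdog sketch; not the actual tvserver implementation.
class PumpWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    // onBite defaults to std::abort(), mirroring the abort() mentioned above.
    explicit PumpWatchdog(Clock::duration timeout,
                          std::function<void()> onBite = [] { std::abort(); })
        : mTimeout(timeout),
          mOnBite(std::move(onBite)),
          mLastFed(Clock::now().time_since_epoch().count()) {}

    // Called by the monitored pump thread each time it makes progress.
    void feed(Clock::time_point now = Clock::now()) {
        mLastFed.store(now.time_since_epoch().count());
    }

    // Called periodically by the watchdog thread.
    // Returns true (after invoking the bite handler) if the pump is overdue.
    bool check(Clock::time_point now = Clock::now()) {
        const Clock::duration lastFed(mLastFed.load());
        if (now.time_since_epoch() - lastFed <= mTimeout) {
            return false;  // the pump fed the dog recently enough
        }
        mOnBite();  // "DOG BITE": by default this aborts the process
        return true;
    }

private:
    const Clock::duration mTimeout;
    const std::function<void()> mOnBite;
    std::atomic<Clock::rep> mLastFed;  // last feed time, in clock ticks
};
```

A pump thread blocked forever in pthread_mutex_lock never calls `feed()` again, so the next `check()` after the timeout bites. That is why the backtrace is printed by the watchdog thread (3722) but describes the stuck thread (3723).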

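  Now look at the crash stack, starting with the three lines below. Frames #00 and #01 show that the thread is blocked trying to acquire a mutex it cannot get. There is no reason to suspect these libc functions themselves.

So the effective top of this stack is unregisterListenerCallback.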
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #00 pc 00017857 /system/lib/libc.so (__futex_wait_ex+42)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #01 pc 00017bef /system/lib/libc.so (pthread_mutex_lock+310)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #02 pc 0012ba9f /system/lib/libtvserver.so (TvOADControl::unregisterListenerCallback(android::sp)+20)
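
A thread parked in pthread_mutex_lock like this may simply be waiting for a lock that it already holds. As a small, hedged illustration (not code from tvserver), POSIX lets you create an error-checking mutex that reports such a self-deadlock instead of blocking forever. The function name `relockSameThread` is hypothetical. With a default mutex the second lock attempt would hang, which matches frames #00 and #01.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Locks the same error-checking mutex twice from one thread and returns
// the result of the second attempt.
int relockSameThread() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t lock;
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);                     // first acquisition succeeds
    const int second = pthread_mutex_lock(&lock);  // relock by the owner fails

    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    return second;  // EDEADLK for PTHREAD_MUTEX_ERRORCHECK
}
```

Temporarily switching a suspect lock to this type in a debug build is one way to confirm a self-deadlock, assuming you control how the mutex is created.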

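   Next, look at the rest of the stack. Functions in libraries such as libbinder.so, libutils.so, and libc.so are very unlikely to be at fault, so concentrate on analyzing our own code.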

01-01 17:05:10.300 2022 3722 E PumpWatchDog: #03 pc 0012af33 /system/lib/libtvserver.so (BnTvOADControl::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+682)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #04 pc 0001a6cd /system/lib/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+60)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #05 pc 0001f7a3 /system/lib/libbinder.so (android::IPCThreadState::executeCommand(int)+582)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #06 pc 0001fa95 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+252)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #07 pc 0001fb65 /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #08 pc 0001ad23 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #09 pc 0012a4dd /system/lib/libtvserver.so (BpTvOADControl::BponOADControlChangeListener::OnDownloadStateChanged(int)+64)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #10 pc 0012b0ad /system/lib/libtvserver.so (TvOADControl::OnDownloadStateChanged()+38)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #11 pc 001efd5b /system/lib/libtvserver.so (CCbmhgOverAirDownload_m_Priv::callistoN_OnDownloadStateChanged()+14)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #12 pc 00156a39 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #13 pc 001ed8db /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::evtN_OnEvent(int)+106)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #14 pc 001ee1b3 /system/lib/libtvserver.so (CCbmhgOverAirDownload_mcallisto_Priv::mMhegCallistoPumpHandler(int, unsigned int)+622)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #15 pc 0013afd9 /system/lib/libtvserver.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #16 pc 00038303 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #17 pc 00012623 /system/lib/libutils.so (android::Looper::pollInner(int)+410)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #18 pc 00012715 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+92)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #19 pc 00038887 /system/lib/libTvMiddlewareCore.so
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #20 pc 00016feb /system/lib/libc.so (__pthread_start(void*)+30)
01-01 17:05:10.300 2022 3722 E PumpWatchDog: #21 pc 00014f33 /system/lib/libc.so (__start_thread+6)
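Reading the crash stack together with the code, the rough flow is this: TvServer notifies the upper layer that the download state has changed, through TvOADControl::OnDownloadStateChanged(), yet the top of the stack is unregisterListenerCallback. Frames #09 down to #03 show what happened in between: while the thread was still waiting for the reply to its outgoing callback (waitForResponse), binder dispatched an incoming unregisterListenerCallback request on that same thread. This narrows the problem down considerably.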

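  In short, by combining the scenario in which the crash occurred, the stack, and the code, the problem can be found quickly.
  Although the crash was reported in TvServer, the root cause lies in the upper layer.
  This is a typical deadlock scenario: inside a callback, another function is called, and that function depends on the same lock. Also, do not do too much work inside a callback.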
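
The pattern, and one common way to avoid it, can be sketched as follows. This is not the actual TvOADControl code, which is not shown in this post. The class `OadControl`, its members, and the use of `std::function` listeners in place of binder interfaces are assumptions made for illustration. The idea is to copy the listener list while holding the lock, release the lock, and only then invoke the callbacks, so a listener that unregisters itself from inside its callback no longer blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a service that notifies registered listeners.
class OadControl {
public:
    using Listener = std::function<void(int)>;

    int registerListener(Listener cb) {
        std::lock_guard<std::mutex> guard(mLock);
        mListeners.push_back({mNextId, std::move(cb)});
        return mNextId++;
    }

    void unregisterListener(int id) {
        std::lock_guard<std::mutex> guard(mLock);  // needs mLock
        mListeners.erase(
            std::remove_if(mListeners.begin(), mListeners.end(),
                           [id](const Entry& e) { return e.id == id; }),
            mListeners.end());
    }

    // Deadlock-prone variant (shown only as a comment): the callback runs
    // while mLock is held, so a listener calling unregisterListener() from
    // inside it blocks forever on the same mutex.
    //
    //   std::lock_guard<std::mutex> guard(mLock);
    //   for (auto& e : mListeners) e.cb(state);

    // Safer variant: snapshot under the lock, call back without it.
    void notify(int state) {
        std::vector<Entry> snapshot;
        {
            std::lock_guard<std::mutex> guard(mLock);
            snapshot = mListeners;
        }
        for (auto& e : snapshot) {
            e.cb(state);  // mLock is not held here
        }
    }

    std::size_t listenerCount() {
        std::lock_guard<std::mutex> guard(mLock);
        return mListeners.size();
    }

private:
    struct Entry {
        int id;
        Listener cb;
    };

    std::mutex mLock;
    std::vector<Entry> mListeners;
    int mNextId = 0;
};
```

The trade-off of the snapshot approach is that a listener may still receive one notification that was already in flight when it unregistered, so listeners need to tolerate a late callback.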
