SWT問題分析總結

1 概述

一般手機異常reboot的原因:
Reboot rootcause

1.JE on the system process
   search keywords “FATAL EXCEPTION IN SYSTEM PROCESS:”in main_log

2.NE on the system process
   search keywords “Fatal signal”in main_log

3.SWT on the system process
   Search keywords “WATCHDOG KILLING SYSTEM PROCESS”in main_log

4.KE occurs
   Search keywords “Kernel panic”in kernel_log

Android SWT 即 Android Software Watchdog Timeout。

System Server進程是Android的一個核心進程,裏面爲APP運行提供了核心的服務。如果System Server的一些核心服務和重要線程卡住,就會導致相應的功能異常。

如手機發生hang機,輸入無響應,無法啓動APP等一些不正常的情況。而且,如果沒有一種機制,讓這些服務復位的話,那麼將嚴重影響客戶體驗。尤其是當前大多數手機把電池封裝在手機裏面的這種,想撥電池重啓都很難。

所以有必要在覈心服務和核心線程卡住的時候,讓系統有自動復位的機會。於是,google引入了Sytem Server watchdog機制。這個機制來監控核心服務和核心線程是否卡住。

Framework層的Watchdog監控系統的重要服務和進程是否死鎖,例如AMS、WMS、PMS、MS、IMS等。在解析init.rc時初始化和啓動,通過addMonitor添加需要監控對象,然後開啓30s循環,如果檢測到死鎖,則dumpTrace並再檢測一輪,如果還是死鎖,則dumpTrace並殺掉systemServer讓zygote重啓。

watchdog is a thread of system_server, it will mornitor some core service of system_server, such as:
AcitivityManagerService,WindowManagerService,PowerManagerService

SWT主要用來監控SystemServer 重要線程/Service 的運行情況。如果判斷阻塞 60s ,就會把系統重啓,來保證系統恢復正常狀態.

2 SWT機制

2.1 原理設計框圖

在這裏插入圖片描述
最開始設計只是在main looper 裏面執行register 的monitor 對象 的monitor 方法. 後續Google 改進,通過HandlerChecker 來達成,在HandlerChecker 中審查注入的montior 對象是否能快速執行。即在foreground thread 來執行register 的monitor 對象,而對於其他的線程,則是審查規定時間內是否可以達到idle,不是一直卡死在某個message執行上。
需要注意的是, SystemServer Watchdog啓動是在SystemServer init 的後期, 如果SystemServer 在init 的過程中卡死了,那麼就意味着watchdog 不會有任何的作用.

2.2 Watchdog的初始化

Android的Watchdog是一個單例線程,在System Server啓動時就會init &start Watchdog:

 private void startOtherServices() {
            ...
            traceBeginAndSlog("InitWatchdog");
            final Watchdog watchdog = Watchdog.getInstance();
            watchdog.init(context, mActivityManagerService);
            traceEnd();
            ...
            traceBeginAndSlog("StartWatchdog");
            Watchdog.getInstance().start();
            traceEnd();
            ...
    }

Watchdog在初始化時,會構建很多HandlerChecker,大致可以分爲兩類:

Monitor Checker,用於檢查是Monitor對象可能發生的死鎖, AMS, PKMS, WMS等核心的系統服務都是Monitor對象。

在代碼裏搜索關鍵字addMonitor,會檢測如下信息:

LINUX/android/frameworks/base/services/core/java/com/android/server/input/
InputManagerService.java	349 Watchdog.getInstance().addMonitor(this); in start()
LINUX/android/frameworks/base/services/core/java/com/android/server/wm/
WindowManagerService.java	1161 Watchdog.getInstance().addMonitor(this); in WindowManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/am/
ActivityManagerService.java	2988 Watchdog.getInstance().addMonitor(this); in ActivityManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/media/
MediaRouterService.java	95 Watchdog.getInstance().addMonitor(this); in MediaRouterService()
......

各個server的monitor函數也只是檢測是否可以獲得要檢測的鎖對象,這些service通過
Watchdog.getInstance().addMonitor(this)將自己(實現了Watchdog.Monitor)添加到
Watchdog.mMonitorChecker.mMonitors列表中,該列表會不斷調用Monitor.Monitor()函數。

 WindowManagerService.java  
    // Called by the heartbeat to ensure locks are not held indefnitely (for deadlock detection).
    @Override
    public void monitor() {
        synchronized (mWindowMap) { }
    }

具體看各個service中實現的monitor函數,發現這個函數很簡單,就是去獲取對應鎖,如果線程死鎖或其他原因阻塞,那麼必然無法獲取鎖,monitor()函數執行必然會阻塞。Watchdog就是利用這個原理來判斷是否死鎖。

Looper Checker,用於檢查線程的消息隊列是否長時間處於工作狀態。Watchdog自身的消息隊列,Ui, Io, Display這些全局的消息隊列都是被檢查的對象。此外,一些重要的線程的消息隊列,也會加入到Looper Checker中,譬如AMS, PKMS,這些是在對應的對象初始化時加入的。
代碼裏搜索關鍵字addThread,會檢測如下信息:

LINUX/android/frameworks/base/services/core/java/com/android/server/power/
PowerManagerService.java	775 Watchdog.getInstance().addThread(mHandler); in onStart()
LINUX/android/frameworks/base/services/core/java/com/android/server/am/
ActivityManagerService.java	2989 Watchdog.getInstance().addThread(mHandler); in ActivityManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/pm/
PackageManagerService.java	2512 Watchdog.getInstance().addThread(mHandler, WATCHDOG_TIMEOUT); in PackageManagerService()

addThread()將PowerManagerService、PackageManagerService、ActivityManagerService
等幾個主線程Handler保存到Watchdog.mHandlerCheckers列表中;同時還會把上面提到的mMonitorChecker也保存到Watchdog.mHandlerCheckers中;另外還會將foreground thread、ui thread、i/o thread 、display thread 、main thread的Handler也保存到Watchdog.mHandlerCheckers中來;

  /frameworks/base/services/core/java/com/android/server/Watchdog.java        
        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));

Watchdog會不斷判斷這些線程的Looper是否空閒,如果一直非空閒,那麼必然被blocked住了。

2.3 Watchdog的運作

通過前面的初始化,已經將watchdog需要監測的對象全部準備就緒。接下來就要看它具體是如何去監測的了。

watchdog本身就是一個線程,我們想知道它是如何去監測各個對象的?那就直接從它的run方法來看就好:

 @Override
    public void run() {
        boolean waitedHalf = false;//標識第一個30s超時
        boolean mSFHang = false;//標識surfaceflinger是否hang?
        mProcessStats.init(); // mtk: get cpu info when SWT occur.
        while (true) {
            final List<HandlerChecker> blockedCheckers;
            final String subject;
            final String name;
            mSFHang = false;
            if (exceptionHWT != null && waitedHalf == false ) {
                exceptionHWT.WDTMatterJava(300);//hang_detect機制相關,用來重置hang_detect的計數,這裏表示正常階段
具體可以查看MOL > Quick Start > Hang Detect 問題快速分析 專題!
            }
            final boolean allowRestart;//發生SWT要不要重啓
            int debuggerWasConnected = 0;

            if (DEBUG)
                Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());

            synchronized (this) {

                if (DEBUG)
                    Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());

                long timeout = CHECK_INTERVAL;
                long SFHangTime;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
             //1、調度所有的HandlerChecker
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    //mHandlerCheckers中包含一個mMonitorChecker和幾個重要線程的HandlerChecker
(System_Server進程是最重要的線程):該for循環主要檢測mMonitorChecker中重要的
 幾把鎖(檢測死鎖)、幾個重要線程的消息隊列是否空閒(檢測是否blocked),當然判斷是
 否死鎖和被blocked的依據便是是否超時。
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();//
shceduleCheckLocked()所做的事情可以想象成給所有的目標thread發放任務。
                    //如果一個線程死鎖或被blocked,那麼該HandlerChecker的mCompleted = false、
 mStartTime=阻塞的初始時間點,是判斷的基礎
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                //2、開始定期檢查
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                //3、檢查HandlerChecker的完成狀態
                //MTK enhance
                SFHangTime = GetSFStatus();
                if (DEBUG) Slog.w(TAG, "**Get SF Time **" + SFHangTime);
                if (SFHangTime > TIME_SF_WAIT * 2) {
                    Slog.v(TAG, "**SF hang Time **" + SFHangTime);
                    mSFHang = true;
                    blockedCheckers = getBlockedCheckersLocked();
                    subject = "";

                } //@@
                else {
                    boolean fdLimitTriggered = false;
                    if (mOpenFdMonitor != null) {
                        fdLimitTriggered = mOpenFdMonitor.monitor();
                    }

                    if (!fdLimitTriggered) {
                        final int waitState = evaluateCheckerCompletionLocked();
//根據mCompleted 、mStartTime值評估等待狀態
                        if (waitState == COMPLETED) {//檢測完成並正常,繼續檢查
                            // The monitors have returned; reset
                            waitedHalf = false;
                            continue;
                        } else if (waitState == WAITING) {//30秒之內,繼續檢查
                            // still waiting but within their configured intervals;
                            // back off and recheck
                            continue;
                        } else if (waitState == WAITED_HALF) {//30~60秒之內,dump一些信息並繼續檢查
                            if (!waitedHalf) {
                                // We've waited half the deadlock-detection interval.  Pull a stack
                                // trace and wait another half.
                                if (exceptionHWT != null) {
                                    exceptionHWT.WDTMatterJava(360);//hang_detect機制相關,表示進入dump階段
                                }
                                ArrayList<Integer> pids = new ArrayList<Integer>();
                                pids.add(Process.myPid());
                                ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                    getInterestingNativePids());
                                mProcessStats.update();
                                waitedHalf = true;
                            }
                            continue;
                        }
                        //4、收集超時的HandlerChecker
                        // something is overdue!
                        blockedCheckers = getBlockedCheckersLocked();
//超了60秒,此時便出問題了,收集超時的HandlerChecker
                        subject = describeCheckersLocked(blockedCheckers);
                    } else {
                        blockedCheckers = Collections.emptyList();
                        subject = "Open FD high water mark reached";
                    }
                }
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            //5、保存一些重要日誌,並根據設定,來判斷是否需要重啓系統
            Slog.e(TAG, "**SWT happen **" + subject);
            if (exceptionHWT != null) {
                exceptionHWT.switchFtrace(2);
            }
            name = (mSFHang && subject.isEmpty()) ? "surfaceflinger  hang." : "";
            EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);
//將阻塞線程信息打印到Event日誌中

            if (exceptionHWT != null) {
                exceptionHWT.WDTMatterJava(420);//hang_detect機制相關,表示發生了SWT
            }
            mProcessStats.update();
            final String cpuInfo = mProcessStats.printCurrentState(SystemClock.uptimeMillis());
            Slog.d(TAG, mProcessStats.printCurrentLoad());

            ArrayList<Integer> pids = new ArrayList<>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, getInterestingNativePids());

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            /// M: WDT debug enhancement
            /// need to wait the AEE dumps all info, then kill system server @{
            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Slog.v(TAG, "** save all info before killnig system server **");
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        Slog.v(TAG, "** start addErrorToDropBox **");
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                name.isEmpty() ? subject : name, cpuInfo, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if ((mSFHang == false) && (controller != null)) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    Slog.i(TAG, "Binder.setDumpDisabled");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                    Slog.i(TAG, "Activity controller requested to reboot");
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                /// @}

                Slog.w(TAG, "*** GOODBYE!");
                exceptionHWT.WDTMatterJava(330); // 330 means watchdog exit successfully.
                // MTK enhance
                if (mSFHang)
                {
                    Slog.w(TAG, "SF hang!");
                    if (GetSFReboot() > 3)
                    {
                        Slog.w(TAG, "SF hang reboot time larger than 3 time, reboot device!");
                        rebootSystem("Maybe SF driver hang,reboot device.");
                    }
                    else
                    {
                        SetSFReboot();
                    }
                }
                //@
                if (mSFHang) {
                    Slog.v(TAG, "killing surfaceflinger for surfaceflinger hang");
                    String[] sf = new String[] {"/system/bin/surfaceflinger"};
                    int[] pid_sf =  Process.getPidsForCommands(sf);
                    if (pid_sf[0] > 0) {
                        Process.killProcess(pid_sf[0]);
                    }
                    Slog.v(TAG, "killing surfaceflinger end");
                } else {
                    Process.killProcess(Process.myPid());
                }

                System.exit(10);
            }

            waitedHalf = false;
        }
    }

以上代碼片段主要的運行邏輯如下:

1、Watchdog運行後,便開始無限循環,依次調用每一個HandlerChecker的scheduleCheckLocked()方法

2、調度完HandlerChecker之後,便開始定期檢查是否超時,每一次檢查的間隔時間由CHECK_INTERVAL常量設定,爲30秒

3、每一次檢查都會調用evaluateCheckerCompletionLocked()方法來評估一下HandlerChecker的完成狀態:
a、COMPLETED表示已經完成
b、WAITING和WAITED_HALF表示還在等待,但未超時
c、OVERDUE表示已經超時。默認情況下,timeout是1分鐘,但監測對象可以通過傳參自行設定,譬如PKMS的Handler Checker的超時是10分鐘

4、如果超時時間到了,還有HandlerChecker處於未完成的狀態(OVERDUE),則通過getBlockedCheckersLocked()方法,獲取阻塞的HandlerChecker,生成一些描述信息

5、保存日誌,包括一些運行時的堆棧信息,這些日誌是我們解決Watchdog問題的重要依據。如果判斷需要殺掉system_server進程,則給當前進程(system_server)發送signal 9

3 導致 SWT 重啓原因

線程死鎖
Binder的server端卡住
Native方法執行時間過長
SurfaceFlinger卡住
Dump時間過長
Zygote fork進程時卡住
Binder used up

4 log分析

4.1 搜索關鍵 watchdog

搜索關鍵 watchdog初步分析rootcause

03-29 16:59:14.818   748  4784 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.input.InputManagerService on foreground thread (android.fg) 
03-29 16:59:14.818   748  4784 W Watchdog: foreground thread stack trace: 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.input.InputManagerService.nativeMonitor(Native Method) 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.input.InputManagerService.monitor(InputManagerService.java:1404) 03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:179) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:739) 03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:95) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.Looper.loop(Looper.java:135) 
03-29 16:59:14.819   748  4784 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:61) 
03-29 16:59:14.819   748  4784 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:46) 
03-29 16:59:14.819   748  4784 W Watchdog: *** GOODBYE!

可見WD檢測到InputManagerService發生Block:

03-29 16:59:14.818 748 4784 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.input.InputManagerService on foreground thread (android.fg)

在 android.fg線程發生block,查看對應trace:

"android.fg" prio=5 tid=17 Native 
  | group="main" sCount=1 dsCount=0 obj=0x12e7a120 self=0xb7b20160
  | sysTid=771 nice=0 cgrp=default sched=0/0 handle=0xb7b20538
  | state=S schedstat=( 1927397505 30260078920 12131 ) utm=138 stm=54 core=0 HZ=100 
  | stack=0xa51df000-0xa51e1000 stackSize=1036KB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/771/stack)
  native: #00 pc 0000f9b0  /system/lib/libc.so (syscall+28)
  native: #01 pc 0001318d  
/system/lib/libc.so (__pthread_cond_timedwait_relative(pthread_cond_t*, pthread_mutex_t*, timespec const*)+56) 
native: #02 pc 00025315  /system/lib/libinputflinger.so (android::InputReader::monitor()+28) 
native: #03 pc 0000f56d  /system/lib/libandroid_servers.so (???)
native: #04 pc 0011441d  
/system/framework/arm/services.odex (Java_com_android_server_input_InputManagerService_nativeMonitor__J+88) 
  at com.android.server.input.InputManagerService.nativeMonitor(Native method) 
  at com.android.server.input.InputManagerService.monitor(InputManagerService.java:1404) 
  at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:179)
  at android.os.Handler.handleCallback(Handler.java:739)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:135)
  at android.os.HandlerThread.run(HandlerThread.java:61)
  at com.android.server.ServiceThread.run(ServiceThread.java:46)

Wd監控IMS的InputReader及InputDispatcher線程,從以上trace可知InputReader可能發生死鎖,所以繼續查看InputReader線程:

"InputReader" prio=10 tid=34 Native 
  | group="main" sCount=1 dsCount=0 obj=0x13100220 self=0xb7c0dc80
  | sysTid=4310 nice=-8 cgrp=default sched=0/0 handle=0xb7b098a0
  | state=S schedstat=( 62300529241 31784888301 33265 ) utm=5130 stm=1100 core=3 HZ=100 
  | stack=0xa3b7b000-0xa3b7d000 stackSize=1012KB
  | held mutexes=
   kernel: (couldn't read /proc/self/task/4310/stack)
native: #00 pc 0000f9b0  /system/lib/libc.so (syscall+28)
native: #01 pc 000a8c4b  /system/lib/libart.so (art::ConditionVariable::Wait(art::Thread*)+82) 
native: #02 pc 001abc1b  /system/lib/libart.so (art::JNI::IsInstanceOf(_JNIEnv*, _jobject*, _jclass*)+1022) 
native: #03 pc 0001fe2f  /system/lib/libjavacore.so (???)
native: #04 pc 0001fe71  /system/lib/libjavacore.so (???)
native: #05 pc 00283a93  /system/framework/arm/boot.oat (Java_libcore_io_Posix_writeBytes__Ljava_io_FileDescriptor_2Ljava_lang_Object_2II+142) 
  at libcore.io.Posix.writeBytes(Native method)
  at libcore.io.Posix.write(Posix.java:258)
  at libcore.io.BlockGuardOs.write(BlockGuardOs.java:313)
  at libcore.io.IoBridge.write(IoBridge.java:497)
  at java.io.FileOutputStream.write(FileOutputStream.java:186)
  at java.io.OutputStreamWriter.flushBytes(OutputStreamWriter.java:167)
  - locked <@addr=0x14079080> (a java.io.FileOutputStream)
  at java.io.OutputStreamWriter.close(OutputStreamWriter.java:140)
  - locked <@addr=0x14079080> (a java.io.FileOutputStream)
  at com.android.internal.policy.impl.PhoneWindowManager.saveCoverStatus(PhoneWindowManager.java:6266) 
  at com.android.internal.policy.impl.PhoneWindowManager.interceptKeyBeforeQueueing(PhoneWindowManager.java:5823)
  at com.android.server.wm.InputMonitor.interceptKeyBeforeQueueing(InputMonitor.java:370)
  at com.android.server.input.InputManagerService.interceptKeyBeforeQueueing(InputManagerService.java:1491)

可知在 saveCoverStatus時write發生等待,所以一方面檢測函數調用。

另外從全局來查看下InputReader是否正常:
按power鍵點亮屏幕:

03-29 16:53:36.248   748  4310 D InputReader: Perf-Track KeyboardInputMapper Up keyCode=26

進行觸屏操作:

03-29 16:53:52.894   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=396.44937, y=1192.06873
03-29 16:53:53.007   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=396.44937, y=1192.06873
03-29 16:53:53.954   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=646.10266, y=853.33331
03-29 16:53:54.112   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=532.26074, y=902.29510
03-29 16:53:54.457   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=370.48544, y=1029.19592
03-29 16:53:54.595   748  4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=518.28015, y=920.28101

傳入了鍵值keyCode=0 (KEYCODE_UNKNOWN = 0):

03-29 16:57:35.743   748  4310 D InputReader: Perf-Track KeyboardInputMapper Down keyCode=0, scanCode=251, DownTime=0
03-29 16:57:35.743   748  4310 D InputReader: Perf-Track KeyboardInputMapper Up keyCode=0

所以另一方面檢查keyCode=0 (KEYCODE_UNKNOWN = 0)產生的原因。一方面檢查函數調用是否有異常,另一方面從鍵值映射方面來檢查。
此外 ,我們可以從中吸取一些經驗就是儘量避免在重要服務進行大量耗時操作。

4.2 搜索關鍵字 held by

搜索關鍵字held by判斷線程是否被 Block,是否有死鎖問題
例如:

Cmd line: system_server
"ActivityManager" prio=5 tid=14 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12d46f90 self=0xb732ba18
| sysTid=840 nice=-2 cgrp=apps sched=0/0 handle=0xb732c040
| state=S schedstat=( 3934680041 3104214846 9460 ) utm=166 stm=227 core=2 HZ=100
| stack=0xa6106000-0xa6108000 stackSize=1036KB
| held mutexes=
at com.android.server.net.NetworkStatsService.setUidForeground(NetworkStatsService.java:641)
- waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35
at com.android.server.net.NetworkPolicyManagerService.updateRulesForUidLocked(NetworkPolicyMana
gerService.java:2081)

waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35, 查看thread 35

"NetworkStats" prio=5 tid=35 TimedWaiting
| group="main" sCount=1 dsCount=0 obj=0x1393f9e0 self=0xb758ce48
| sysTid=2021 nice=0 cgrp=apps sched=0/0 handle=0xb76fab58
| state=S schedstat=( 402053186 255954674 601 ) utm=27 stm=13 core=0 HZ=100
| stack=0xa32db000-0xa32dd000 stackSize=1036KB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x16b70c1b> (a java.lang.Object)
at java.lang.Thread.parkFor(Thread.java:1220)
- locked <0x16b70c1b> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:197)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueue
dSynchronizer.java:2055)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:388)
at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.jav
a:614)
at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:406)
at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:360)
at com.android.server.NetworkManagementService.getNetworkStatsTethering(NetworkManagementSer
vice.java:1731)

4.3 檢查Binder的Server 端

檢查Binder的Server 端是否卡住

4.3.1 重要信息

如果線程的狀態是Native,並且callstack中含有IPCThreadState::waitForResponse–>IPCThreadState::talkWithDriver的信息就可以初步說明是卡在對端,下一步就是要找到對端

找到事件以及發生的時間點

01-01 03:05:34.411 639 1064 I watchdog: Blocked in handler on main thread (main)
block module: main timeout: 60s

查看時間段內對應線程的 call stack,初步說明是卡在對端

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721e0870 self=0xa805d000
  | sysTid=941 nice=0 cgrp=default sched=0/0 handle=0xabfe74a4
  | state=S schedstat=( 6662448086 9845197785 9495 ) utm=548 stm=118 core=1 HZ=100
  | stack=0xbe090000-0xbe092000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/941/stack)
  native: #00 pc 00049304  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001deef  /system/lib/libc.so (ioctl+38)
  native: #02 pc 0004242f  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
  native: #03 pc 00042de9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d2e5  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcdad  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 0074ec65  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at android.content.om.IOverlayManager$Stub$Proxy.getOverlayInfo(IOverlayManager.java:254)
  at com.android.systemui.statusbar.phone.StatusBar.isUsingDarkTheme(unavailable:-1)
  at com.android.systemui.statusbar.phone.StatusBar.updateTheme(unavailable:-1)
  at com.android.systemui.statusbar.phone.StatusBar.onColorsChanged(unavailable:-1)
  at com.android.internal.colorextraction.ColorExtractor.triggerColorsChanged(ColorExtractor.java:186)
  at com.android.systemui.colorextraction.SysuiColorExtractor.setWallpaperVisible(unavailable:-1)
  at com.android.systemui.colorextraction.SysuiColorExtractor$1.lambda$-com_android_systemui_colorextraction_SysuiColorExtractor$1_3105(unavailable:-1)
  at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.$m$0(unavailable:-1)
  at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.run(unavailable:-1)
  at android.os.Handler.handleCallback(Handler.java:790)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:164)
  at android.app.ActivityThread.main(ActivityThread.java:6523)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:857)

systemui調用getOverlayInfo 出問題是因爲Binder對端
查看SYS_BINDER_INFO,確定Binder對端tid以及sysTid,這裏無法確認,但是從getOverlayInfo 來看可能是sysTid=4447

"Binder:639_17" prio=5 tid=115 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13e4a4a0 self=0xa0310e00
  | sysTid=4447 nice=0 cgrp=default sched=0/0 handle=0x85a75970
  | state=S schedstat=( 1407904321 1543930926 3137 ) utm=101 stm=39 core=2 HZ=100
  | stack=0x8597b000-0x8597d000 stackSize=1006KB
  | held mutexes=
  at com.android.server.om.OverlayManagerService$1.getOverlayInfo(OverlayManagerService.java:511)
  - waiting to lock <0x0d53be75> (a java.lang.Object) held by thread 1
  at android.content.om.IOverlayManager$Stub.onTransact(IOverlayManager.java:79)
  at android.os.Binder.execTransact(Binder.java:697)

而0x0d53be75鎖被sysTid=639 持有,持有不釋放是639的Binder調用出了問題

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721e0870 self=0xa805d000
  | sysTid=639 nice=-2 cgrp=default sched=0/0 handle=0xabfe74a4
  | state=S schedstat=( 38966905233 5299817531 20464 ) utm=3323 stm=573 core=0 HZ=100
  | stack=0xbe090000-0xbe092000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/639/stack)
  native: #00 pc 00049304  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001deef  /system/lib/libc.so (ioctl+38)
  native: #02 pc 0004242f  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
  native: #03 pc 00042de9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d2e5  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcdad  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 0074ec65  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at android.os.IInstalld$Stub$Proxy.idmap(IInstalld.java:936)
  at com.android.server.pm.Installer.idmap(Installer.java:327)
  at com.android.server.om.IdmapManager.createIdmap(IdmapManager.java:62)
  at com.android.server.om.OverlayManagerServiceImpl.updateState(OverlayManagerServiceImpl.java:493)
  at com.android.server.om.OverlayManagerServiceImpl.updateAllOverlaysForTarget(OverlayManagerServiceImpl.java:237)
  at com.android.server.om.OverlayManagerServiceImpl.onTargetPackageChanged(OverlayManagerServiceImpl.java:187)
  at com.android.server.om.OverlayManagerService$PackageReceiver.onPackageChanged(OverlayManagerService.java:392)
  - locked <0x0d53be75> (a java.lang.Object)
  at com.android.server.om.OverlayManagerService$PackageReceiver.onReceive(OverlayManagerService.java:350)
  at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$-android_app_LoadedApk$ReceiverDispatcher$Args_53034(LoadedApk.java:1323)

binder調用卡住導致問題發生,這種情況需要檢查binder對端的行爲
若binder對端已無空閒binder線程,則需要從以下兩個方面做進一步分析和解決:
1) 爲何binder線程執行如此長時間?
2) 是否有太多重複binder請求導致binder線程資源被佔用,這種請求是否合理?

4.3.2 如何確定binder的對端

前提:有binder block的callstack的特徵,線程狀態爲Native

IPCThreadState::waitForResponse-->IPCThreadState::talkWithDriver
Cmd line: system_server
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4

確定方法:
第一步:獲取時間點 17:54:29.433

06-22 17:54:29.433   815  1035 I watchdog: Blocked in handler on main thread (main)
block module: main                 timeout: 60s

第二步:查看時間段內對應線程的 call stack , 獲取 sysTid 以及卡住的具體接口

----- pid 815 at 2018-06-22 17:53:54 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
  | state=S schedstat=( 18775190570 13246793364 27928 ) utm=1310 stm=566 core=0 HZ=100
  | stack=0xbe1be000-0xbe1c0000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/815/stack)
  native: #00 pc 00048c8c  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001dd65  /system/lib/libc.so (ioctl+32)
  native: #02 pc 00042535  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+168)
  native: #03 pc 00042ee9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d309  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcf89  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 002a90a5  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)
  at android.telephony.TelephonyManager.invokeOemRilRequestRaw(TelephonyManager.java:5716)
  at com.android.server.audio.AudioService.sendATCommand(AudioService.java:1068)
  at com.android.server.audio.AudioService.setModemAllPowerdown(AudioService.java:1083)
  at com.android.server.audio.AudioService$AsOnAudioPortUpdateListener.onAudioPatchListUpdate(AudioService.java:1045)
  at android.media.AudioPortEventHandler$1.handleMessage(AudioPortEventHandler.java:108)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:164)
  at com.android.server.SystemServer.run(SystemServer.java:442)
  at com.android.server.SystemServer.main(SystemServer.java:283)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:837)

----- pid 815 at 2018-06-22 17:54:29 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
  | sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
  | state=S schedstat=( 18775421647 13246793364 27930 ) utm=1310 stm=566 core=0 HZ=100
  | stack=0xbe1be000-0xbe1c0000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/815/stack)
  native: #00 pc 00048c8c  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001dd65  /system/lib/libc.so (ioctl+32)
  native: #02 pc 00042535  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+168)
  native: #03 pc 00042ee9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d309  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcf89  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 002a90a5  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)
  at android.telephony.TelephonyManager.invokeOemRilRequestRaw(TelephonyManager.java:5716)
  at com.android.server.audio.AudioService.sendATCommand(AudioService.java:1068)
  at com.android.server.audio.AudioService.setModemAllPowerdown(AudioService.java:1083)
  at com.android.server.audio.AudioService$AsOnAudioPortUpdateListener.onAudioPatchListUpdate(AudioService.java:1045)
  at android.media.AudioPortEventHandler$1.handleMessage(AudioPortEventHandler.java:108)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:164)
  at com.android.server.SystemServer.run(SystemServer.java:442)
  at com.android.server.SystemServer.main(SystemServer.java:283)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:837)

可見是卡在com.android.internal.telephony.ITelephonyStubStubProxy.invokeOemRilRequestRaw(ITelephony.java:4634),卡住的接口是invokeOemRilRequestRaw, sysTid = 815

第三步:Check "SYS_BINDER_INFO"找到815的通信對端
比如
如果 sysTid = 2938,在Check "SYS_BINDER_INFO"找到如下信息:

proc 838
thread 2938:
    outgoing transaction 139654941: ffffffc0877d87ab from 838:2938 to 9111:9123 code 1 flags 10 priv -6...

然後從"SYS_PROCESSES_AND_THREADE" 找到9111是進程monkey

root 9111  9100  1150100  21780  0  20  0  0  0  fg  fffff  2487234d  S  64  com.android.monkey
root 9123  9111  1150100  21780  2  14 -6  0  0  fg  00000  5456456c  R  64  bonder_1

如果不能從SYS_BINDER_INFO找到信息,也可以從kernel log去找,或者在當前卡住的callstack中,僅從名字上結合code,確認卡住的對端和具體的接口

如上例在binder info中找不到815的通信對端,在kernel log中發現binder release的信息:

[11645.672135] (3)[23724:kworker/u8:6]binder: release 815:815 transaction 3729140 out, still active
<6>[11647.223911]  (1)[23672:kworker/u8:1]binder: release 1207:1229 transaction 3729140 in, still active
<6>[11647.223954]  (1)[23672:kworker/u8:1]binder: send failed reply for transaction 3729140, target dead

在binder_info中查找對端,確認對端是1207:1229

incoming transaction 3729140: cc857480 from 0:0 to 1207:1229 code 79 flags 10 pri 0:118 r1 node 23923 size 120:0 data e6280068

查看1207:1229的backtrace:

----- pid 1207 at 2018-06-22 17:54:30 -----
Cmd line: com.android.phone
"Binder:1207_2" prio=5 tid=9 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13c01a90 self=0x9fc53c00
  | sysTid=1229 nice=-2 cgrp=default sched=0/0 handle=0x91ca7970
  | state=S schedstat=( 10524016079 5984437221 36564 ) utm=647 stm=404 core=0 HZ=100
  | stack=0x91bad000-0x91baf000 stackSize=1006KB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x0bbbfc71> (a com.android.phone.PhoneInterfaceManager$MainThreadRequest)
  at com.android.phone.PhoneInterfaceManager.PhoneInterfaceManager.java(PhoneInterfaceManager.java:1031)
  - locked <0x0bbbfc71> (a com.android.phone.PhoneInterfaceManager$MainThreadRequest)
  at com.android.phone.PhoneInterfaceManager.sendRequest(PhoneInterfaceManager.java:1010)
  at com.android.phone.PhoneInterfaceManager.invokeOemRilRequestRaw(PhoneInterfaceManager.java:3114)
  at com.android.internal.telephony.ITelephony$Stub.onTransact(ITelephony.java:1448)
  at android.os.Binder.execTransact(Binder.java:697)

看起來應該是卡在phone這邊的invokeOemRilRequestRaw操作

4.4 Native 方法執行時間過長導致重啓

線程狀態Native,查看是否有PowerManagerService.nativeSetAutoSuspend
根據callstack找到是何Native方法時間太久,是否等待硬件返回信息,或者硬件本身有問題

4.5 SurfaceFlinger 卡住導致重啓

搜索關鍵字 I watchdog ,查看是否有 surfaceflinger hang,默認卡住40s,就會重啓。

4.6 Zygote Fork 進程時卡住

線程狀態Native,查看是否有 Process.zygoteSendArgsAndGetResult

4.7 Dump 時間過長

Dump 超過60s 可能會引起手機重啓。
搜索關鍵字dumpStackTraces 或 dumpStackTraces process

一般前面會有ANR發生
前面有fatal JE NE K等Exception
自動化腳本有call “dumpsys” 去dump 系統信息

一般來說上面的情況enduser不會遇到,上面原因一般是系統loading過重或者dump的信息過多

5 案例

現象:
安裝應用時swt,系統重啓
1.小概率登錄賬戶後臺在更新google應用,
2.連接電腦用豌豆莢安裝應用時,
點擊屏幕解鎖然後就重啓卡死了
搜索"watchdog":

07-15 15:13:52.132 984 1174 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), 
 Blocked in handler on main thread (main), 
 Blocked in handler on i/o thread (android.io),
 Blocked in handler on display thread (android.display), 
 Blocked in handler on ActivityManager (ActivityManager), 
 Blocked in handler on PowerManagerService (PowerManagerService)
block module: android.fg timeout: 60s

查看對應時間線程的call stack

1、----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"android.fg" prio=5 tid=16 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12ca59c0 self=0x9de63800
| sysTid=1004 nice=0 cgrp=default sched=0/0 handle=0x92238920
| state=S schedstat=( 817332233 838592314 1765 ) utm=68 stm=13 core=2 HZ=100
| stack=0x92136000-0x92138000 stackSize=1038KB
| held mutexes=
at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:22667)
- waiting to lock <0x04cc3347> (a com.android.server.am.ActivityManagerService) held by thread 11
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:207)
at android.os.Handler.handleCallback(Handler.java:836)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)

//android.fg等待的鎖,被thread 11拿着;

----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"ActivityManager" prio=5 tid=11 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c532e0 self=0x9de61f00
| sysTid=999 nice=-2 cgrp=default sched=0/0 handle=0x92751920
| state=S schedstat=( 12393939024 9340072944 12993 ) utm=679 stm=560 core=2 HZ=100
| stack=0x9264f000-0x92651000 stackSize=1038KB
| held mutexes=
at com.android.server.pm.PackageManagerService.prepareUserData(PackageManagerService.java:20141)
- waiting to lock <0x0d65c6e0> (a java.lang.Object) held by thread 23
at com.android.server.pm.UserManagerService.onBeforeUnlockUser(UserManagerService.java:2883)
at com.android.server.am.UserController.finishUserUnlocking(UserController.java:292)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.unlockUserCleared(UserController.java:973)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.maybeUnlockUser(UserController.java:938)
at com.android.server.am.UserController.finishUserBoot(UserController.java:267)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.finishUserBoot(UserController.java:222)
at com.android.server.am.UserController.finishUserSwitch(UserController.java:177)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:1739)
at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:3902)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:3941)
at android.os.Handler.dispatchMessage(Handler.java:110)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)

//此處爲thread 11 的call stack ,看此部分也在等待鎖,此鎖被thread 23 拉着,

查看thread 23 的狀態:

---- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"PackageManager" prio=5 tid=23 Native
| group="main" sCount=1 dsCount=0 obj=0x12e4b380 self=0x9de65b00
| sysTid=1015 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x9183b920
| state=S schedstat=( 3538952919 2746466081 4616 ) utm=294 stm=59 core=1 HZ=100
| stack=0x91739000-0x9173b000 stackSize=1038KB
| held mutexes=
at android.net.LocalSocketImpl.readba_native(Native method)
at android.net.LocalSocketImpl.-wrap1(LocalSocketImpl.java:-1)
at android.net.LocalSocketImpl$SocketInputStream.read(LocalSocketImpl.java:110)
- locked <0x05fd685e> (a java.lang.Object)
at libcore.io.Streams.readFully(Streams.java:81)
at com.android.internal.os.InstallerConnection.readFully(InstallerConnection.java:222)
at com.android.internal.os.InstallerConnection.readReply(InstallerConnection.java:237)
at com.android.internal.os.InstallerConnection.transact(InstallerConnection.java:91)
- locked <0x0966443f> (a com.android.internal.os.InstallerConnection)
at com.android.internal.os.InstallerConnection.execute(InstallerConnection.java:124)
at com.android.internal.os.InstallerConnection.dexopt(InstallerConnection.java:147)
at com.android.server.pm.Installer.dexopt(Installer.java:153)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(PackageDexOptimizer.java:257)
at com.android.server.pm.PackageDexOptimizer.performDexOpt(PackageDexOptimizer.java:101)
- locked <0x0d65c6e0> (a java.lang.Object)
at com.android.server.pm.PackageManagerService.installPackageLI(PackageManagerService.java:15729)
at com.android.server.pm.PackageManagerService.installPackageTracedLI(PackageManagerService.java:15394)
at com.android.server.pm.PackageManagerService.-wrap25(PackageManagerService.java:-1)
at com.android.server.pm.PackageManagerService$9.run(PackageManagerService.java:12798)
- locked <0x0d65c6e0> (a java.lang.Object)
at android.os.Handler.handleCallback(Handler.java:836)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)

從以上的依賴可以看出:

984(system_server):1004(android.fg)—>984(system_server):999(ActivityManager)—>984(system_server):1015(PackageManager)

所以需要查看上面PackageManager 的行爲:看起來在read 信息,如下backtrace ,read 爲何這部分read 操作那麼久??如下,在read 前還有dexoat 優化,安裝,執行的操作,

所以應該是這部分的操作影響到如下這個call stack 執行力較長時間導致SWT

at android.net.LocalSocketImpl.readba_native(Native method)
at android.net.LocalSocketImpl.-wrap1(LocalSocketImpl.java:-1)
at android.net.LocalSocketImpl$SocketInputStream.read(LocalSocketImpl.java:110)
- locked <0x05fd685e> (a java.lang.Object)
at libcore.io.Streams.readFully(Streams.java:81)
at com.android.internal.os.InstallerConnection.readFully(InstallerConnection.java:222)
at com.android.internal.os.InstallerConnection.readReply(InstallerConnection.java:237)
at com.android.internal.os.InstallerConnection.transact(InstallerConnection.java:91)
- locked <0x0966443f> (a com.android.internal.os.InstallerConnection)
at com.android.internal.os.InstallerConnection.execute(InstallerConnection.java:124)
at com.android.internal.os.InstallerConnection.dexopt(InstallerConnection.java:147)
at com.android.server.pm.Installer.dexopt(Installer.java:153)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(Packag

如以上分析,是在等鎖超時,SWT,導致超時的原因應該是安裝一些應用時,dex2oat 操作等時間較長:如下爲SWT時間時,dex2oat 的時間

07-15 15:12:36.600 12505 12505 I dex2oat : /system/bin/dex2oat --compiler-filter=interpret-only -j1
07-15 15:12:36.600 12505 12505 I dex2oat : /system/bin/dex2oat --compiler-filter=interpret-only -j1
07-15 15:12:46.360 12505 12505 I dex2oat : Large app, accepted running with swap.
07-15 15:12:50.775 12505 12505 W dex2oat : Before Android 4.1, method void com.tencent.biz.pubaccount.AccountDetail.activity.EqqAccountDetailActivity.l() would have incorrectly overridden the package-private method in com.tencent.biz.pubaccount.AccountDetailActivity
07-15 15:12:51.150 12505 12505 W dex2oat : Before Android 4.1, method void com.tencent.mobileqq.activity.fling.ContentWrapView.ensureTransformationInfo() would have incorrectly overridden the package-private method in android.view.View
07-15 15:13:19.736 12505 12505 W dex2oat : Verification of java.util.List com.tencent.mobileqq.activity.contact.addcontact.AddContactsView.b() took 151.200ms
07-15 15:13:54.697 12505 12505 I dex2oat : dex2oat took 78.103s (threads: 1) arena alloc=2MB (2243640B) java alloc=50MB (53066064B) native alloc=45MB (47911536B) free=4MB (4910480B) swap=32MB (33554432B)
07-15 15:13:54.697 12505 12505 I dex2oat : dex2oat took 78.103s (threads: 1) arena alloc=2MB (2243640B) java alloc=50MB (53066064B) native alloc=45MB (47911536B) free=4MB (4910480B) swap=32MB (33554432B)

可以 針對個別apk由於的dex2oat原因安裝失敗/ 安裝慢 / lunch慢進行處理
關於個別apk由於的dex2oat原因安裝失敗/ 安裝慢 / lunch慢進行處理可參考

看上面的應用:是微信,qq,之前我們還有處理過可能會導致這樣的問題的apk:包括qq、微信、GMS

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章