Android ANR: A Complete Guide to the Thread Parameters in traces Logs

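        ANR failures are among the most important and most difficult problems in Android development, and analyzing the traces.txt log is the key to solving them. Many developers only half understand the thread state parameters that appear in traces.txt. This article explains each of these parameters with reference to the source code, which should help considerably when diagnosing ANRs (it does not cover approaches or methods for fixing the ANR itself).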

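        When you run adb shell kill -3 pid, or when a process hits a fault such as an ANR or a native crash, the system generates a traces log file, written by default to the /data/anr/ directory. A traces log consists mainly of the resource usage at the time of the ANR and the state of each thread.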

----- pid 8072 at 2020-03-22 10:23:06 -----
Cmd line: com.zte.camera
Build fingerprint: 'ZTE/GEN_CN_P439S01/P439S01:9/PKQ1.180929.001/20181106.134337:userdebug/test-keys'
ABI: 'arm'
...
Heap: 41% free, 11MB/20MB; 186365 objects
...
Total memory 20MB
Max memory 512MB
Zygote space size 1584KB
...
suspend all histogram:	Sum: 191us 99% C.I. 4us-62us Avg: 21.222us Max: 62us
DALVIK THREADS (24):
"Signal Catcher" daemon prio=5 tid=3 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x14540088 self=0xeb6cb200
  | sysTid=8078 nice=0 cgrp=default sched=0/0 handle=0xe5dcb970
  | state=R schedstat=( 59094373 1072863 12 ) utm=4 stm=1 core=4 HZ=100
  | stack=0xe5cd0000-0xe5cd2000 stackSize=1010KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 002d975f  /system/lib/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+134)
  native: #01 pc 0036e91b  /system/lib/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool, BacktraceMap*, bool) const+210)
  native: #02 pc 0036b0d3  /system/lib/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool, BacktraceMap*, bool) const+34)
  native: #03 pc 00383d89  /system/lib/libart.so (art::DumpCheckpoint::Run(art::Thread*)+624)
  native: #04 pc 0037e06f  /system/lib/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+314)
  native: #05 pc 0037d767  /system/lib/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool)+758)
  native: #06 pc 0037d39f  /system/lib/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+614)
  native: #07 pc 00357669  /system/lib/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+120)
  native: #08 pc 00360689  /system/lib/libart.so (art::SignalCatcher::HandleSigQuit()+1040)
  native: #09 pc 0035f81b  /system/lib/libart.so (art::SignalCatcher::Run(void*)+246)
  native: #10 pc 00071db1  /system/lib/libc.so (__pthread_start(void*)+22)
  native: #11 pc 0001de65  /system/lib/libc.so (__start_thread+24)
  (no managed stack frames)

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74cb56d8 self=0xeb6ca000
  | sysTid=8072 nice=-10 cgrp=default sched=0/0 handle=0xeffe9494
  | state=S schedstat=( 1457862132 142332556 1571 ) utm=114 stm=31 core=4 HZ=100
  | stack=0xff195000-0xff197000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x90/0xe8
  kernel: futex_wait_queue_me+0xc4/0x13c
  kernel: futex_wait+0xe4/0x204
  kernel: do_futex+0x168/0x9a0
  kernel: compat_SyS_futex+0xf0/0x174
  kernel: __sys_trace+0x4c/0x4c
  native: #00 pc 00019e8c  /system/lib/libc.so (syscall+28)

The following fragment contains the state parameters of one thread, which are what this article explains.

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74cb56d8 self=0xeb6ca000
  | sysTid=8072 nice=-10 cgrp=default sched=0/0 handle=0xeffe9494
  | state=S schedstat=( 1457862132 142332556 1571 ) utm=114 stm=31 core=4 HZ=100
  | stack=0xff195000-0xff197000 stackSize=8MB
  | held mutexes=

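        The core code that prints a thread's state parameters is the Thread::DumpState function (in /art/runtime/thread.cc). The meaning of each line of parameters is explained in detail below: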

(1)"main" prio=5 tid=1 Native

  • "main" 

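       Thread name. "main" is the main thread; "Binder:8072_2" is a Binder thread belonging to process 8072; "Signal Catcher" daemon is the Signal Catcher thread, where the daemon keyword after the name marks it as a daemon thread.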

os << '"' << *thread->tlsPtr_.name << '"';
  • prio=5

       Thread priority, set through java.lang.Thread.setPriority. The range is [1, 10], from lowest to highest priority.

os << " prio=" << priority;
  • tid=1

       The thread's id within the process. The value is usually small.

os << " tid=" << thread->GetThreadId();
  • Native

       Thread state. The possible values and their meanings are shown in the figure below.

os  << " " << thread->GetState();
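
When many thread entries have to be compared, it can help to pull these four fields out of the first line mechanically. The following is a minimal sketch, not part of ART: the ThreadHeader struct and the ParseThreadHeader function are hypothetical names introduced here, and the regular expression assumes the exact layout shown in the sample above (quoted name, optional daemon keyword, prio, tid, state).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Fields of the first line of a thread entry, e.g. "main" prio=5 tid=1 Native
// (hypothetical helper type, not an ART type).
struct ThreadHeader {
  std::string name;
  bool daemon = false;
  int prio = 0;
  int tid = 0;
  std::string state;
};

// Parses the header line of an attached thread. Returns false when the line
// does not match the assumed layout (for example "(not attached)" entries).
bool ParseThreadHeader(const std::string& line, ThreadHeader* out) {
  static const std::regex kPattern(
      R"re(^"([^"]*)"( daemon)? prio=(-?\d+) tid=(\d+) (\S+))re");
  std::smatch m;
  if (!std::regex_search(line, m, kPattern)) {
    return false;
  }
  out->name = m[1].str();
  out->daemon = m[2].matched;  // true only when " daemon" follows the name
  out->prio = std::stoi(m[3].str());
  out->tid = std::stoi(m[4].str());
  out->state = m[5].str();
  return true;
}
```

Called on the two header lines from the sample log, this yields name main, prio 5, tid 1, state Native for the main thread, and the daemon flag set with state Runnable for Signal Catcher.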

(2)group="main" sCount=1 dsCount=0 flags=1 obj=0x74cb56d8 self=0xeb6ca000

  • group="main"

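       The thread group the thread belongs to. Java represents thread groups with ThreadGroup, which lets a batch of threads be classified and managed together, and Java allows a program to control a thread group directly. By default a thread belongs to the main thread group (in the sample above, the runtime's Signal Catcher thread is in the system group instead). A thread can also be assigned to a group explicitly.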

os << "  | group=\"" << group_name << "\"";
  • sCount=1

       The thread's suspend count.

  os << " sCount=" << thread->tls32_.suspend_count;
  • dsCount=0 

       The thread's suspend count caused by debugging.

os << " dsCount=" << thread->tls32_.debug_suspend_count;
  • flags=1

       The thread's flag bits, read from the flags field of state_and_flags.

os << " flags=" << thread->tls32_.state_and_flags.as_struct.flags;
  • obj=0x74cb56d8

       The Java thread object associated with the current thread.

os << " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer);
  • self=0xeb6ca000

       The address of the current thread's native Thread object.

os << " self=" << reinterpret_cast<const void*>(thread) << "\n";
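
All of the lines that start with "  | " share the same key=value layout, so one small parser can serve for every one of them except the held mutexes line. The sketch below is an illustration under that assumption; ParseAttributeLine is a hypothetical name, and the handling of quoted values and of the "( ... )" group is inferred from the sample log rather than taken from any ART tool.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Splits one "  | key=value key=value ..." line into a map.
// A value may be a bare token, a double-quoted string, or a "( ... )" group.
std::map<std::string, std::string> ParseAttributeLine(const std::string& line) {
  std::map<std::string, std::string> attrs;
  std::size_t pos = line.find('|');
  pos = (pos == std::string::npos) ? 0 : pos + 1;
  while (pos < line.size()) {
    while (pos < line.size() && line[pos] == ' ') ++pos;  // skip spaces before the key
    std::size_t eq = line.find('=', pos);
    if (eq == std::string::npos) break;
    std::string key = line.substr(pos, eq - pos);
    std::size_t start = eq + 1;
    std::string value;
    if (start < line.size() && (line[start] == '"' || line[start] == '(')) {
      // Quoted string or parenthesized group: read up to the closing delimiter.
      char closer = (line[start] == '"') ? '"' : ')';
      std::size_t close = line.find(closer, start + 1);
      if (close == std::string::npos) close = line.size();
      value = line.substr(start + 1, close - start - 1);
      // Trim the padding spaces that appear inside "( ... )".
      std::size_t first = value.find_first_not_of(' ');
      std::size_t last = value.find_last_not_of(' ');
      value = (first == std::string::npos) ? "" : value.substr(first, last - first + 1);
      pos = close + 1;
    } else {
      // Bare token: read up to the next space or the end of the line.
      std::size_t space = line.find(' ', start);
      if (space == std::string::npos) space = line.size();
      value = line.substr(start, space - start);
      pos = space;
    }
    attrs[key] = value;
  }
  return attrs;
}
```

For the group line of the main thread this produces six entries (group, sCount, dsCount, flags, obj, self), with the quotes around main removed.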

(3)sysTid=8072 nice=-10 cgrp=default sched=0/0 handle=0xeffe9494

  • sysTid=8072

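      The thread id assigned by the Linux kernel, unique across the whole system. For the main thread it equals the process id (8072 here).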

  os << "  | sysTid=" << tid;
  • nice=-10

       The thread's scheduling priority (nice value), set through android.os.Process.setThreadPriority. The range is [-20, 19], from highest to lowest priority.

    os << " nice=" << getpriority(PRIO_PROCESS, tid);
  • cgrp=default

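       The scheduling group the thread belongs to. Control groups (cgroups) are a Linux kernel feature for limiting, controlling, and isolating the resources (CPU, memory, disk I/O, and so on) of a group of processes.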

os << " cgrp=" << scheduler_group_name;
  • sched=0/0

      The thread's scheduling policy and scheduling priority, in that order.

os << " sched=" << policy << "/" << sp.sched_priority;
  • handle=0xeffe9494

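       The thread's pthread handle, that is, the value stored in tlsPtr_.pthread_self (it is not the address of a handler function).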

  os << " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);
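
The policy part of sched= is printed as a bare number, which is hard to read at a glance. As a small illustration, the sketch below splits the value and maps the number to a name. ParseSched, SchedInfo, and SchedPolicyName are hypothetical helpers; the numeric values are the Linux scheduling policy constants (SCHED_OTHER is 0, and so on), hard-coded here on the assumption that the trace came from a Linux-based Android device.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// The two numbers printed as sched=<policy>/<priority> (hypothetical helper type).
struct SchedInfo {
  int policy;
  int priority;
};

// Parses a value such as "0/0". Returns false if it is not two integers separated by '/'.
bool ParseSched(const std::string& value, SchedInfo* out) {
  return std::sscanf(value.c_str(), "%d/%d", &out->policy, &out->priority) == 2;
}

// Maps a Linux scheduling policy number to its symbolic name.
const char* SchedPolicyName(int policy) {
  switch (policy) {
    case 0: return "SCHED_OTHER";  // the default time-sharing policy
    case 1: return "SCHED_FIFO";
    case 2: return "SCHED_RR";
    case 3: return "SCHED_BATCH";
    case 5: return "SCHED_IDLE";
    case 6: return "SCHED_DEADLINE";
    default: return "unknown";
  }
}
```

For the sample value sched=0/0 this gives SCHED_OTHER with priority 0, the usual result for an ordinary application thread whose relative priority is expressed through nice instead.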

(4)state=S schedstat=( 1457862132 142332556 1571 ) utm=114 stm=31 core=4 HZ=100

  • state=S

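       The thread's scheduling state as reported by the kernel, for example R (running) for Signal Catcher and S (sleeping) for main in the sample above.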

  os << "  | state=" << native_thread_state;
  • schedstat=( 1457862132 142332556 1571 )

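       CPU scheduling statistics. The three numbers are the time the thread has spent running on a CPU, the time it has spent waiting on a run queue, and the number of timeslices it has run (a count, not a length). Can be viewed with adb shell cat /proc/[pid]/task/[tid]/schedstat.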

  os << " schedstat=( " << scheduler_stats << " )";
  • utm=114 stm=31

       CPU time used in user mode and in kernel mode, respectively (in jiffies). Can be viewed with adb shell cat /proc/[pid]/task/[tid]/stat.

  os << " utm=" << utime
     << " stm=" << stime;
  • core=4

      The index of the CPU core that last ran the thread.

  os << " core=" << task_cpu;
  • HZ=100

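      The number of clock ticks per second. The kernel's global variable jiffies counts the timer interrupts (ticks) that have occurred since the system booted; the timer interrupt handler increments it on every tick. The number of timer interrupts per second equals HZ, so jiffies grows by HZ every second. A tick is the reciprocal of HZ, the time between two timer interrupts: with HZ=100, one tick is 10 ms. Time in seconds is therefore jiffies/HZ.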

  os << " HZ=" << sysconf(_SC_CLK_TCK) << "\n";
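
To make the units on this line concrete, here is a minimal conversion sketch. TicksToMillis and NanosToMillis are hypothetical helper names, and the code assumes that the first two schedstat fields are in nanoseconds, as documented for current Linux kernels.

```cpp
#include <cassert>
#include <cstdint>

// Converts clock ticks (the unit of utm and stm) to milliseconds,
// using the HZ value printed on the same line. Returns 0 for an invalid HZ.
std::int64_t TicksToMillis(std::int64_t ticks, std::int64_t hz) {
  return hz > 0 ? ticks * 1000 / hz : 0;
}

// Converts a schedstat time field (assumed to be nanoseconds) to whole milliseconds.
std::int64_t NanosToMillis(std::int64_t nanos) {
  return nanos / 1000000;
}
```

With the sample values, utm=114 and stm=31 at HZ=100 correspond to 1140 ms and 310 ms, 1450 ms in total, which agrees closely with the first schedstat field, 1457862132 ns or about 1457 ms of run time. The second field, 142332556 ns, is about 142 ms spent waiting to run.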

(5)stack=0xff195000-0xff197000 stackSize=8MB

  • stack=0xff195000-0xff197000

        The thread stack address range, printed from tlsPtr_.stack_begin and tlsPtr_.stack_end.

    os << "  | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
        << reinterpret_cast<void*>(thread->tlsPtr_.stack_end);
  • stackSize=8MB

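       The size of the thread stack, printed from tlsPtr_.stack_size. Note that it is not the width of the stack= range printed before it: 0xff197000 - 0xff195000 = 0x2000 = 8192 bytes (8 KB), whereas stackSize is 8MB. On Android the main thread's stack defaults to 8 MB, and the stacks of other threads are slightly under 1 MB (1010KB for Signal Catcher in the sample above).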

    os << " stackSize=" << PrettySize(thread->tlsPtr_.stack_size) << "\n";
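
As a quick arithmetic check on the sample values, the sketch below computes the width of a printed stack= range in bytes. StackRangeBytes is a hypothetical helper written for this article, and it assumes the value has the form begin-end with both addresses in hexadecimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// Parses a value such as "0xff195000-0xff197000" and returns end - begin in bytes.
// Returns 0 for malformed input or when end is below begin.
std::uint64_t StackRangeBytes(const std::string& range) {
  std::size_t dash = range.find('-');
  if (dash == std::string::npos) return 0;
  try {
    // Base 16 accepts the optional "0x" prefix.
    std::uint64_t begin = std::stoull(range.substr(0, dash), nullptr, 16);
    std::uint64_t end = std::stoull(range.substr(dash + 1), nullptr, 16);
    return end >= begin ? end - begin : 0;
  } catch (const std::exception&) {
    return 0;  // not a pair of hexadecimal numbers
  }
}
```

Both ranges in the sample log, 0xff195000-0xff197000 for main and 0xe5cd0000-0xe5cd2000 for Signal Catcher, come out at 8192 bytes, even though the two threads report stackSize values of 8MB and 1010KB.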

(6)held mutexes=

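 The mutexes held by the thread, listed by name (the monitor lock level is skipped). For a reader-writer mutex, the output also says whether it is held exclusively (exclusive held) or shared (shared held).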

  os << "  | held mutexes=";
    for (size_t i = 0; i < kLockLevelCount; ++i) {
      if (i != kMonitorLock) {
        BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
        if (mutex != nullptr) {
          os << " \"" << mutex->GetName() << "\"";
          if (mutex->IsReaderWriterMutex()) {
            ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
            if (rw_mutex->GetExclusiveOwnerTid() == tid) {
              os << "(exclusive held)";
            } else {
              os << "(shared held)";
            }
          }
        }
      }
    }
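
Going the other way, the output of this loop can be read back into a list of names and hold modes. The following is a sketch only: HeldMutex and ParseHeldMutexes are hypothetical names, and the parsing relies on the exact strings emitted by the loop above, namely a quoted name optionally followed by (exclusive held) or (shared held).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry of the "held mutexes=" line (hypothetical helper type).
struct HeldMutex {
  std::string name;
  std::string mode;  // "exclusive", "shared", or "" when no mode is printed
};

// Extracts every quoted mutex name, plus its hold mode when one follows the name.
std::vector<HeldMutex> ParseHeldMutexes(const std::string& line) {
  static const std::string kMarker = "held mutexes=";
  static const std::string kExclusive = "(exclusive held)";
  static const std::string kShared = "(shared held)";
  std::vector<HeldMutex> result;
  std::size_t pos = line.find(kMarker);
  if (pos == std::string::npos) return result;
  pos += kMarker.size();
  while (true) {
    std::size_t open = line.find('"', pos);
    if (open == std::string::npos) break;
    std::size_t close = line.find('"', open + 1);
    if (close == std::string::npos) break;
    HeldMutex m;
    m.name = line.substr(open + 1, close - open - 1);
    pos = close + 1;
    // The mode, if present, follows the closing quote with no space in between.
    if (line.compare(pos, kExclusive.size(), kExclusive) == 0) {
      m.mode = "exclusive";
      pos += kExclusive.size();
    } else if (line.compare(pos, kShared.size(), kShared) == 0) {
      m.mode = "shared";
      pos += kShared.size();
    }
    result.push_back(m);
  }
  return result;
}
```

For the Signal Catcher line in the sample this returns one entry, mutator lock held in shared mode; for the main thread, whose line ends right after the equals sign, it returns an empty list.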

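The following is the main code of the Thread::DumpState function in /art/runtime/thread.cc. It is abridged: for example, the code that fills in group_name, priority, is_daemon, and scheduler_group_name is not shown. For the complete source, see http://androidxref.com/9.0.0_r3/xref/art/runtime/thread.cc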

void Thread::DumpState(std::ostream& os, const Thread* thread, pid_t tid) {
  std::string group_name;
  int priority;
  bool is_daemon = false;
  Thread* self = Thread::Current();
  if (thread != nullptr) {
    os << '"' << *thread->tlsPtr_.name << '"';
    if (is_daemon) {
      os << " daemon";
    }
    os << " prio=" << priority
       << " tid=" << thread->GetThreadId()
       << " " << thread->GetState();
    if (thread->IsStillStarting()) {
      os << " (still starting up)";
    }
    os << "\n";
  } else {
    os << '"' << ::art::GetThreadName(tid) << '"'
       << " prio=" << priority
       << " (not attached)\n";
  }

  if (thread != nullptr) {
    MutexLock mu(self, *Locks::thread_suspend_count_lock_);
    os << "  | group=\"" << group_name << "\""
       << " sCount=" << thread->tls32_.suspend_count
       << " dsCount=" << thread->tls32_.debug_suspend_count
       << " flags=" << thread->tls32_.state_and_flags.as_struct.flags
       << " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer)
       << " self=" << reinterpret_cast<const void*>(thread) << "\n";
  }

  os << "  | sysTid=" << tid
     << " nice=" << getpriority(PRIO_PROCESS, tid)
     << " cgrp=" << scheduler_group_name;
  if (thread != nullptr) {
    int policy;
    sched_param sp;
#if !defined(__APPLE__)
    policy = sched_getscheduler(tid);
#else
    CHECK_PTHREAD_CALL(pthread_getschedparam, (thread->tlsPtr_.pthread_self, &policy, &sp), __FUNCTION__);
#endif
    os << " sched=" << policy << "/" << sp.sched_priority
       << " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);
  }
  os << "\n";
  // Grab the scheduler stats for this thread.
  std::string scheduler_stats;
  if (ReadFileToString(StringPrintf("/proc/self/task/%d/schedstat", tid), &scheduler_stats)) {
    scheduler_stats.resize(scheduler_stats.size() - 1);  // Lose the trailing '\n'.
  } else {
    scheduler_stats = "0 0 0";
  }

  char native_thread_state = '?';
  int utime = 0;
  int stime = 0;
  int task_cpu = 0;
  GetTaskStats(tid, &native_thread_state, &utime, &stime, &task_cpu);

  os << "  | state=" << native_thread_state
     << " schedstat=( " << scheduler_stats << " )"
     << " utm=" << utime
     << " stm=" << stime
     << " core=" << task_cpu
     << " HZ=" << sysconf(_SC_CLK_TCK) << "\n";
  if (thread != nullptr) {
    os << "  | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
        << reinterpret_cast<void*>(thread->tlsPtr_.stack_end) << " stackSize="
        << PrettySize(thread->tlsPtr_.stack_size) << "\n";
    // Dump the held mutexes.
    os << "  | held mutexes=";
    for (size_t i = 0; i < kLockLevelCount; ++i) {
      if (i != kMonitorLock) {
        BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
        if (mutex != nullptr) {
          os << " \"" << mutex->GetName() << "\"";
          if (mutex->IsReaderWriterMutex()) {
            ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
            if (rw_mutex->GetExclusiveOwnerTid() == tid) {
              os << "(exclusive held)";
            } else {
              os << "(shared held)";
            }
          }
        }
      }
    }
    os << "\n";
  }
}

I hope this is helpful.

If you find any mistakes, corrections are welcome.


References:

http://gityuan.com/2016/11/26/art-trace/

http://androidxref.com/9.0.0_r3/xref/art/runtime/thread.cc

https://www.jianshu.com/p/cf504cf98750

https://blog.csdn.net/a740169405/article/details/98178287

https://www.cnblogs.com/hebao0514/p/4510498.html

https://blog.csdn.net/l460133921/article/details/51134213

https://blog.csdn.net/u010154760/article/details/45312471

 
