A Case Study of Debugging a Stuck Impala Query


I recently ran into a stuck query while developing. The case feels fairly representative, so I am recording the troubleshooting process here. The same approach can be used to track down the source of a stuck query in a production environment.

Note: the Impala version used in this post is 3.4.0-SNAPSHOT. Impala is written in C++ and Java, corresponding to the BE (Backend) and FE (Frontend) parts respectively.

Case Description

When adding a new partition with an ALTER TABLE statement, the query never returned. On the Coordinator web UI the query stayed in the CREATED state:
Query Stuck Screenshot
This query was run with sync_ddl=true. On a busy cluster such a query could legitimately take more than ten minutes to finish, but that should not happen in my test environment.
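For context, sync_ddl is an ordinary query option. A minimal way to issue the same kind of request from impala-shell might look like the sketch below (the endpoint, table and partition spec are just the ones from this example):

# Run the DDL with sync_ddl enabled (endpoint/table taken from this case)
impala-shell -i localhost:21000 -q "
  set sync_ddl=true;
  alter table insert_inherit_permission add partition (p1=1, p2=1, p3=1);
"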

ALTER TABLE is a DDL statement. With sync_ddl=true, its execution flow is as follows:
AlterTable with Sync_DDL
The steps in the figure are:
(1) The client sends the query request
(2) The Coordinator sends an ExecDdl request to Catalogd
(3) Catalogd sends an AddPartition request to the HMS
(4) The HMS creates the partition and creates the partition directory in HDFS
(5) Catalogd loads the file metadata of the new partition from the NameNode
(6) After executing the DDL, Catalogd sends the metadata update to the Statestore, which broadcasts it to all Coordinators
(7) Catalogd returns the topic version containing the update to the Coordinator
(8) The Coordinator waits until all Coordinators have received the update for that version, then returns to the client

The hang could occur at any of these steps, so we have to investigate from the beginning.

Finding the Source of the Hang

I. Find the stuck thread in the Coordinator

The query was submitted to the Coordinator, so we start there. First check the logs for lines related to this query. Every query logs an "Analyzing ..." line when it is parsed, which we can use to locate it:

$ grep -i analyzing /tmp/impalad_node2.INFO | grep 'ALTER TABLE insert_inherit_permission'
I1101 09:52:17.002415 20747 Frontend.java:1256] a14a8bbbaf75bf19:63c6f76f00000000] Analyzing query: ALTER TABLE insert_inherit_permission ADD PARTITION(p1=1, p2=1, p3=1) db: default

My query is "ALTER TABLE insert_inherit_permission ADD PARTITION(p1=1, p2=1, p3=1)", so adding a few keywords to the grep narrows it down quickly. In my test environment the log is at /tmp/impalad_node2.INFO; on a CDH cluster it is usually at /var/log/impalad/impalad.INFO.

The third column of the log line is the thread id, so the FE thread handling this query is 20747. Looking at all log lines from this thread, the last one is "Analysis and authorization finished":
Coordinator Logs
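For reference, since the thread id is the third column of every glog line, a grep like the following (path and thread id from this example) pulls out all log lines written by that thread:

# All log lines from FE thread 20747 (thread id is the 3rd column of each line)
grep ' 20747 ' /tmp/impalad_node2.INFO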
Since analysis and authorization are already done, where is the query stuck?

First check jstack for relevant information; the nid field in jstack output is the thread id in hexadecimal. In my case there was no stack for thread 20747 in the jstack output, so we have to look at the BE threads. The BE is written in C++, so in theory we could attach gdb to impalad and print the thread stacks, but that would pause the whole impalad process, which may cause heartbeat timeouts and get queries cancelled. To avoid disturbing the live process, we can use gcore to trigger a core dump instead.
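A quick sketch of that jstack check (the pid and thread id are the ones from this example):

# Convert the decimal thread id from the log into the hex nid used by jstack
printf 'nid=0x%x\n' 20747            # -> nid=0x510b
# Look for that nid in the stacks of the impalad JVM (pid 19731 here)
jstack 19731 | grep -i -A 20 'nid=0x510b'

Since nothing matched here, the next step is the core dump: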

$ gcore 
usage:  gcore [-o filename] pid
$ sudo gcore -o impalad_core 19731
......
[New LWP 20697]
[New LWP 20747]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Installing openjdk unwinder
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 52, in <module>
    class Types(object):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 66, in Types
    nmethodp_t = gdb.lookup_type('nmethod').pointer()
gdb.error: No type named nmethod.
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185    ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
warning: target file /proc/19731/cmdline contained unexpected null characters
Saved corefile impalad_core.19731

While dumping, the screen is flooded with lines like "[New LWP xxxxx]" and there may be warnings, but they can be ignored as long as the dump succeeds in the end. The resulting core file is usually large because it contains the process memory as well. In my development environment, for example, the core file is 12GB:

$ ll -h impalad_core.19731
-rw-r--r-- 1 root root 12G 11月  1 10:08 impalad_core.19731

In production the process memory can be even larger, so a core dump is not a good fit there; we will cover an alternative later. With the core file in hand, we can analyze it with gdb. This can consume a lot of memory, so in production it is best to do the analysis on a separate machine.

$ gdb -c impalad_core.19731 /home/quanlong/workspace/Impala/be/build/latest/service/impalad
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/quanlong/workspace/Impala/be/build/latest/service/impalad...done.
[New LWP 19731]
[New LWP 19762]
...
[New LWP 20697]
[New LWP 20747]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/quanlong/workspace/Impala/be/build/latest/service/impalad'.
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185    ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
[Current thread is 1 (Thread 0x7f2f3fde48c0 (LWP 19731))]
Installing openjdk unwinder
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 52, in <module>
    class Types(object):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 66, in Types
    nmethodp_t = gdb.lookup_type('nmethod').pointer()
gdb.error: No type named nmethod.
(gdb) 

Now find thread 20747 and look at its stack. gdb's thread numbers are not thread ids, and the thread command only switches to a thread by its gdb number, so we first use "thread find 20747" to get the thread number, then switch to that thread and print its stack with bt:

(gdb) thread find 20747
Thread 285 has target id 'Thread 0x7f2e8f3f2700 (LWP 20747)'
(gdb) thread 285
[Switching to thread 285 (Thread 0x7f2e8f3f2700 (LWP 20747))]
#0  0x00007f2f3f7da87f in __libc_recv (fd=403, buf=0x10e9e000, n=131072, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
28    ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
(gdb) bt
#0  0x00007f2f3f7da87f in __libc_recv (fd=403, buf=0x10e9e000, n=131072, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
#1  0x0000000004ccd3af in apache::thrift::transport::TSocket::read(unsigned char*, unsigned int) ()
......

The output is fairly long, as shown below:
GDB Thread Backtrace
The most important part is the section underlined in red, namely this block:

#15 0x0000000002513982 in impala::ClientConnection<impala::CatalogServiceClientWrapper>::DoRpc
#16 0x000000000250fb8c in impala::ClientConnection<impala::CatalogServiceClientWrapper>::DoRpcWithRetry
#17 0x000000000250cced in impala::CatalogOpExecutor::Exec
#18 0x00000000022b9012 in impala::ClientRequestState::ExecDdlRequest
#19 0x00000000022b4253 in impala::ClientRequestState::Exec
#20 0x0000000002241cea in impala::ImpalaServer::ExecuteInternal
#21 0x000000000224131d in impala::ImpalaServer::Execute

So the Coordinator has sent an RPC to Catalogd and is waiting for the response. The stack also contains source line numbers, so we can look at the relevant code.
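As an aside, instead of switching threads interactively, gdb can also dump every thread's backtrace in one batch run; a sketch using the file names from this example:

# Dump the backtraces of all threads from the core file non-interactively,
# then search for the thread id (LWP) we care about
gdb -c impalad_core.19731 \
    /home/quanlong/workspace/Impala/be/build/latest/service/impalad \
    -batch -ex 'thread apply all bt' > all_backtraces.txt
grep -n 'LWP 20747' all_backtraces.txt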
Now we have confirmed the Coordinator is stuck on the RPC; next we need to find out what Catalogd is doing.

II. Find the stuck thread in Catalogd

Nothing obviously wrong shows up in Catalogd's logs, so again we have to look at the thread stacks. Here my catalogd is fairly idle, and only one thread in the jstack output is related to ExecDdl:

"Thread-10" #43 prio=5 os_prio=0 tid=0x0000000012324800 nid=0x510c in Object.wait() [0x00007f6808952000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2635)
        - locked <0x0000000722aff010> (a org.apache.impala.catalog.TopicUpdateLog)
        at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:414)
        at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:167)

So the source of the hang is the FE, blocked in CatalogServiceCatalog.waitForSyncDdlVersion(); the thread id is 0x510c, i.e. 20748. Looking back at the log, the last line from this thread is "Loaded file and block metadata for ...":
Catalog Logs
So loading the table's metadata has already finished. The thread enters waitForSyncDdlVersion() because the query was run with sync_ddl=true: catalogd has to wait until the update has been sent out before it knows the corresponding statestore topic version, and only then can it return to the coordinator. How metadata broadcasting and sync_ddl are implemented is a longer topic that I will cover in a later post.

This is where my diagnosis ends; from here one can debug against the code. In production, however, catalogd is usually busy, and the relevant thread cannot always be pinned down directly in the jstack output. One idea is to determine the port the coordinator thread uses for its connection to catalogd and find the corresponding catalogd thread from that, but file descriptors are shared by all threads, so a thread cannot be identified by port. There is no particularly good solution for this at the moment. Fortunately most of catalogd's logic is implemented in Java, so we can take jstack several times and compare which threads persist, as in the sketch below.
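A rough sketch of that approach (the pid lookup, snapshot count and interval are arbitrary):

# Take several jstack snapshots of catalogd and compare them; threads blocked
# at the same place in every snapshot are the suspects
CATALOGD_PID=$(pgrep -f catalogd | head -1)
for i in 1 2 3; do
  jstack "$CATALOGD_PID" > /tmp/catalogd_jstack.$i
  sleep 10
done
diff /tmp/catalogd_jstack.1 /tmp/catalogd_jstack.3 | less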

Using a Minidump Instead of a Core Dump

In production, impalad typically uses a lot of memory, so a core dump would produce a huge core file that is hard to work with. What we actually care about is the stack of each thread; we do not need all of the memory data. Minidumps, provided by Google Breakpad, offer an alternative: they only export the crash reason, up to 8KB of stack memory per thread, the runtime environment (OS, CPU, etc.) and process information (loaded modules, etc.). The resulting minidump file is tiny: in this case the core dump is 12GB, while the minidump is only 3MB!

Requirements

  • Impala 2.7 or later: starting with Impala 2.7 (shipped in cdh5.9), a minidump can be triggered externally by simply sending the process a SIGUSR1 signal.
  • A build of the matching Impala source: it must be compiled in Debug mode so that all symbol information is kept, and the breakpad tools from the toolchain are needed to process the minidump (see the sanity check below). The build environment also has to match the production environment; for example, if production runs ubuntu16.04, the build should be done on an ubuntu16.04 machine as well.
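A quick sanity check of the build environment, assuming (as the minidump_stackwalk command later in this post implies) that bin/impala-config.sh exports the toolchain variables:

# Verify the toolchain breakpad tools are available before shipping minidumps over
cd "$IMPALA_HOME"
source bin/impala-config.sh
echo "$IMPALA_BREAKPAD_VERSION"
ls "$IMPALA_TOOLCHAIN/breakpad-$IMPALA_BREAKPAD_VERSION/bin/minidump_stackwalk"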

Usage

1. Generate the Minidump

Find the PID of the impalad process (19731 in my case), then trigger the minidump with "kill -s SIGUSR1 19731". On success, the path of the generated file can be found in impalad's log:

Caught signal: SIGUSR1
Wrote minidump to /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp

Transfer this minidump file to the machine where Impala was compiled.
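Putting this step together as a sketch (the pid and output path are from this example; "build-host" is a placeholder for the machine where Impala was compiled):

# Trigger the minidump, locate the newest .dmp file and copy it to the build machine
kill -s SIGUSR1 19731
ls -lt /tmp/minidumps/impalad/ | head    # output directory as configured for this impalad
scp /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp build-host:/tmp/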

2. Generate the symbol files

The minidump file does not contain any symbol information, so we have to generate Breakpad-format symbols ourselves to match it; only then can we see function names (rather than just addresses) and so on.

In the directory where Impala was built (referred to as IMPALA_HOME below) there is bin/dump_breakpad_symbols.py, which generates the Breakpad-format symbols. Assuming we want to store the symbols under /tmp/syms, the commands are:

$ cd $IMPALA_HOME
$ source bin/impala-config.sh
$ bin/dump_breakpad_symbols.py -f be/build/latest/service/libfesupport.so be/build/latest/service/impalad be/build/latest/util/libloggingsupport.so -d /tmp/syms
INFO:root:Processing binary file: be/build/latest/service/libfesupport.so
INFO:root:Processing binary file: be/build/latest/service/impalad
INFO:root:Processing binary file: be/build/latest/util/libloggingsupport.so

The generated files total 421MB, somewhat larger than the minidump itself, because they contain all the file names, function names, etc.:

$ du -sh /tmp/syms
421M    /tmp/syms
$ head /tmp/syms/impalad/0894679AAB4BBE9079BA0C593F2114890/impalad.sym
MODULE Linux x86_64 0894679AAB4BBE9079BA0C593F2114890 impalad
INFO CODE_ID 9A6794084BAB90BE79BA0C593F211489
FILE 0 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c
FILE 1 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c
FILE 2 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c
FILE 3 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c
FILE 4 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/os/posix/mutexes.c
FILE 5 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/os/posix/thread_local_storage.c
FILE 6 /home/quanlong/workspace/Impala/be/generated-sources/gen-cpp/BackendGflags_types.cpp
FILE 7 /home/quanlong/workspace/Impala/be/generated-sources/gen-cpp/BackendGflags_types.h

3. Resolve the Minidump

Use the breakpad tools in the Impala toolchain. Assuming the corresponding version is IMPALA_BREAKPAD_VERSION, the command is:

$IMPALA_TOOLCHAIN/breakpad-$IMPALA_BREAKPAD_VERSION/bin/minidump_stackwalk $MINIDUMP_FILE $BREAKPAD_SYMBOLS > $RESULT_FILE

In this example that is:

$ $IMPALA_HOME/toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2/bin/minidump_stackwalk /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp /tmp/syms > /tmp/resolved.txt
......
2019-11-03 08:33:22: basic_code_modules.cc:110: INFO: No module at 0x0
2019-11-03 08:33:22: basic_code_modules.cc:110: INFO: No module at 0x0
2019-11-03 08:33:22: minidump_processor.cc:326: INFO: Processed /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp
2019-11-03 08:33:22: minidump.cc:4980: INFO: Minidump closing minidump

There is a lot of scrolling output along the way; as long as the resolution succeeds in the end, it is fine. The resulting file looks like this:

Operating system: Linux
                  0.0.0 Linux 4.15.0-66-generic #75~16.04.1-Ubuntu SMP Tue Oct 1 14:01:08 UTC 2019 x86_64
CPU: amd64
     family 6 model 158 stepping 10
     1 CPU

GPU: UNKNOWN

Crash reason:  DUMP_REQUESTED
Crash address: 0x4d78597
Process uptime: not available

Thread 0 (crashed)
 0  impalad!google_breakpad::ExceptionHandler::WriteMinidump() + 0x57
    rax = 0x00007ffeed518510   rdx = 0x0000000000000000
    rcx = 0x0000000004d7857f   rbx = 0x0000000000000000
    rsi = 0x0000000000000001   rdi = 0x00007ffeed5183a8
    rbp = 0x00007ffeed518ab0   rsp = 0x00007ffeed518310
     r8 = 0x0000000000000000    r9 = 0x0000000000000024
    r10 = 0x00007f2f3f7db340   r11 = 0x00007ffeed518980
    r12 = 0x00007ffeed518970   r13 = 0x0000000002428f99
    r14 = 0x0000000000000000   r15 = 0x00007ffeed5189d0
    rip = 0x0000000004d78597
    Found by: given as instruction pointer in context
 1  impalad!google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, bool (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*) + 0xf0
    rbx = 0x00007f2f3caf0140   rbp = 0x00007ffeed518ab0
    rsp = 0x00007ffeed518960   r12 = 0x00007ffeed518970
    r13 = 0x0000000002428f99   r14 = 0x0000000000000000
    r15 = 0x00007ffeed5189d0   rip = 0x0000000004d78e60
    Found by: call frame info
 2  impalad!impala::HandleSignal [minidump.cc : 103 + 0x1e]
    rbx = 0x0000000000000000   rbp = 0x00007ffeed518af0
    rsp = 0x00007ffeed518ac0   r12 = 0x00007ffeed519250
    r13 = 0x00000000111263d0   r14 = 0x0000000000006d62
    r15 = 0x0000000000000000   rip = 0x00000000024290ec
    Found by: call frame info
 3  libpthread-2.23.so + 0x11390
    rbx = 0x0000000000000000   rbp = 0x00007ffeed519240
    rsp = 0x00007ffeed518b00   r12 = 0x00007ffeed519250
    r13 = 0x00000000111263d0   r14 = 0x0000000000006d62
    r15 = 0x0000000000000000   rip = 0x00007f2f3f7db390
    Found by: call frame info
 4  impalad!boost::thread::join_noexcept() + 0x5c
    rbp = 0x00007ffeed519260   rsp = 0x00007ffeed519250
    rip = 0x0000000003c6a13c
    Found by: previous frame's frame pointer
...

First comes environment information such as OS and CPU, followed by the stack of every thread, with the register values of each frame shown as well. The first thread is the one in which breakpad wrote the minidump, since the crash reason is DUMP_REQUESTED (the process is actually still alive; it did not crash). If a process had crashed in production, the first thread would be the one that caused the crash.

Back to the analysis of the stuck Coordinator: we first found thread id 20747 from the log:

I1101 09:52:17.002415 20747 Frontend.java:1256] a14a8bbbaf75bf19:63c6f76f00000000] Analyzing query: ALTER TABLE insert_inherit_permission ADD PARTITION(p1=1, p2=1, p3=1) db: default

Unfortunately breakpad does not print LWP information the way gdb does, and its thread numbering also differs from gdb's. From observation, though, the thread id (i.e. the LWP id) usually shows up in the registers of the frames near the bottom of the stack (the thread's first few calls), which often lets us find the thread. For example, the thread id we care about is 20747, which is 510b in hex. Searching for 510b in the output file /tmp/resolved.txt, there is exactly one thread whose lowest frames have register rbx equal to 0x000000000000510b, and that thread's stack is indeed the ExecDdl stack:
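A minimal version of that search (file path from this example):

# Convert the thread id from the coordinator log to hex and look for it among
# the register values in the resolved stacks
printf '%x\n' 20747        # -> 510b
grep -n -i '510b' /tmp/resolved.txt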

Thread 283
...
13  impalad!impala::CatalogServiceClient::recv_ExecDdl(impala::TDdlExecResponse&) [CatalogService.cpp : 1908 + 0x1f]
    rbx = 0x0000000000000000   rbp = 0x00007f2e8f3ecf20
    rsp = 0x00007f2e8f3ece60   r12 = 0x0000000000000000
    r13 = 0x00000000110d1ba8   r14 = 0x0000000002512a7c
    r15 = 0x0000000000000000   rip = 0x0000000001c319a4
    Found by: call frame info
14  impalad!impala::CatalogServiceClientWrapper::ExecDdl(impala::TDdlExecResponse&, impala::TDdlExecRequest const&, bool*) [catalog-service-client-wrapper.h : 47 + 0x13]
    rbx = 0x0000000000000000   rbp = 0x00007f2e8f3ecf90
    rsp = 0x00007f2e8f3ecf30   r12 = 0x0000000000000000
    r13 = 0x00000000110d1ba8   r14 = 0x0000000002512a7c
    r15 = 0x0000000000000000   rip = 0x0000000002512b51
    Found by: call frame info
15  impalad!impala::Status impala::ClientConnection<impala::CatalogServiceClientWrapper>::DoRpc<void (impala::CatalogServiceClientWrapper::*)(impala::TDdlExecResponse&, impala::TDdlExecRequest const&, bool*), impala::TDdlExecRequest, impala::TDdlExecResponse>(void (impala::CatalogServiceClientWrapper::* const&)(impala::TDdlExecResponse&, impala::TDdlExecRequest const&, bool*), impala::TDdlExecRequest const&, impala::TDdlExecResponse*) [client-cache.h : 243 + 0x31]
    rbx = 0x0000000000000000   rbp = 0x00007f2e8f3ed070
    rsp = 0x00007f2e8f3ecfa0   r12 = 0x0000000000000000
    r13 = 0x00000000110d1ba8   r14 = 0x0000000002512a7c
    r15 = 0x0000000000000000   rip = 0x0000000002513982
    Found by: call frame info
...
31  impalad!boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long, (impala::PromiseMode)0>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long, (impala::PromiseMode)0>*> > >::operator()() [bind_template.hpp : 20 + 0x22]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f18a0
    rsp = 0x00007f2e8f3f1860   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x0000000001fe8aa9
    Found by: call frame info
32  impalad!boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long, (impala::PromiseMode)0>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long, (impala::PromiseMode)0>*> > >, void>::invoke(boost::detail::function::function_buffer&) [function_template.hpp : 153 + 0xc]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f18d0
    rsp = 0x00007f2e8f3f18b0   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x0000000001fe89bc
    Found by: call frame info
33  impalad!boost::function0<void>::operator()() const [function_template.hpp : 767 + 0x11]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f1910
    rsp = 0x00007f2e8f3f18e0   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x0000000001f04e58
    Found by: call frame info
34  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0xf]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f1c70
    rsp = 0x00007f2e8f3f1920   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x000000000247bf47
    Found by: call frame info
...

The stacks also contain source line numbers. If you are interested, you can download the whole resolved.txt file (73KB compressed) to study: https://github.com/stiga-huang/BlogFiles/blob/master/Debugging%20on%20Stuck%20Query/resolved.txt.gz

From this we know the Coordinator is stuck waiting for the ExecDdl RPC to return, and catalogd can then be analyzed in the same way.

Summary

When an Impala query is stuck and nothing unusual shows up in the logs, thread stack analysis can be done with a core dump or minidump for the BE and with jstack for the FE. The thread id in the logs corresponds to the LWP id in gdb and the nid (in hex) in jstack; there is no obvious counterpart in a minidump, but you can try matching the thread via register values.

(Note: BE refers to the Backend, mainly the C++ part of Impala; FE refers to the Frontend, mainly the Java part.)

Command Recap

  • gcore [-o filename] pid
  • gdb -c core_file program_binary
  • gdb commands
    • thread find thread_id (find the gdb thread number for a given thread id)
    • thread id (switch to the thread with that gdb number)
    • bt (print the backtrace)
  • jstack pid
  • kill -s SIGUSR1 pid (trigger an Impala process to write a minidump)
  • $IMPALA_HOME/bin/dump_breakpad_symbols.py -f binaries -d output_symbol_dir (generate Breakpad-format symbol files)
  • $IMPALA_TOOLCHAIN/breakpad-$IMPALA_BREAKPAD_VERSION/bin/minidump_stackwalk $MINIDUMP_FILE $BREAKPAD_SYMBOLS > $RESULT_FILE (resolve the minidump)

References

  • https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Core+Dumps+on+Another+System
  • https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps