How to Print Impala Thread Stacks in a CDH Cluster

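The previous article, "Impala Query Hang Analysis: A Case Study", described how to print thread stacks for an Impala process. For the JVM part, jstack does the job directly, but the C++ part needs gdb or breakpad, and the source also has to be compiled, which is tedious. This article shows how to print the thread stacks of Impala processes directly in a CDH cluster, without compiling the source. Some tools still have to be downloaded the first time, so it is worth picking one fixed machine in the cluster to set up the environment; later runs are then much more convenient.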

1. Generate a Minidump File

Log in to the machine running impalad and find the impalad process ID.

$ ps aux | grep impalad
root      4374  0.0  0.0  12944   972 pts/0    S+   16:49   0:00 grep --color=auto impalad
impala   29645  1.0  3.0 2999416 231972 ?      Sl   16:17   0:20 /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/55-impala-IMPALAD/impala-conf/impalad_flags
impala   29652  0.0  0.1 197888 13556 ?        Sl   16:17   0:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/impala/impala.sh impalad impalad_flags false

The process with PID 29645 above is impalad. Send it the SIGUSR1 signal to trigger a minidump:

$ kill -s SIGUSR1 29645
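If you want to script these two steps, the following Python sketch picks the impalad PID out of ps aux output and sends it SIGUSR1. It assumes the standard 11-column ps aux layout and treats a process as impalad only when the basename of its first command word is impalad; the function names are hypothetical, not part of Impala or CDH.

```python
import os
import signal


def find_impalad_pids(ps_output):
    """Return the PIDs of impalad processes found in `ps aux` output.

    Assumes the standard layout: PID is the 2nd column and the command
    line starts at the 11th. A line counts as impalad only when the
    basename of its first command word is "impalad", so the grep line
    and the cmf-redactor wrapper (whose arguments mention impalad) are
    skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or truncated line
        argv0 = fields[10].split()[0]
        if os.path.basename(argv0) == "impalad":
            pids.append(int(fields[1]))
    return pids


def trigger_minidump(pid):
    """Send SIGUSR1; impalad responds by writing a minidump.

    Caution: SIGUSR1 terminates a process that does not handle it.
    """
    os.kill(pid, signal.SIGUSR1)
```

For example, pass it subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout. Since SIGUSR1 kills processes that do not handle it, only send it to a PID you have confirmed is impalad.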

The following line can then be found in /var/log/impalad/impalad.INFO:

Wrote minidump to /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp
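To pick up the dump path without reading the log by eye, a small sketch like the one below can scan the INFO log for the last "Wrote minidump to" message. It assumes only that the message text matches the line shown above; whatever prefix the logger puts in front of it is ignored, because the regular expression searches anywhere in the text.

```python
import re

# Matches the message shown above, e.g.
# "Wrote minidump to /var/log/impala-minidumps/impalad/<id>.dmp".
# re.findall searches anywhere in the text, so a log prefix does not matter.
_MINIDUMP_RE = re.compile(r"Wrote minidump to (\S+\.dmp)")


def latest_minidump(log_text):
    """Return the path from the last "Wrote minidump to" message, or None."""
    paths = _MINIDUMP_RE.findall(log_text)
    return paths[-1] if paths else None
```

For example: with open("/var/log/impalad/impalad.INFO") as f: print(latest_minidump(f.read())). For a very large log, reading only its tail would be cheaper.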

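2. Generate Breakpad Symbol Files

2.1 Set Up the Breakpad Tools

Impala's source tree includes a script (bin/dump_breakpad_symbols.py) that generates symbol files in breakpad format. Download the Impala source matching your version; releases are listed on Cloudera's GitHub release page: https://github.com/cloudera/Impala/releases

In this example the CDH version is 5.16.2, so download and extract https://github.com/cloudera/Impala/archive/cdh5.16.2-release.tar.gz (692 MB).

Note: the cloudera Impala repo is very large (15 GB), so if you only need the code for one version, there is no need to git clone it.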

wget https://github.com/cloudera/Impala/archive/cdh5.16.2-release.tar.gz
tar zxf cdh5.16.2-release.tar.gz
cd Impala-cdh5.16.2-release
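The tarball URL and the extracted directory name follow directly from the CDH version, which can in turn be read from the parcel path in the ps output. The sketch below assumes other releases use the same cdhX.Y.Z-release tag naming as cdh5.16.2-release; check the release page if a download fails. Both function names are hypothetical.

```python
import re


def cdh_version_from_parcel(path):
    """Extract "5.16.2" from a path containing .../parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/..."""
    match = re.search(r"/CDH-(\d+\.\d+\.\d+)-", path)
    if match is None:
        raise ValueError("no CDH parcel version in %r" % path)
    return match.group(1)


def impala_source_tarball(cdh_version):
    """Return (tarball URL, directory created by tar zxf) for a CDH version.

    Assumes every release uses the cdhX.Y.Z-release tag naming seen for 5.16.2.
    """
    tag = "cdh%s-release" % cdh_version
    url = "https://github.com/cloudera/Impala/archive/%s.tar.gz" % tag
    return url, "Impala-%s" % tag
```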

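To make bin/dump_breakpad_symbols.py runnable, the environment also needs some setup. Make sure the JAVA_HOME variable points to the correct directory, then run:

# Make sure JAVA_HOME is set and points to the correct directory
$ export JAVA_HOME=/usr/java/jdk1.8.0_162-cloudera
$ source bin/impala-config.sh

# Users in mainland China can use Alibaba Cloud's PyPI mirror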
$ export PYPI_MIRROR="http://mirrors.aliyun.com/pypi"
$ $IMPALA_HOME/infra/python/deps/download_requirements
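Because source only changes the current shell, the variables exported by bin/impala-config.sh are lost if each step runs in its own shell, for example from a script that calls subprocess once per command. A minimal Python sketch, assuming bash is available and that impala-config.sh sets IMPALA_HOME (as the $IMPALA_HOME usage above implies), keeps all the steps in one bash -c invocation; setup_command is a hypothetical helper name.

```python
import shlex


def setup_command(impala_home, java_home, pypi_mirror=None):
    """Build one bash invocation for the environment setup steps.

    `source` only affects the shell it runs in, so the steps are chained
    inside a single `bash -c` script; bin/impala-config.sh then sets
    IMPALA_HOME for the download_requirements call that follows.
    """
    steps = ["cd %s" % shlex.quote(impala_home),
             "export JAVA_HOME=%s" % shlex.quote(java_home)]
    if pypi_mirror:
        steps.append("export PYPI_MIRROR=%s" % shlex.quote(pypi_mirror))
    steps += ["source bin/impala-config.sh",
              '"$IMPALA_HOME"/infra/python/deps/download_requirements']
    # && stops the chain at the first failing step.
    return ["bash", "-c", " && ".join(steps)]
```

Run it with subprocess.run(setup_command("/root/Impala-cdh5.16.2-release", "/usr/java/jdk1.8.0_162-cloudera", "http://mirrors.aliyun.com/pypi"), check=True).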

Next, initialize breakpad in the toolchain with bin/bootstrap_toolchain.py. By default this script downloads the entire toolchain, which takes a long time. Since only breakpad is needed, modify bin/bootstrap_toolchain.py as follows:

   # LLVM and Kudu are the largest packages. Sort them first so that
   # their download starts as soon as possible.
-  packages = map(Package, ["llvm", "kudu",
-      "avro", "binutils", "boost", "breakpad", "bzip2", "cmake", "crcutil",
-      "flatbuffers", "gcc", "gflags", "glog", "gperftools", "gtest", "libev",
-      "lz4", "openldap", "openssl", "protobuf",
-      "rapidjson", "re2", "snappy", "thrift", "tpc-h", "tpc-ds", "zlib"])
-  packages.insert(0, Package("llvm", "3.9.1-asserts"))
+  packages = map(Package, ["breakpad"])
   bootstrap(toolchain_root, packages)

That is, in the final part of bootstrap_toolchain.py, drop all the other packages and keep only breakpad. Then run the script:

$ bin/bootstrap_toolchain.py
INFO:bootstrap_virtualenv:Creating python virtualenv
INFO:bootstrap_virtualenv:Installing packages into the virtualenv
INFO:bootstrap_virtualenv:Installing stage 2 packages into the virtualenv
2019-11-10 01:31:23,683 Thread-3 INFO: Downloading https://native-toolchain.s3.amazonaws.com/build/257-0847514126/breakpad/97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz to /root/Impala-cdh5.16.2-release/toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz (attempt 1)
2019-11-10 01:31:24,452 Thread-3 INFO: Extracting breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz
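The breakpad directory name under toolchain/ embeds a version hash, and the same directory also holds the downloaded tarball. A sketch that locates minidump_stackwalk by globbing instead of hard-coding the hash could look like this; it assumes the extracted layout toolchain/breakpad-<version>/bin/minidump_stackwalk used later in this article, and the function name is hypothetical.

```python
import glob
import os


def find_minidump_stackwalk(impala_home):
    """Locate toolchain/breakpad-*/bin/minidump_stackwalk under impala_home.

    The directory name embeds the breakpad version hash, so it is globbed
    rather than hard-coded. The downloaded .tar.gz in toolchain/ does not
    match because the pattern requires bin/minidump_stackwalk beneath it.
    """
    pattern = os.path.join(impala_home, "toolchain", "breakpad-*",
                           "bin", "minidump_stackwalk")
    hits = sorted(p for p in glob.glob(pattern) if os.access(p, os.X_OK))
    if not hits:
        raise FileNotFoundError(
            "minidump_stackwalk not found; run bin/bootstrap_toolchain.py first")
    # If several breakpad versions are present, take the lexicographically last.
    return hits[-1]
```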

2.2 Generate the Symbol Files

2.2.1 Using the Executable from the Local Parcel

dump_breakpad_symbols.py can now be used. The earlier ps output showed that the impalad executable is /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad. Generate symbol files from it and write them to /tmp/syms:

$ bin/dump_breakpad_symbols.py -f /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad -d /tmp/syms
INFO:root:Processing binary file: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad
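On Linux, /proc/<pid>/exe points at the executable a process was started from, so reading it is a dependable way to choose the binary for dump_breakpad_symbols.py. The sketch below assumes Linux and enough privileges to read the link (the impala user or root); both function names are hypothetical.

```python
import os


def running_binary(pid):
    """Return the executable that process `pid` runs (Linux /proc only).

    Reading the link may require running as the same user or as root.
    """
    return os.readlink("/proc/%d/exe" % pid)


def dump_symbols_command(binary, dest="/tmp/syms",
                         script="bin/dump_breakpad_symbols.py"):
    """Build the same command as above: -f <binary> -d <dest>."""
    return [script, "-f", binary, "-d", dest]
```

For example, dump_symbols_command(running_binary(29645)) yields the command shown above. If the file was removed or replaced after impalad started, the link target ends with " (deleted)", a sign that the file on disk no longer matches the running process.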

2.2.2 Using the Executable from the deb Package

Symbol files generated as above carry no file names or line numbers. To map frames back to the source as closely as possible, download and process the rpm/deb packages for your OS instead. These packages are available at http://archive.cloudera.com; for example, the CDH5 packages for Ubuntu are under http://archive.cloudera.com/cdh5/ubuntu. This example runs on Ubuntu 16.04, and every version of the Impala CDH package can be found under http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala. Download these two files:

  • Executable deb package: http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb (345MB)
  • deb package with the debug info for the executables above: http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb (471MB)

Then run dump_breakpad_symbols.py again:

$ bin/dump_breakpad_symbols.py -r ~/Downloads/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb -s ~/Downloads/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb -d /tmp/syms
INFO:root:Extracting to /tmp/tmpBDEwFI: /home/quanlong/Downloads/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
INFO:root:Extracting to /tmp/tmpBDEwFI: /home/quanlong/Downloads/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libstdc++.so.6.0.20
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libgcc_s.so.1
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libkudu_client.so.0.1.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libstdc++.so.6
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libkudu_client.so.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libssl.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libcrypto.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libcrypto.so.1.0.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libssl.so.1.0.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-debug/libfesupport.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-debug/impalad
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-retail/libfesupport.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-retail/impalad

The symbol information in /tmp/syms now includes file names and line numbers.
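The two package URLs differ only in the package name prefix, so they can be built from the package version string. In this sketch the pool URL is the Ubuntu 16.04 (xenial) one from above and the version string in the test is copied from the example filenames; for another OS release or CDH version, look up the exact version in the pool directory rather than guessing it. The helper names are hypothetical.

```python
# Ubuntu 16.04 (xenial) pool from the text above; other OS releases differ.
POOL = ("http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/"
        "pool/contrib/i/impala")


def deb_urls(pkg_version, arch="amd64", pool=POOL):
    """Return (executable deb URL, debug-info deb URL) for one package version."""
    binary = "%s/impala_%s_%s.deb" % (pool, pkg_version, arch)
    debug = "%s/impala-dbg_%s_%s.deb" % (pool, pkg_version, arch)
    return binary, debug


def dump_from_debs_command(binary_deb, debug_deb, dest="/tmp/syms",
                           script="bin/dump_breakpad_symbols.py"):
    """Build the same command as above: -r <package> -s <debug package> -d <dest>."""
    return [script, "-r", binary_deb, "-s", debug_deb, "-d", dest]
```

The URLs are for downloading (for example with wget); dump_from_debs_command expects the paths of the downloaded local files, as in the command above.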

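3. Resolve the Minidump with the Symbol Files

The minidump_stackwalk tool in the toolchain's breakpad directory resolves a minidump against the symbol files. Assuming the resolved output goes to /tmp/resolved.txt and breakpad's own log goes to /tmp/breakpad.log, the command is: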

$ toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2/bin/minidump_stackwalk /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp /tmp/syms > /tmp/resolved.txt 2>/tmp/breakpad.log
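When this step is part of a script, the shell redirections can be reproduced with subprocess. A minimal sketch, assuming minidump_stackwalk takes the minidump path followed by the symbol directory as in the command above; the helper names are hypothetical.

```python
import subprocess


def stackwalk_command(stackwalk, minidump, sym_dir="/tmp/syms"):
    """minidump_stackwalk takes the minidump first, then the symbol directory."""
    return [stackwalk, minidump, sym_dir]


def resolve_minidump(stackwalk, minidump, sym_dir="/tmp/syms",
                     out_path="/tmp/resolved.txt",
                     log_path="/tmp/breakpad.log"):
    """Same as: stackwalk MINIDUMP SYM_DIR > out_path 2> log_path.

    check=True raises CalledProcessError if the tool exits non-zero.
    """
    with open(out_path, "w") as out, open(log_path, "w") as log:
        subprocess.run(stackwalk_command(stackwalk, minidump, sym_dir),
                       stdout=out, stderr=log, check=True)
```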

The resulting resolved.txt looks like this:

Operating system: Linux
                  0.0.0 Linux 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64
CPU: amd64
     family 6 model 63 stepping 0
     2 CPUs

GPU: UNKNOWN

Crash reason:  DUMP_REQUESTED
Crash address: 0x217a097
Process uptime: not available

Thread 0 (crashed)
 0  impalad!google_breakpad::ExceptionHandler::WriteMinidump() + 0x57
    rax = 0x0000000002149a7e   rdx = 0x0000000000000000
    rcx = 0x000000000217a07f   rbx = 0x0000000000000000
    rsi = 0x0000000000000001   rdi = 0x00007ffed049f068
    rbp = 0x00007ffed049f770   rsp = 0x00007ffed049efd0
     r8 = 0x0000000000000000    r9 = 0x0000000000000024
    r10 = 0x0000000002288a89   r11 = 0x0000000000000000
    r12 = 0x00007ffed049f630   r13 = 0x0000000000d5cff0
    r14 = 0x0000000000000000   r15 = 0x00007ffed049f690
    rip = 0x000000000217a097
    Found by: given as instruction pointer in context
 1  impalad!google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, bool (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*) + 0xf0
    rbx = 0x00007f92561325a0   rbp = 0x00007ffed049f770
    rsp = 0x00007ffed049f620   r12 = 0x00007ffed049f630
    r13 = 0x0000000000d5cff0   r14 = 0x0000000000000000
    r15 = 0x00007ffed049f690   rip = 0x000000000217a960
    Found by: call frame info
 2  libpthread-2.23.so + 0x11390
    rbx = 0x0000000000000000   rbp = 0x00007ffed049fdd0
    rsp = 0x00007ffed049f780   r12 = 0x0000000007ada458
    r13 = 0x0000000007ada480   r14 = 0x0000000000000000
    r15 = 0x00007ffed049fdf0   rip = 0x00007f92556fe390
    Found by: call frame info
 3  impalad!boost::thread::join_noexcept() + 0x5c
    rbp = 0x00007ffed049fdf0   rsp = 0x00007ffed049fde0
    rip = 0x0000000001334cec
    Found by: previous frame's frame pointer
 4  impalad!impala::ThriftServer::Join() [thread.hpp : 767 + 0x8]
    rbx = 0x000000000648b420   rbp = 0x00007ffed049fe80
    rsp = 0x00007ffed049fe40   r12 = 0x00007f91fef44700
    r13 = 0x00007ffed049ff20   r14 = 0x0000000006acbae0
    r15 = 0x0000000000000002   rip = 0x0000000000b34f4f
    Found by: call frame info
 5  impalad!impala::ImpalaServer::Join() [impala-server.cc : 2151 + 0xc]
    rbx = 0x0000000006621800   rbp = 0x00007ffed049feb0
    rsp = 0x00007ffed049fe90   r12 = 0x00007ffed049ffb0
    r13 = 0x00007ffed049ff20   r14 = 0x0000000006acbae0
    r15 = 0x0000000000000002   rip = 0x0000000000c28f8a
    Found by: call frame info
 6  impalad!ImpaladMain(int, char**) [impalad-main.cc : 98 + 0xc]
    rbx = 0x00007ffed049ff90   rbp = 0x00007ffed04a0130
    rsp = 0x00007ffed049fec0   r12 = 0x00007ffed049ffb0
    r13 = 0x00007ffed049ff20   r14 = 0x0000000006acbae0
    r15 = 0x0000000000000002   rip = 0x0000000000c238e1
    Found by: call frame info
......

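The first thread (Thread 0) is marked as crashed, but it is really the thread writing the minidump, as the Crash reason above (DUMP_REQUESTED) already indicates. When the process really crashes, the Crash reason shows the specific cause.
The output contains many register values, which make it harder to read. They can be filtered out: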

grep -v = /tmp/resolved.txt | grep -v 'Found by' | less

This gives a much more readable stack:

Thread 119
 0  libpthread-2.23.so + 0xd360
 1  impalad!impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) [disk-io-mgr.cc : 977 + 0x5]
 2  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*) [function_template.hpp : 767 + 0x7]
 3  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() [bind.hpp : 525 + 0x6]
 4  impalad!thread_proxy + 0xda
 5  libpthread-2.23.so + 0x76ba
 6  libc-2.23.so + 0x10741d
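Note that grep -v = also drops any frame whose symbol contains "=", such as a C++ operator= overload. A slightly stricter filter removes only lines shaped like "reg = 0x..." and can group frames by thread at the same time. The sketch below assumes the text layout shown above; readable_stacks is a hypothetical name.

```python
import re

# Register lines look like "    rax = 0x0000000002149a7e   rdx = 0x...".
_REGISTER_LINE = re.compile(r"^\s+\w+ = 0x[0-9a-fA-F]+")


def readable_stacks(resolved_text):
    """Return {thread header: [frame lines]} without registers or "Found by".

    Unlike `grep -v =`, frames whose symbol contains "=" (for example a
    C++ operator= overload) are kept.
    """
    threads = {}
    current = None
    for line in resolved_text.splitlines():
        if line.startswith("Loaded modules:"):
            break  # module list that typically follows the threads
        if _REGISTER_LINE.match(line) or "Found by:" in line:
            continue
        if line.startswith("Thread "):
            current = line.strip()
            threads[current] = []
        elif current is not None and line.strip():
            threads[current].append(line.strip())
    return threads
```

For example, threads waiting in the disk I/O manager can be listed with [t for t, frames in readable_stacks(text).items() if any("DiskIoMgr" in f for f in frames)].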

4. Example of a Mistake

If the resolved file contains no function names, the symbol files did not match the minidump, and breakpad.log may contain lines like these:

2019-11-09 23:57:23: minidump_processor.cc:201: INFO: Looking at thread /var/log/impala-minidumps/impalad/9e41139b-a5b1-4f94-df3da8b6-c0c66040.dmp:0/155 id 0x73cd
2019-11-09 23:57:23: minidump.cc:473: INFO: MinidumpContext: looks like AMD64 context
2019-11-09 23:57:23: minidump.cc:473: INFO: MinidumpContext: looks like AMD64 context
2019-11-09 23:57:23: simple_symbol_supplier.cc:196: INFO: No symbol file at /tmp/syms/impalad/DD8351C4C1817BE1D142C187FA70CCAC0/impalad.sym
2019-11-09 23:57:23: stackwalker.cc:103: INFO: Couldn't load symbols for: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad|DD8351C4C1817BE1D142C187FA70CCAC0
2019-11-09 23:57:23: simple_symbol_supplier.cc:196: INFO: No symbol file at /tmp/syms/libpthread-2.23.so/23E017CE2254FC6511D9BC8F534BB4F00/libpthread-2.23.so.sym
2019-11-09 23:57:23: stackwalker.cc:103: INFO: Couldn't load symbols for: /lib/x86_64-linux-gnu/libpthread-2.23.so|23E017CE2254FC6511D9BC8F534BB4F00

The key line is “No symbol file at /tmp/syms/impalad/DD…C0/impalad.sym”, meaning the expected symbol file was not found. Listing /tmp/syms/impalad confirms that the ID string indeed does not match:

$ ls /tmp/syms/impalad/
7F9EC4C10024BDC531665853311E9CCE0

This happened because I generated the symbols from the wrong impalad file. The file to use is the one the impalad process is actually running, i.e. /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad.
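The check above can be automated by pairing each "No symbol file at" path in breakpad.log with the ID directories actually present under the symbol root. This sketch assumes the <root>/<module>/<ID>/<module>.sym layout visible in the log; missing_symbols is a hypothetical name. Missing system libraries such as libpthread only leave those frames unsymbolized; the impalad entry is the one that matters here.

```python
import os
import re

# Matches e.g. "No symbol file at /tmp/syms/impalad/DD83...C0/impalad.sym"
# and captures (symbol root, module, wanted ID).
_MISSING = re.compile(
    r"No symbol file at (\S+)/([^/\s]+)/([0-9A-Fa-f]+)/[^/\s]+\.sym")


def missing_symbols(log_text):
    """List (module, wanted ID, IDs actually present) per missing symbol file."""
    report, seen = [], set()
    for sym_root, module, wanted in _MISSING.findall(log_text):
        if (module, wanted) in seen:
            continue  # the same module may be reported more than once
        seen.add((module, wanted))
        module_dir = os.path.join(sym_root, module)
        present = sorted(os.listdir(module_dir)) if os.path.isdir(module_dir) else []
        report.append((module, wanted, present))
    return report
```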

The CDH parcel directory contains several impalad files, so be careful not to pick the wrong one:

$ find /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8 -name impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-debug/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/debug/usr/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/debug/usr/lib/impala/sbin-debug/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/bin/impalad
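Rather than comparing paths by eye, the candidates can be checked against the running process. os.path.samefile compares device and inode numbers, and on Linux /proc/<pid>/exe can be passed directly as the second argument because stat follows that link to the executable. A sketch with hypothetical helper names; symlinks are skipped so that the real file is reported.

```python
import os


def impalad_candidates(parcel_dir):
    """List regular files named impalad under parcel_dir (like the find above).

    Symlinks are skipped so that only real files are reported.
    """
    hits = []
    for root, _dirs, files in os.walk(parcel_dir):
        path = os.path.join(root, "impalad")
        if "impalad" in files and not os.path.islink(path):
            hits.append(path)
    return sorted(hits)


def matching_candidate(candidates, running_exe):
    """Return the candidate that is the same file as running_exe, or None."""
    for path in candidates:
        try:
            if os.path.samefile(path, running_exe):
                return path
        except OSError:
            continue  # unreadable or vanished path
    return None
```

In this example, matching_candidate(impalad_candidates("/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8"), "/proc/29645/exe") would be expected to return the sbin-retail path.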

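If possible, dump the symbols from the deb package instead, which yields more complete information (see 2.2.2).

Summary

Steps:

  1. Trigger a minidump: kill -s SIGUSR1 $PID
  2. Generate the Breakpad symbol files: bin/dump_breakpad_symbols.py -f <impalad binary> -d /tmp/syms
  3. Resolve the minidump: minidump_stackwalk <minidump file> /tmp/syms > /tmp/resolved.txt 2>/tmp/breakpad.log

See the body of the article for the environment setup steps.

References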

  • https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps