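How to Print Thread Stacks for Impala in a CDH Cluster
The previous article, "A Case Study of Analyzing a Stuck Impala Query" (《Impala查詢卡頓分析案例》), described how to print the thread stacks of an Impala process. The JVM part is straightforward with jstack, but the C++ part requires gdb or breakpad and also building Impala from source, which is tedious. This article shows how to print the thread stacks of an Impala process directly in a CDH cluster without compiling the source. Some tools still have to be downloaded the first time, so it is convenient to pick one fixed machine in the cluster for this setup; later runs are then quick.
1. Generate the Minidump File
Log in to the machine running impalad and find the impalad process ID.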
$ ps aux | grep impalad
root 4374 0.0 0.0 12944 972 pts/0 S+ 16:49 0:00 grep --color=auto impalad
impala 29645 1.0 3.0 2999416 231972 ? Sl 16:17 0:20 /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/55-impala-IMPALAD/impala-conf/impalad_flags
impala 29652 0.0 0.1 197888 13556 ? Sl 16:17 0:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/impala/impala.sh impalad impalad_flags false
The process with PID 29645 above is the impalad process. Send it the SIGUSR1 signal to trigger a minidump:
$ kill -s SIGUSR1 29645
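The lookup and the signal can also be scripted. The following is a minimal sketch, not part of Impala or CDH: the function names are hypothetical, and it assumes the 11-column `ps aux` layout shown above. It matches on the executable name itself, so the `grep` process and the cmf-redactor wrapper, which only mention impalad as an argument, are skipped.

```python
import os
import signal


def find_impalad_pids(ps_output):
    """Return PIDs from `ps aux` output whose executable is named impalad."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10].split()
        if not command:
            continue
        # Compare the executable's basename, not the whole command line.
        if os.path.basename(command[0]) == "impalad":
            pids.append(int(fields[1]))
    return pids


def request_minidump(pid):
    """Send SIGUSR1, the same signal as `kill -s SIGUSR1 <pid>`."""
    os.kill(pid, signal.SIGUSR1)
```

The text passed to find_impalad_pids could come from `subprocess.check_output(["ps", "aux"], text=True)`. request_minidump needs the same permissions as the kill command, so it has to run as root or as the impala user.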
The following line then appears in /var/log/impalad/impalad.INFO:
Wrote minidump to /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp
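If the log is busy, the path can be pulled out programmatically. This is a small sketch under the assumption that the message always has the form "Wrote minidump to <path>.dmp" as in the line above; the function name is hypothetical.

```python
import re

# Matches the message shown above; any log prefix before it is ignored.
MINIDUMP_RE = re.compile(r"Wrote minidump to (\S+\.dmp)")


def minidump_paths(log_text):
    """Return every minidump path mentioned in the log text, in order."""
    return MINIDUMP_RE.findall(log_text)
```

Reading /var/log/impalad/impalad.INFO and taking the last element of the returned list should give the dump that was just requested.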
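2. Generate the Breakpad Symbol Files
2.1 Set Up the Breakpad Tools
The Impala source tree contains a script (bin/dump_breakpad_symbols.py) that generates symbol files in Breakpad format. Download the Impala source for the matching version; it can be found on the releases page of Cloudera's GitHub repository: https://github.com/cloudera/Impala/releases
In this example the CDH version is 5.16.2, so download and extract https://github.com/cloudera/Impala/archive/cdh5.16.2-release.tar.gz (692MB).
Note: the cloudera Impala repo is very large (15GB). If you only need the code for one version, there is no need to git clone it.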
wget https://github.com/cloudera/Impala/archive/cdh5.16.2-release.tar.gz
tar zxf cdh5.16.2-release.tar.gz
cd Impala-cdh5.16.2-release
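Before bin/dump_breakpad_symbols.py can run, the environment needs some setup. Make sure the JAVA_HOME variable points to the correct directory, then run:
# Make sure JAVA_HOME is set and points to the correct directory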
$ export JAVA_HOME=/usr/java/jdk1.8.0_162-cloudera
$ source bin/impala-config.sh
# Users in mainland China can use the Aliyun PyPI mirror
$ export PYPI_MIRROR="http://mirrors.aliyun.com/pypi"
$ $IMPALA_HOME/infra/python/deps/download_requirements
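Next, initialize breakpad in the toolchain with bin/bootstrap_toolchain.py.
Normally this script downloads the entire toolchain, which takes a long time. Only the breakpad part is needed here, so modify bin/bootstrap_toolchain.py as follows: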
# LLVM and Kudu are the largest packages. Sort them first so that
# their download starts as soon as possible.
- packages = map(Package, ["llvm", "kudu",
- "avro", "binutils", "boost", "breakpad", "bzip2", "cmake", "crcutil",
- "flatbuffers", "gcc", "gflags", "glog", "gperftools", "gtest", "libev",
- "lz4", "openldap", "openssl", "protobuf",
- "rapidjson", "re2", "snappy", "thrift", "tpc-h", "tpc-ds", "zlib"])
- packages.insert(0, Package("llvm", "3.9.1-asserts"))
+ packages = map(Package, ["breakpad"])
bootstrap(toolchain_root, packages)
That is, in the last part of bootstrap_toolchain.py, remove all the other packages and keep only breakpad. Then run the script:
$ bin/bootstrap_toolchain.py
INFO:bootstrap_virtualenv:Creating python virtualenv
INFO:bootstrap_virtualenv:Installing packages into the virtualenv
INFO:bootstrap_virtualenv:Installing stage 2 packages into the virtualenv
2019-11-10 01:31:23,683 Thread-3 INFO: Downloading https://native-toolchain.s3.amazonaws.com/build/257-0847514126/breakpad/97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz to /root/Impala-cdh5.16.2-release/toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz (attempt 1)
2019-11-10 01:31:24,452 Thread-3 INFO: Extracting breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2-gcc-4.9.2-ec2-package-ubuntu-16-04.tar.gz
2.2 Generate the Symbol Files
2.2.1 Using the Executable in the Local Parcel
dump_breakpad_symbols.py is now ready to use. The earlier ps output showed that the executable is /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad. Generate symbol files for it and put them under /tmp/syms:
$ bin/dump_breakpad_symbols.py -f /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad -d /tmp/syms
INFO:root:Processing binary file: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad
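2.2.2 Using the Executable from the deb Package
Symbol files generated as above carry no file names or line numbers. To map the stacks to the source code as closely as possible, download and extract the rpm/deb packages for your system. These packages are available at http://archive.cloudera.com; for example, the Ubuntu packages for cdh5 are all under http://archive.cloudera.com/cdh5/ubuntu. The system in this example is Ubuntu 16.04, and every version of the Impala CDH packages can be found under http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala. Download the following two files:
- The deb package with the executables: http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb (345MB)
- The deb package with the debug information for those executables: http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/pool/contrib/i/impala/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb (471MB)
Then run dump_breakpad_symbols.py again: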
$ bin/dump_breakpad_symbols.py -r ~/Downloads/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb -s ~/Downloads/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb -d /tmp/syms
INFO:root:Extracting to /tmp/tmpBDEwFI: /home/quanlong/Downloads/impala_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
INFO:root:Extracting to /tmp/tmpBDEwFI: /home/quanlong/Downloads/impala-dbg_2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22~xenial-cdh5.16.2_amd64.deb
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libstdc++.so.6.0.20
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libgcc_s.so.1
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libkudu_client.so.0.1.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libstdc++.so.6
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/libkudu_client.so.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libssl.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libcrypto.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libcrypto.so.1.0.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/lib/openssl/libssl.so.1.0.0
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-debug/libfesupport.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-debug/impalad
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-retail/libfesupport.so
INFO:root:Processing binary file: /tmp/tmpBDEwFI/usr/lib/impala/sbin-retail/impalad
The symbol information in /tmp/syms now includes file names and line numbers.
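A quick way to check what was generated is to look inside a .sym file. Breakpad symbol files are plain text: the first record is `MODULE <os> <arch> <id> <name>`, and source files appear as `FILE` records. The sketch below (hypothetical function name) reports the module ID and counts the FILE and FUNC records. A count of zero FILE records suggests the binary was dumped without debug information, as in 2.2.1.

```python
def summarize_sym(sym_text):
    """Summarize a Breakpad .sym file: module ID, module name, record counts."""
    summary = {"id": None, "name": None, "files": 0, "funcs": 0}
    for line in sym_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "MODULE" and len(parts) >= 5:
            # MODULE <os> <arch> <id> <name>
            summary["id"] = parts[3]
            summary["name"] = " ".join(parts[4:])
        elif parts[0] == "FILE":
            # FILE <number> <source path>: only present with debug info
            summary["files"] += 1
        elif parts[0] == "FUNC":
            summary["funcs"] += 1
    return summary
```

The reported ID should equal the name of the directory that holds the file, i.e. /tmp/syms/impalad/<id>/impalad.sym, which is also where minidump_stackwalk looks for it.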
3. Resolve the Minidump with the Symbol Files
The minidump_stackwalk tool in the toolchain's breakpad directory resolves a minidump against the symbol files. To write the resolved result to /tmp/resolved.txt and breakpad's own log to /tmp/breakpad.log, run:
$ toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2/bin/minidump_stackwalk /var/log/impala-minidumps/impalad/3745e5d7-9281-4548-2fd5b4b1-adc7f7eb.dmp /tmp/syms > /tmp/resolved.txt 2>/tmp/breakpad.log
The generated resolved.txt looks like this:
Operating system: Linux
0.0.0 Linux 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64
CPU: amd64
family 6 model 63 stepping 0
2 CPUs
GPU: UNKNOWN
Crash reason: DUMP_REQUESTED
Crash address: 0x217a097
Process uptime: not available
Thread 0 (crashed)
0 impalad!google_breakpad::ExceptionHandler::WriteMinidump() + 0x57
rax = 0x0000000002149a7e rdx = 0x0000000000000000
rcx = 0x000000000217a07f rbx = 0x0000000000000000
rsi = 0x0000000000000001 rdi = 0x00007ffed049f068
rbp = 0x00007ffed049f770 rsp = 0x00007ffed049efd0
r8 = 0x0000000000000000 r9 = 0x0000000000000024
r10 = 0x0000000002288a89 r11 = 0x0000000000000000
r12 = 0x00007ffed049f630 r13 = 0x0000000000d5cff0
r14 = 0x0000000000000000 r15 = 0x00007ffed049f690
rip = 0x000000000217a097
Found by: given as instruction pointer in context
1 impalad!google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, bool (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*) + 0xf0
rbx = 0x00007f92561325a0 rbp = 0x00007ffed049f770
rsp = 0x00007ffed049f620 r12 = 0x00007ffed049f630
r13 = 0x0000000000d5cff0 r14 = 0x0000000000000000
r15 = 0x00007ffed049f690 rip = 0x000000000217a960
Found by: call frame info
2 libpthread-2.23.so + 0x11390
rbx = 0x0000000000000000 rbp = 0x00007ffed049fdd0
rsp = 0x00007ffed049f780 r12 = 0x0000000007ada458
r13 = 0x0000000007ada480 r14 = 0x0000000000000000
r15 = 0x00007ffed049fdf0 rip = 0x00007f92556fe390
Found by: call frame info
3 impalad!boost::thread::join_noexcept() + 0x5c
rbp = 0x00007ffed049fdf0 rsp = 0x00007ffed049fde0
rip = 0x0000000001334cec
Found by: previous frame's frame pointer
4 impalad!impala::ThriftServer::Join() [thread.hpp : 767 + 0x8]
rbx = 0x000000000648b420 rbp = 0x00007ffed049fe80
rsp = 0x00007ffed049fe40 r12 = 0x00007f91fef44700
r13 = 0x00007ffed049ff20 r14 = 0x0000000006acbae0
r15 = 0x0000000000000002 rip = 0x0000000000b34f4f
Found by: call frame info
5 impalad!impala::ImpalaServer::Join() [impala-server.cc : 2151 + 0xc]
rbx = 0x0000000006621800 rbp = 0x00007ffed049feb0
rsp = 0x00007ffed049fe90 r12 = 0x00007ffed049ffb0
r13 = 0x00007ffed049ff20 r14 = 0x0000000006acbae0
r15 = 0x0000000000000002 rip = 0x0000000000c28f8a
Found by: call frame info
6 impalad!ImpaladMain(int, char**) [impalad-main.cc : 98 + 0xc]
rbx = 0x00007ffed049ff90 rbp = 0x00007ffed04a0130
rsp = 0x00007ffed049fec0 r12 = 0x00007ffed049ffb0
r13 = 0x00007ffed049ff20 r14 = 0x0000000006acbae0
r15 = 0x0000000000000002 rip = 0x0000000000c238e1
Found by: call frame info
......
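The first thread (Thread 0) is marked as crashed, but it is simply the thread that wrote the minidump; the Crash reason above already says DUMP_REQUESTED. When the process really crashes, a specific reason is reported instead.
The resolved output contains many register values, which get in the way of reading. They can be filtered out: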
grep -v = /tmp/resolved.txt | grep -v 'Found by' | less
This gives a much more readable stack:
Thread 119
0 libpthread-2.23.so + 0xd360
1 impalad!impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) [disk-io-mgr.cc : 977 + 0x5]
2 impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*) [function_template.hpp : 767 + 0x7]
3 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() [bind.hpp : 525 + 0x6]
4 impalad!thread_proxy + 0xda
5 libpthread-2.23.so + 0x76ba
6 libc-2.23.so + 0x10741d
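With well over a hundred threads in a dump, it can help to load the stacks into a data structure instead of paging through them. The sketch below is only an illustration: the function name is hypothetical, and it assumes the layout shown above, namely a `Thread N` header followed by numbered frame lines. It applies the same filter as the grep commands, so any frame whose text contains `=` is dropped as well.

```python
import re

# A frame line is a frame number followed by the frame description.
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S.*)$")


def thread_stacks(resolved_text):
    """Map each 'Thread N' header to its list of frame descriptions."""
    stacks = {}
    current = None
    for line in resolved_text.splitlines():
        # Same filter as: grep -v = | grep -v 'Found by'
        if "=" in line or "Found by" in line:
            continue
        if line.strip().startswith("Thread "):
            current = line.strip()
            stacks[current] = []
            continue
        match = FRAME_RE.match(line)
        # Lines before the first thread header (OS, CPU, ...) are ignored.
        if current is not None and match:
            stacks[current].append(match.group(2))
    return stacks
```

From here it is easy, for example, to count identical stacks with collections.Counter and see which code paths most threads are waiting in.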
4. An Example of a Mistake
If the resolved file contains no function names, the symbol files and the minidump did not match. breakpad.log may then contain entries like these:
2019-11-09 23:57:23: minidump_processor.cc:201: INFO: Looking at thread /var/log/impala-minidumps/impalad/9e41139b-a5b1-4f94-df3da8b6-c0c66040.dmp:0/155 id 0x73cd
2019-11-09 23:57:23: minidump.cc:473: INFO: MinidumpContext: looks like AMD64 context
2019-11-09 23:57:23: minidump.cc:473: INFO: MinidumpContext: looks like AMD64 context
2019-11-09 23:57:23: simple_symbol_supplier.cc:196: INFO: No symbol file at /tmp/syms/impalad/DD8351C4C1817BE1D142C187FA70CCAC0/impalad.sym
2019-11-09 23:57:23: stackwalker.cc:103: INFO: Couldn't load symbols for: /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad|DD8351C4C1817BE1D142C187FA70CCAC0
2019-11-09 23:57:23: simple_symbol_supplier.cc:196: INFO: No symbol file at /tmp/syms/libpthread-2.23.so/23E017CE2254FC6511D9BC8F534BB4F00/libpthread-2.23.so.sym
2019-11-09 23:57:23: stackwalker.cc:103: INFO: Couldn't load symbols for: /lib/x86_64-linux-gnu/libpthread-2.23.so|23E017CE2254FC6511D9BC8F534BB4F00
The key line is "No symbol file at /tmp/syms/impalad/DD…C0/impalad.sym", which means the expected symbol file could not be found. Looking in /tmp/syms/impalad, the ID there indeed does not match:
$ ls /tmp/syms/impalad/
7F9EC4C10024BDC531665853311E9CCE0
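This happened because I picked the wrong impalad file to generate the symbols from. The right one is the file the impalad process is actually running, namely /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad
The CDH parcel directory contains several impalad files, so be careful not to pick the wrong one: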
$ find /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8 -name impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/impala/sbin-debug/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/debug/usr/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/debug/usr/lib/impala/sbin-debug/impalad
/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/bin/impalad
If possible, dump the symbols from the deb packages instead, since that yields more complete information; see 2.2.2.
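The mismatch check can be automated by comparing what breakpad.log asked for with what exists under the symbol directory. This is a rough sketch with hypothetical function names, assuming the "No symbol file at <syms>/<module>/<id>/<module>.sym" message format shown in the log above.

```python
import os
import re

MISSING_RE = re.compile(r"No symbol file at (\S+)")


def missing_symbols(breakpad_log):
    """Return unique (module, expected_id) pairs the stackwalker could not find."""
    found = []
    for path in MISSING_RE.findall(breakpad_log):
        id_dir = os.path.dirname(path)                # <syms>/<module>/<id>
        module_dir, expected_id = os.path.split(id_dir)
        pair = (os.path.basename(module_dir), expected_id)
        if pair not in found:
            found.append(pair)
    return found


def available_ids(syms_dir, module):
    """List the IDs for which symbols of the given module were generated."""
    module_dir = os.path.join(syms_dir, module)
    return sorted(os.listdir(module_dir)) if os.path.isdir(module_dir) else []
```

For each pair from missing_symbols, an expected ID that is absent from available_ids("/tmp/syms", module) points to symbols generated from a different binary. Entries for system libraries such as libpthread are to be expected when symbols were dumped only for the Impala binaries.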
Summary
Steps:
- Trigger a minidump: kill -s SIGUSR1 $PID
- Generate the Breakpad symbol files: bin/dump_breakpad_symbols.py -f <impalad file> -d /tmp/syms
- Resolve the minidump file: minidump_stackwalk <minidump file> /tmp/syms > /tmp/resolved.txt 2>/tmp/breakpad.log
See the body of the article for the environment setup steps.
References
- https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps