Diagnosing a problem where curl misbehaves inside a container

With one particular container, curl behaves differently depending on whether GPUs are enabled when the container is started. Specifically:

After starting the container with --gpus all and entering it, running curl fails with "curl: symbol lookup error: curl: undefined symbol: curl_mime_free".
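
A symbol lookup error like this means the curl binary calls a function that the libcurl.so.4 picked up by the dynamic loader does not export; the curl_mime_* API only exists in libcurl 7.56.0 and later. The sketch below checks whether a given shared library exports a symbol, using Python's ctypes. The two candidate paths in the __main__ block are assumptions (a typical Ubuntu system libcurl plus a locally built one under /usr/local/lib), so substitute whatever ldd reports in your image.

```python
import ctypes
import ctypes.util


def exports_symbol(lib_path: str, symbol: str) -> bool:
    """Return True if the shared library at lib_path exports `symbol`.

    Loading a library runs its initialisers, so only point this at
    libraries you trust (e.g. the libcurl.so.4 candidates from ldd).
    """
    lib = ctypes.CDLL(lib_path)  # raises OSError if the file cannot be loaded
    # ctypes looks attributes up with dlsym(); a missing symbol raises AttributeError
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Hypothetical candidate paths; adjust to what ldd shows in your container.
    for path in ("/usr/local/lib/libcurl.so.4",
                 "/usr/lib/x86_64-linux-gnu/libcurl.so.4"):
        try:
            print(path, exports_symbol(path, "curl_mime_free"))
        except OSError as exc:  # library missing or not loadable here
            print(path, "not loadable:", exc)
```

A library that prints False here would be the one behind the error above.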

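During diagnosis, I first checked whether the two curl --version outputs matched.
One line differs between the runs without and with --gpus all:
Without GPUs: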

Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp

With GPUs:

Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
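
Comparing the Protocols lines as sets makes the difference explicit. The protocol list printed by curl --version comes from the libcurl loaded at run time, so extra entries such as ldap and rtmp suggest the two runs load different libcurl builds. A minimal sketch, fed with the two lines quoted above; in a live container you could capture the text with subprocess.run(["curl", "--version"], capture_output=True, text=True).stdout instead.

```python
def protocols(version_output: str) -> set[str]:
    """Return the protocol names on the 'Protocols:' line of curl --version output."""
    for line in version_output.splitlines():
        if line.startswith("Protocols:"):
            return set(line.split(":", 1)[1].split())
    return set()  # no Protocols line found


# The two Protocols lines quoted above.
version_without_gpus = ("Protocols: dict file ftp ftps gopher http https imap imaps "
                        "pop3 pop3s rtsp smb smbs smtp smtps telnet tftp")
version_with_gpus = ("Protocols: dict file ftp ftps gopher http https imap imaps "
                     "ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp")

print(sorted(protocols(version_with_gpus) - protocols(version_without_gpus)))
# ['ldap', 'ldaps', 'rtmp']
```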

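My first suspicion was a difference in environment variables. Comparing them, the only difference is one extra variable in the GPU-enabled case: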

NVIDIA_VISIBLE_DEVICES=all

Removing it with unset NVIDIA_VISIBLE_DEVICES did not fix the problem, so it did not look like an environment-variable issue.
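
Comparing environments by eye is easy to get wrong when there are dozens of variables. A minimal sketch that parses the output of env (one NAME=value per line) from each container and reports added, removed and changed names. The snapshot strings are illustrative; only NVIDIA_VISIBLE_DEVICES=all comes from the text above.

```python
def parse_env(env_output: str) -> dict[str, str]:
    """Parse `env` output (one NAME=value per line) into a dict."""
    result: dict[str, str] = {}
    for line in env_output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '=', e.g. continuations of multi-line values
            result[name] = value
    return result


def diff_env(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """List variable names added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


# Illustrative snapshots (hypothetical values apart from NVIDIA_VISIBLE_DEVICES).
env_without_gpus = parse_env("HOME=/root\nLANG=C.UTF-8")
env_with_gpus = parse_env("HOME=/root\nLANG=C.UTF-8\nNVIDIA_VISIBLE_DEVICES=all")
print(diff_env(env_without_gpus, env_with_gpus))
# {'added': ['NVIDIA_VISIBLE_DEVICES'], 'removed': [], 'changed': []}
```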

Next, I compared the output of which curl and found that curl resolves to a different location in each case:

Without GPUs: /usr/bin/curl
With GPUs: /usr/local/bin/curl
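
which only prints the first match, which hides the fact that several curl binaries may be on PATH. A minimal sketch of a which -a equivalent that lists every executable with a given name in PATH lookup order; the function name is hypothetical. Running it in both containers shows every curl present and which one a shell will pick.

```python
import os
from typing import Optional


def all_on_path(name: str, path: Optional[str] = None) -> list[str]:
    """List every executable called `name` on PATH, in lookup order (like `which -a`).

    The first entry is the one a shell runs for a bare `name`.
    """
    search = path if path is not None else os.environ.get("PATH", "")
    hits: list[str] = []
    for directory in search.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)  # empty PATH entry means cwd
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    print(all_on_path("curl"))
```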

The shared libraries the two curl binaries link against also differ a great deal (checked with ldd /usr/bin/curl and ldd /usr/local/bin/curl).

The former:

linux-vdso.so.1 =>  (0x00007ffe2c56e000)
libcurl.so.4 => /usr/local/lib/libcurl.so.4 (0x00007fc68b1b7000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc68af9a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc68abd0000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fc68a968000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fc68a523000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc68a309000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc68b42e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc68a105000)

The latter:

linux-vdso.so.1 =>  (0x00007ffceeb66000)
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007ffad7e52000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffad7c35000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffad786b000)
libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007ffad7638000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007ffad741c000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007ffad71b4000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007ffad6d6f000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007ffad6b25000)
liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007ffad6916000)
libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007ffad66c5000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffad64ab000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffad80c4000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007ffad617a000)
libhogweed.so.4 => /usr/lib/x86_64-linux-gnu/libhogweed.so.4 (0x00007ffad5f47000)
libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007ffad5d11000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007ffad5a91000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffad588d000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007ffad55bb000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007ffad538c000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007ffad5188000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007ffad4f7d000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007ffad4d62000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007ffad4b47000)
libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007ffad4906000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007ffad46a2000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007ffad448f000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007ffad428b000)
libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007ffad4082000)
libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007ffad3df8000)
libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007ffad3b56000)
libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007ffad3923000)
libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007ffad370d000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007ffad3505000)
libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007ffad32dc000)
libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007ffad30cd000)
libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007ffad2e82000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007ffad2bad000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007ffad2975000)
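
The key line in these listings is which libcurl.so.4 each binary resolved to. A minimal sketch that parses ldd output into a library-name to path map, fed here with a trimmed excerpt of the two listings above; to use it live you could pass subprocess.run(["ldd", path], capture_output=True, text=True).stdout. The function and variable names are illustrative.

```python
from typing import Optional


def parse_ldd(ldd_output: str) -> dict[str, Optional[str]]:
    """Map each library in ldd output to the path it resolved to.

    'name => /path (0x...)' maps name to /path; 'linux-vdso.so.1 =>  (0x...)'
    has no backing file and maps to None; a bare '/path (0x...)' entry such as
    the dynamic loader maps to itself. 'not found' entries keep that text.
    """
    libs: dict[str, Optional[str]] = {}
    for raw in ldd_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=>" in line:
            name, target = (part.strip() for part in line.split("=>", 1))
            path = target.split("(", 1)[0].strip()  # drop the load address
            libs[name] = path or None
        else:
            path = line.split("(", 1)[0].strip()
            libs[path] = path
    return libs


# Trimmed excerpts of the two ldd listings above.
FORMER_LDD = """\
libcurl.so.4 => /usr/local/lib/libcurl.so.4 (0x00007fc68b1b7000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc68a309000)
"""
LATTER_LDD = """\
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007ffad7e52000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007ffad741c000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffad64ab000)
"""

former, latter = parse_ldd(FORMER_LDD), parse_ldd(LATTER_LDD)
print(former["libcurl.so.4"], "vs", latter["libcurl.so.4"])
print(sorted(latter.keys() - former.keys()))  # ['librtmp.so.1']
```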

I looked at the container's PATH: it only specifies /usr/local/bin, and nothing gives /usr/bin priority over /usr/local/bin.

The temporary workaround is simple: patch it from within the Python script:

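import logging
import os

logger = logging.getLogger(__name__)


def curl_patch():
    # Replace /usr/local/bin/curl with a symlink to the system curl,
    # keeping the original binary as curl.bak.
    curl_patch_cmds = [
        'mv -f /usr/local/bin/curl /usr/local/bin/curl.bak',
        'ln -s /usr/bin/curl /usr/local/bin/curl',
    ]

    for curl_patch_cmd in curl_patch_cmds:
        ret = os.system(curl_patch_cmd)
        # logging formats lazily with %-placeholders; bare extra args would
        # trigger a formatting error instead of being logged
        logger.info('curl_patch_cmd: %s ret: %s', curl_patch_cmd, ret)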
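
The shell-based patch spawns a shell per step and, if it runs a second time, moves the symlink itself over curl.bak and loses the original binary. A hypothetical pure-Python variant of the same mv -f / ln -s steps (the function name and default paths are illustrative) that uses os.replace and os.symlink and skips the work when the link is already in place:

```python
import os


def repoint_to_system_curl(local_curl: str = "/usr/local/bin/curl",
                           system_curl: str = "/usr/bin/curl") -> bool:
    """Replace local_curl with a symlink to system_curl, keeping a .bak copy.

    Returns True if a change was made, False if the link was already there.
    """
    if os.path.islink(local_curl) and os.readlink(local_curl) == system_curl:
        return False  # already patched; don't overwrite the .bak with the symlink
    if os.path.lexists(local_curl):
        os.replace(local_curl, local_curl + ".bak")  # like `mv -f`
    os.symlink(system_curl, local_curl)  # like `ln -s`
    return True
```

os.replace behaves like mv -f within one filesystem. There is still a short window between the rename and the symlink during which no curl exists at that path, which is usually acceptable for a one-off patch at startup.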

At its core, this is a path-management problem in the infrastructure; once containers are mixed in, it becomes a hard-to-diagnose one.

--end--
