Notes on an ansible command hanging inside a container

1. Background

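  Recently, while working on Kylin V10 (kylin_v10), I found that running ansible localhost -m setup inside a container on this system hangs indefinitely. A colleague and I worked through the investigation below and eventually found a fix.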

2. Environment

2.1. System information

# cat /etc/*-release
Kylin Linux Advanced Server release V10 (Tercel)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"

Kylin Linux Advanced Server release V10 (Tercel)  

2.2. Kernel information

# uname -a
Linux reg.wps.lan 4.19.90-17.ky10.aarch64 #1 SMP Sun Jun 28 14:27:40 CST 2020 aarch64 aarch64 aarch64 GNU/Linux

2.3. Docker information

# docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 18.09.9
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs

2.4. Ansible information

# ansible --version
ansible 2.6.2
  config file = None
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.16 (default, Jul  9 2020, 06:35:45) [GCC 7.3.0]

3. Analysis and troubleshooting

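  During troubleshooting we found that ansible localhost -m setup hangs, whereas replacing localhost with an inventory file that specifies a custom IP address plus username and password lets the command run normally (in that case Ansible connects over SSH instead of using the local connection).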

  We then ran export ANSIBLE_DEBUG=True to make Ansible print its debug log.

  The log stopped at the following point:

    82 1606185861.10586: transferring module to remote /root/.ansible/tmp/ansible-tmp-1606185860.41-269842916667107/AnsiballZ_setup.py
    82 1606185861.10840: done transferring module to remote
    82 1606185861.10894: _low_level_execute_command(): starting
    82 1606185861.10924: _low_level_execute_command(): executing: /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1606185860.41-269842916667107/ /root/.ansible/tmp/ansible-tmp-1606185860.41-269842916667107/AnsiballZ_setup.py && sleep 0'
    82 1606185861.10940: in local.exec_command()
    82 1606185861.10957: opening command with Popen()
    82 1606185861.11488: done running command with Popen()
    82 1606185861.11523: getting output with communicate()
    82 1606185861.11918: done communicating
    82 1606185861.11936: done with local.exec_command()
    82 1606185861.11961: _low_level_execute_command() done: rc=0, stdout=, stderr=
    82 1606185861.11977: _low_level_execute_command(): starting
    82 1606185861.12019: _low_level_execute_command(): executing: /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1606185860.41-269842916667107/AnsiballZ_setup.py && sleep 0'
    82 1606185861.12038: in local.exec_command()
    82 1606185861.12055: opening command with Popen()
    82 1606185861.12599: done running command with Popen()
    82 1606185861.12631: getting output with communicate()
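A long debug log is easier to read if you pull out the last command Ansible started, since that is the one it is still waiting on. The helper below is a minimal sketch that assumes the debug line format shown above; the function name hung_command and the log path /tmp/ansible-debug.log are illustrative choices, not part of Ansible.

```shell
#!/bin/sh
# hung_command LOGFILE: print the last command that Ansible reported starting
# in an ANSIBLE_DEBUG log, i.e. the one it is most likely still waiting on.
# Assumes lines of the form "... _low_level_execute_command(): executing: CMD".
hung_command() {
    grep '_low_level_execute_command(): executing:' "$1" \
        | tail -n 1 \
        | sed 's/.*executing: //'
}

# Example (log path is arbitrary):
#   ANSIBLE_DEBUG=True ansible localhost -m setup > /tmp/ansible-debug.log 2>&1 &
#   sleep 10; hung_command /tmp/ansible-debug.log
```

Applied to the log above, it would print the /bin/sh -c command that runs AnsiballZ_setup.py with /usr/bin/python.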

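  The last entry is "getting output with communicate()": Ansible is waiting for the AnsiballZ_setup.py child process to finish, and it never does. So we went to the physical host to look at the ansible processes: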

# ps -ef |grep ansible
root      672540  672016 99 10:44 pts/0    00:03:06 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1606185860.41-269842916667107/AnsiballZ_setup.py
root      673881  672428 51 10:47 pts/0    00:00:02 /usr/bin/python /usr/local/bin/ansible localhost -m setup
root      673893  673881 33 10:47 pts/0    00:00:00 /usr/bin/python /usr/local/bin/ansible localhost -m setup
root      673908  673893  0 10:47 pts/0    00:00:00 /bin/sh -c /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1606186046.03-129145088760493/AnsiballZ_setup.py && sleep 0'
root      673909  673908  0 10:47 pts/0    00:00:00 /bin/sh -c /usr/bin/python /root/.ansible/tmp/ansible-tmp-1606186046.03-129145088760493/AnsiballZ_setup.py && sleep 0
root      673910  673909 23 10:47 pts/0    00:00:00 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1606186046.03-129145088760493/AnsiballZ_setup.py
root      673914  673910 99 10:47 pts/0    00:00:01 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1606186046.03-129145088760493/AnsiballZ_setup.py
root      673971  443741  0 10:47 pts/1    00:00:00 grep ansible
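In this listing the AnsiballZ_setup.py processes at 99% CPU are the ones doing work: 672540 is still spinning from an earlier attempt (it uses the same tmp directory as the debug log above), and 673914 is the newest. A sketch for picking such a process out of ps output, assuming the POSIX ps -eo pid,pcpu,args format; busiest_ansiballz is a hypothetical helper name.

```shell
#!/bin/sh
# busiest_ansiballz: read "PID %CPU COMMAND" lines on stdin and print the PID
# of the AnsiballZ module process with the highest CPU usage.
# Usage (hypothetical):  ps -eo pid,pcpu,args | busiest_ansiballz
busiest_ansiballz() {
    grep 'AnsiballZ_' \
        | grep -v grep \
        | sort -k2,2nr \
        | head -n 1 \
        | awk '{print $1}'
}
```

When several processes share the top value, as 672540 and 673914 do here, whichever one sort places first is returned; either is a reasonable strace target.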

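  Next we used strace to trace process 673914, the newest AnsiballZ_setup.py process (a child of 673910), which was running at 99% CPU: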

# strace -p 673914
close(216995106)                        = -1 EBADF (Bad file descriptor)
close(216995107)                        = -1 EBADF (Bad file descriptor)
close(216995108)                        = -1 EBADF (Bad file descriptor)
close(216995109)                        = -1 EBADF (Bad file descriptor)
close(216995110)                        = -1 EBADF (Bad file descriptor)
close(216995111)                        = -1 EBADF (Bad file descriptor)
close(216995112)                        = -1 EBADF (Bad file descriptor)
close(216995113)                        = -1 EBADF (Bad file descriptor)
close(216995114)                        = -1 EBADF (Bad file descriptor)
close(216995115)                        = -1 EBADF (Bad file descriptor)
close(216995116)                        = -1 EBADF (Bad file descriptor)
close(216995117)                        = -1 EBADF (Bad file descriptor)
close(216995118)                        = -1 EBADF (Bad file descriptor)
close(216995119)                        = -1 EBADF (Bad file descriptor)
close(216995120)                        = -1 EBADF (Bad file descriptor)
close(216995121)                        = -1 EBADF (Bad file descriptor)
close(216995122)                        = -1 EBADF (Bad file descriptor)
close(216995123)                        = -1 EBADF (Bad file descriptor)
close(216995124)                        = -1 EBADF (Bad file descriptor)
close(216995125)                        = -1 EBADF (Bad file descriptor)
close(216995126)                        = -1 EBADF (Bad file descriptor)
close(216995127)                        = -1 EBADF (Bad file descriptor)
close(216995128)                        = -1 EBADF (Bad file descriptor)
close(216995129)                        = -1 EBADF (Bad file descriptor)
close(216995130)                        = -1 EBADF (Bad file descriptor)
close(216995131)                        = -1 EBADF (Bad file descriptor)
close(216995132)                        = -1 EBADF (Bad file descriptor)
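Instead of watching the terminal scroll, the trace can be captured for a few seconds and summarized. A minimal sketch, assuming strace and coreutils timeout are available on the host; count_ebadf and /tmp/close.trace are illustrative names.

```shell
#!/bin/sh
# count_ebadf TRACEFILE: count close() calls in an strace log that failed with
# EBADF, i.e. attempts to close descriptor numbers that were not open.
count_ebadf() {
    grep -c '^close(.*= -1 EBADF' "$1"
}

# Example: trace only close() for 5 seconds, write to a file, then count.
#   timeout 5 strace -p 673914 -e trace=close -o /tmp/close.trace
#   count_ebadf /tmp/close.trace
```

The symbolic name EBADF is printed regardless of locale, so the pattern also matches traces taken under a non-English locale.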

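  The terminal kept scrolling lines like these. At first this looked like a file descriptor leak, but every call fails with EBADF, meaning the process is calling close() on descriptor numbers, one after another, that were never open. Searching for "docker Bad file descriptor" led to the issue "Spawning PTY processes is many times slower on Docker 18.09", where several contributors found that processes stall when the container's nofile limit is very high, and that containers started with a low nofile limit do not have the problem.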
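A likely mechanism, consistent with the trace: before running a command, a child process that closes inherited descriptors by brute force issues one close() per descriptor number, whether or not it is open. Python 2.7's subprocess with close_fds=True behaves this way, calling os.closerange from 3 up to os.sysconf("SC_OPEN_MAX"), which equals the soft nofile limit. The cost therefore grows linearly with ulimit -n. The sketch below only does the arithmetic; estimate_close_calls is an illustrative name, and the exact range varies slightly by caller.

```shell
#!/bin/sh
# estimate_close_calls [LIMIT]: approximate number of close() syscalls a child
# makes if it closes every descriptor from 3 up to LIMIT-1 before exec.
# LIMIT defaults to the current soft open-files limit (ulimit -n).
estimate_close_calls() {
    limit=${1:-$(ulimit -n)}
    echo $((limit - 3))
}

estimate_close_calls 1073741816   # container default seen here; prints 1073741813
estimate_close_calls 65535        # limit used in the fix; prints 65532
```

Per spawned child, the container's default limit means roughly 16,000 times as many close() calls as a limit of 65535.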

  Sure enough, ulimit -n inside the container showed a very high default value:

> ulimit -n
1073741816
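To catch this early, for example in a container entrypoint or a CI check, a script can compare the soft limit against a threshold. A minimal sketch; the function name check_nofile and the default threshold of 1048576 are arbitrary assumptions, not values from the original investigation.

```shell
#!/bin/sh
# check_nofile [THRESHOLD]: report whether this shell's soft open-files limit
# exceeds THRESHOLD (default 1048576, an arbitrary cut-off).
# Returns 0 when the limit is acceptable, 1 otherwise.
check_nofile() {
    threshold=${1:-1048576}
    limit=$(ulimit -n)
    if [ "$limit" = "unlimited" ] || [ "$limit" -gt "$threshold" ]; then
        echo "WARN: nofile=$limit exceeds $threshold"
        return 1
    fi
    echo "OK: nofile=$limit"
}

# The limits of another process, e.g. the container's PID 1, can be read with:
#   grep 'open files' /proc/1/limits
```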

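  Searching for "docker nofile limit" then turned up "Docker: How to increase number of open files limit", which explains that the nofile limit inside a container can be set when the container is started with docker run.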

  So we started the container again with --ulimit nofile=65535 added. ulimit -n inside the container now returned the lower value, and ansible localhost -m setup no longer hung.
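The fix can be wrapped in a small launcher so the limit is never forgotten. A sketch only: run_with_nofile is a hypothetical helper, my-ansible-image is a placeholder image name, and the DOCKER variable is a hypothetical override that allows a dry run with DOCKER=echo.

```shell
#!/bin/sh
# run_with_nofile IMAGE [NOFILE]: start a detached container from IMAGE with the
# open-files limit capped at NOFILE (default 65535, the value used above).
# docker run --ulimit takes nofile=<soft>[:<hard>]; both are set explicitly here.
# DOCKER is a hypothetical override so the command can be printed (DOCKER=echo).
run_with_nofile() {
    image=$1
    nofile=${2:-65535}
    ${DOCKER:-docker} run -d --ulimit "nofile=${nofile}:${nofile}" "$image"
}

# Example (image name is a placeholder), then verify inside the container:
#   cid=$(run_with_nofile my-ansible-image)
#   docker exec "$cid" sh -c 'ulimit -n'
```

To change the default for all new containers instead of per container, dockerd also offers a --default-ulimit option (default-ulimits in daemon.json).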

4. References

  https://github.com/pexpect/ptyprocess/issues/50
  https://github.com/docker/for-linux/issues/502
  https://github.com/moby/moby/issues/38814

 
