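In k8s, one container kept holding a GPU and would not release it, even though the processes that had been using the GPU were already killed.
Checking dmesg shows "Unable to allocate memory on node -1". A workaround that treats the symptom but not the cause is to restart the affected container.
From searching: the underlying cause is an issue in the current system kernel, version 4.4.0-xxxx, which leads to problems under k8s, so a permanent fix has to address the kernel.
Solution: upgrade the Ubuntu kernel. Do not manually download and install an arbitrary higher-version deb; perform the upgrade through the relevant upgrade commands.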
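Before upgrading, it can help to confirm which nodes actually run a 4.4-series kernel. The following is a minimal sketch, not a definitive check: the helper names `kernel_series` and `is_affected` are hypothetical, and treating the whole 4.4 series as affected is an assumption based only on the version mentioned in this note.

```python
import platform
import re
from typing import Optional, Tuple


def kernel_series(release: str) -> Optional[Tuple[int, int]]:
    """Parse (major, minor) from a release string such as '4.4.0-116-generic'.

    Returns None when the string does not start with 'N.N'.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_affected(release: str) -> bool:
    # Assumption: every 4.4.x kernel is treated as affected, because the
    # note only names "4.4.0-xxxx" without narrowing it further.
    return kernel_series(release) == (4, 4)


if __name__ == "__main__":
    # platform.release() returns the running kernel release (like `uname -r`).
    current = platform.release()
    if is_affected(current):
        print(f"{current}: 4.4-series kernel, consider a packaged kernel upgrade")
    else:
        print(f"{current}: not a 4.4-series kernel")
```

If the nodes run Ubuntu 16.04 (an assumption; the note does not state the release), the packaged upgrade route is typically the HWE kernel metapackage, installed with `sudo apt-get install --install-recommends linux-generic-hwe-16.04` and followed by a reboot.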
Relevant dmesg output:
anon:0KB active_anon:11652KB inactive_file:516KB active_file:180KB unevictable:0KB
[233318.319275] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[233318.319513] [45521] 0 45521 255 1 5 2 0 -998 pause
[233318.319520] [46984] 0 46984 359100 2503 82 6 0 -998 flanneld
[233318.319527] [ 6504] 0 6504 2550 191 10 3 0 -998 iptables
[233318.319569] [ 6631] 0 6631 302 6 4 3 0 -998 iptables
[233318.319583] Memory cgroup out of memory: Kill process 45521 (pause) score 0 or sacrifice child
[233318.321901] Killed process 45521 (pause) total-vm:1020kB, anon-rss:4kB, file-rss:0kB
[233335.103348] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
[233335.103397] NVRM: rm_init_adapter failed for device bearing minor number 5
[233395.716627] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[233395.716635] cache: mnt_cache(14988:e196f5f19fcea94079334d52d6fbb730dc94693de78a9902be307037e5eb5a0c), object size: 384, buffer size: 384, default order: 2, min order: 0
[233395.716639] node 0: slabs: 18, objs: 756, free: 0
[233395.716641] node 1: slabs: 8, objs: 336, free: 0
[233429.318915] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
[233429.318990] NVRM: rm_init_adapter failed for device bearing minor number 5
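To spot the same symptoms on other nodes, the log can be scanned for the three message types seen in the excerpt above. This is a rough sketch under stated assumptions: the patterns are copied from this log only, the category names and the function `count_signatures` are hypothetical, and other kernel or driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns taken from the dmesg excerpt above; the keys are illustrative labels.
SIGNATURES = {
    "slub_alloc_failure": re.compile(r"SLUB: Unable to allocate memory on node"),
    "nvidia_init_failure": re.compile(r"NVRM: RmInitAdapter failed"),
    "cgroup_oom_kill": re.compile(r"Memory cgroup out of memory"),
}


def count_signatures(dmesg_text: str) -> Counter:
    """Count how many lines of dmesg_text match each known signature."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Three lines copied from the excerpt above, used as sample input.
SAMPLE = """\
[233318.319583] Memory cgroup out of memory: Kill process 45521 (pause) score 0 or sacrifice child
[233335.103348] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
[233395.716627] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
"""

if __name__ == "__main__":
    for name, count in sorted(count_signatures(SAMPLE).items()):
        print(f"{name}: {count}")
```

In practice the text would come from the node's own `dmesg` output instead of `SAMPLE`; a node that shows the SLUB allocation failure together with the NVRM initialization failure matches the situation described in this note.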