Notes on Troubleshooting a Kernel Crash

The distribution is CentOS and the kernel version is 3.10.0. The kernel crashed while the system was running normally; fortunately, a vmcore was generated.

Preparing the troubleshooting environment

  1. crash
  2. The kernel debuginfo rpms; the two downloaded rpms must match the running kernel version exactly

Investigation

Have the generated vmcore at hand

  1. Start crash
    [zzz@localhost kernel-debug]# crash ../vmcore /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
    
  2. The PANIC line shows that the immediate cause is a NULL pointer dereference (at address 0000000000000008)
        KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
        DUMPFILE: ../vmcore  [PARTIAL DUMP]
            CPUS: 8
            DATE: Wed Apr 29 19:40:42 2020
        UPTIME: 335 days, 01:46:01
    LOAD AVERAGE: 23.98, 26.19, 15.75
        TASKS: 688
        NODENAME: localhost.localdomain
        RELEASE: 3.10.0-327.el7.x86_64
        VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
        MACHINE: x86_64  (3408 Mhz)
        MEMORY: 15.6 GB
        PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
            PID: 76
        COMMAND: "kswapd0"
            TASK: ffff88044beba280  [THREAD_INFO: ffff88044bef0000]
            CPU: 2
        STATE: TASK_RUNNING (PANIC)
    
  3. Inspect the stack to see roughly where in the code the problem arose
    crash> bt
    PID: 76     TASK: ffff88044beba280  CPU: 2   COMMAND: "kswapd0"
    #0 [ffff88044bef3610] machine_kexec at ffffffff81051beb
    #1 [ffff88044bef3670] crash_kexec at ffffffff810f2542
    #2 [ffff88044bef3740] oops_end at ffffffff8163e1a8
    #3 [ffff88044bef3768] no_context at ffffffff8162e2b8
    #4 [ffff88044bef37b8] __bad_area_nosemaphore at ffffffff8162e34e
    #5 [ffff88044bef3800] bad_area_nosemaphore at ffffffff8162e4b8
    #6 [ffff88044bef3810] __do_page_fault at ffffffff81640fce
    #7 [ffff88044bef3868] do_page_fault at ffffffff81641113
    #8 [ffff88044bef3890] page_fault at ffffffff8163d408
        [exception RIP: down_read_trylock+9]
        RIP: ffffffff810aa989  RSP: ffff88044bef3940  RFLAGS: 00010202
        RAX: 0000000000000000  RBX: ffff8801b4f9ff80  RCX: 0000000000000000
        RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000000000008
        RBP: ffff88044bef3940   R8: 0000000000000000   R9: 0000000000017bc0
        R10: ffff880465fd8000  R11: 0000000000000000  R12: ffff8801b4f9ff81
        R13: ffffea00047dfbc0  R14: 0000000000000008  R15: 0000000000000001
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #9 [ffff88044bef3948] page_lock_anon_vma_read at ffffffff811a2e65
    #10 [ffff88044bef3978] page_referenced at ffffffff811a30e7
    #11 [ffff88044bef39f0] shrink_page_list at ffffffff8117d264
    #12 [ffff88044bef3b28] shrink_inactive_list at ffffffff8117df3a
    #13 [ffff88044bef3bf0] shrink_lruvec at ffffffff8117ea05
    #14 [ffff88044bef3cf0] shrink_zone at ffffffff8117ee66
    #15 [ffff88044bef3d48] balance_pgdat at ffffffff8118010c
    #16 [ffff88044bef3e20] kswapd at ffffffff811803d3
    #17 [ffff88044bef3ec8] kthread at ffffffff810a5aef
    #18 [ffff88044bef3f50] ret_from_fork at ffffffff81645858
    
    The exception occurred inside down_read_trylock, where a memory access triggered the page fault. First disassemble the address held in RIP (ffffffff810aa989):
    crash> dis -l ffffffff810aa989
        /usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/include/asm/rwsem.h: 83
        0xffffffff810aa989 <down_read_trylock+9>:       mov    (%rdi),%rax
    
    Open arch/x86/include/asm/rwsem.h for this kernel version (the upstream v3.10 copy can be viewed at https://elixir.bootlin.com/linux/v3.10/source/arch/x86/include/asm/rwsem.h). The code is as follows:
    /*
    * trylock for reading -- returns 1 if successful, 0 if contention
    */
    static inline int __down_read_trylock(struct rw_semaphore *sem)
    {
        long result, tmp;
        asm volatile("# beginning __down_read_trylock\n\t"
                "  mov          %0,%1\n\t"
                "1:\n\t"
                "  mov          %1,%2\n\t"
                "  add          %3,%2\n\t"
                "  jle	     2f\n\t"
                LOCK_PREFIX "  cmpxchg  %2,%0\n\t"
                "  jnz	     1b\n\t"
                "2:\n\t"
                "# ending __down_read_trylock\n\t"
                : "+m" (sem->count), "=&a" (result), "=&r" (tmp)
                : "i" (RWSEM_ACTIVE_READ_BIAS)
                : "memory", "cc");
        return result >= 0 ? 1 : 0;
    }
    
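    The pointer sem is held in register RDI (the source operand of mov (%rdi),%rax), and its value is 0000000000000008. Dereferencing that address fails, which the kernel reports as a NULL pointer dereference and which raises the exception. The call stack consists entirely of memory-management operations, with kswapd at the bottom, so the working guess at this point is that the machine ran out of memory.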
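  4. Check memory usage at the time of the crash: 98% of memory was in use and swap was also heavily used (72%), which is consistent with the reclaim call chain above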
    crash> kmem -i
                     PAGES        TOTAL      PERCENTAGE
        TOTAL MEM  3962164      15.1 GB         ----
             FREE    46698     182.4 MB    1% of TOTAL MEM
             USED  3915466      14.9 GB   98% of TOTAL MEM
           SHARED   224712     877.8 MB    5% of TOTAL MEM
          BUFFERS        0            0    0% of TOTAL MEM
           CACHED   555017       2.1 GB   14% of TOTAL MEM
             SLAB   136079     531.6 MB    3% of TOTAL MEM
    
       TOTAL HUGE        0            0         ----
        HUGE FREE        0            0    0% of TOTAL HUGE
    
       TOTAL SWAP  4194303        16 GB         ----
        SWAP USED  3042976      11.6 GB   72% of TOTAL SWAP
        SWAP FREE  1151327       4.4 GB   27% of TOTAL SWAP
    
     COMMIT LIMIT  6175385      23.6 GB         ----
        COMMITTED  9409769      35.9 GB  152% of TOTAL LIMIT
    
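  5. Look at the process list at the time of the crash (excerpt). The [swapper/N] entries with PID 0 are the per-CPU idle tasks rather than page-swapping processes; the page reclaim work is done by kswapd0, the task that panicked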
    crash> ps
       PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
    >     0      0   0  ffffffff81951440  RU   0.0       0      0  [swapper/0]
    >     0      0   1  ffff88044f942e00  RU   0.0       0      0  [swapper/1]
          0      0   2  ffff88044f943980  RU   0.0       0      0  [swapper/2]
    >     0      0   3  ffff88044f944500  RU   0.0       0      0  [swapper/3]
    >     0      0   4  ffff88044f945080  RU   0.0       0      0  [swapper/4]
          0      0   5  ffff88044f945c00  RU   0.0       0      0  [swapper/5]
    >     0      0   6  ffff88044f946780  RU   0.0       0      0  [swapper/6]
    >     0      0   7  ffff88044f947300  RU   0.0       0      0  [swapper/7]
          1      0   7  ffff88044f848000  IN   0.0  189172   3080  systemd
          2      0   0  ffff88044f848b80  IN   0.0       0      0  [kthreadd]
          3      2   0  ffff88044f849700  IN   0.0       0      0  [ksoftirqd/0]
          5      2   0  ffff88044f84ae00  IN   0.0       0      0  [kworker/0:0H]
          7      2   0  ffff88044f84c500  IN   0.0       0      0  [migration/0]
          8      2   7  ffff88044f84d080  IN   0.0       0      0  [rcu_bh]
    
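The fault address from step 3 is 0x8, a small non-zero value. That is typical of taking the address of a member inside a structure whose base pointer is NULL. The sketch below only illustrates that arithmetic. It assumes (the vmcore analysis above does not confirm this) a layout like the upstream v3.10 struct anon_vma, where a pointer-sized root field precedes the rwsem that page_lock_anon_vma_read passes to down_read_trylock. The Python class names and member_address are hypothetical helpers, not part of crash or the kernel.

```python
import ctypes


class RwSemaphore(ctypes.Structure):
    # Simplified stand-in: only the leading 64-bit `count` word is modelled.
    _fields_ = [("count", ctypes.c_int64)]


class AnonVma(ctypes.Structure):
    # Assumed layout: a pointer-sized `root` field followed by the rwsem.
    # c_uint64 stands in for a 64-bit kernel pointer.
    _fields_ = [("root", ctypes.c_uint64), ("rwsem", RwSemaphore)]


def member_address(base, struct_type, member):
    """Return the address of `member` when the struct starts at `base`."""
    return base + getattr(struct_type, member).offset


# Address of the rwsem member when the enclosing struct pointer is NULL.
fault_address = member_address(0, AnonVma, "rwsem")
print(hex(fault_address))
```

Under that assumed layout the computed address equals the value seen in RDI, which would point at a NULL root pointer rather than a NULL semaphore pointer. On the real dump, the struct command in crash can print the actual member offsets from the debuginfo.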

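The figures in the kmem -i table from step 4 can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming a 4 KiB page size, percentages that are truncated rather than rounded, no huge pages, and the default vm.overcommit_ratio of 50; the function names are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed x86_64 base page size


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


def percent(part, total):
    """Integer percentage, truncated toward zero."""
    return part * 100 // total


def commit_limit(total_pages, swap_pages, overcommit_ratio=50):
    """Commit limit in pages: swap plus a fraction of RAM (no huge pages assumed)."""
    return swap_pages + total_pages * overcommit_ratio // 100


# Page counts copied from the kmem -i output above.
total_mem, used = 3962164, 3915466
swap_total, swap_used = 4194303, 3042976
committed = 9409769

limit = commit_limit(total_mem, swap_total)
print(round(pages_to_gib(total_mem), 1), "GiB total memory")
print(percent(used, total_mem), "% of memory used")
print(percent(swap_used, swap_total), "% of swap used")
print(limit, "pages commit limit")
print(percent(committed, limit), "% of commit limit committed")
```

With these assumptions the results agree with the table: the committed memory is about one and a half times the commit limit, which supports the memory-pressure reading of the crash.
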
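At this point the problem can be narrowed down to heavy page swapping under memory pressure, but the exact code path at fault cannot be determined. The crash also happened while an operation that urgently needed memory was running, so the only available mitigation is to stop some services to free memory beforehand and start them again once that operation has finished.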
