A Linux Server Crash Case Study

Environment

  • Operating system: Oracle Linux Server release 5.7, 64-bit, virtual machine
  • Hardware: the physical host is a DELL R720
  • Resources: 8 GB RAM, Intel(R) Xeon(R) CPU E5-2690, 8 cores

Case Description

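Early in the morning I found that a Linux server (a virtual machine) in Guilin could not be pinged, so I contacted the system administrator there, who shared his desktop with me over Lync. After logging in through VMware vSphere Client on his computer, I found that the console was unresponsive as well: I could not log in or perform any operation, and keyboard input had no effect. In other words, the system had hung. With no alternative, I restarted the Linux server from the virtual machine's "Power" menu using "Power Off" and then "Power On", and then restarted the Tomcat service and the Oracle database service. I checked the Oracle database alert log and found no errors. After analyzing the Linux system log, my manager found that at around 2:22 a.m. on August 1 the system ran out of memory, and Linux, as a protective measure, killed some non-essential processes. The error messages are shown below (the server name has been obfuscated):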

Aug  1 01:36:09 G*******LNX01 ntpd[3555]: kernel time sync enabled 4001
Aug  1 01:53:13 G*******LNX01 ntpd[3555]: kernel time sync enabled 0001
Aug  1 02:22:36 G*******LNX01 kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Aug  1 02:22:37 G*******LNX01 kernel: hald cpuset=/ mems_allowed=0
Aug  1 02:22:37 G*******LNX01 kernel: Pid: 3408, comm: hald Not tainted 2.6.32-200.13.1.el5uek #1
Aug  1 02:22:37 G*******LNX01 kernel: Call Trace:
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810a0b66>] ? cpuset_print_task_mems_allowed+0x92/0x9e
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810d9fbc>] ? select_bad_process+0xbc/0x102
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810da03f>] __out_of_memory+0x3d/0x86
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810da30f>] out_of_memory+0xfc/0x195
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810dd75e>] __alloc_pages_nodemask+0x487/0x595
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff811075ac>] alloc_page_vma+0xb9/0xc8
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810ff0a7>] read_swap_cache_async+0x52/0xf1
Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810ff1a3>] swapin_readahead+0x5d/0x9c
Aug  1 02:22:38 G*******LNX01 kernel:  [<ffffffff810d725a>] ? find_get_page+0x22/0x69
Aug  1 02:22:38 G*******LNX01 kernel:  [<ffffffff810f1ea3>] handle_mm_fault+0x44b/0x80f
Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81043696>] ? should_resched+0xe/0x2f
Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81456006>] do_page_fault+0x210/0x299
Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81453fd5>] page_fault+0x25/0x30
Aug  1 02:22:40 G*******LNX01 kernel: Mem-Info:
Aug  1 02:22:43 G*******LNX01 kernel: Node 0 DMA per-cpu:
Aug  1 02:22:44 G*******LNX01 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Aug  1 02:22:44 G*******LNX01 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Aug  1 08:51:04 G*******LNX01 syslogd 1.4.1: restart.
Aug  1 08:51:04 G*******LNX01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuset
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpu

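The OOM Killer is, put simply, a protection mechanism that keeps a memory shortage from turning into a more serious problem. When memory is critically low, the kernel selects a process and kills it to free memory and relieve the shortage so that the system can keep running. This protection is limited, however, and cannot fully guarantee that processes keep running.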
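As a rough illustration of how events like the one in the log above could be located in a syslog file, the following Python sketch scans lines for the `invoked oom-killer` marker that appears in the excerpt. The function name `find_oom_events` and the sample list are illustrative assumptions; this is not a tool that was used in the original investigation.

```python
import re

# Matches syslog lines such as:
# "Aug  1 02:22:36 HOST kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0"
OOM_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<process>\S+) invoked oom-killer"
)


def find_oom_events(lines):
    """Return (timestamp, host, process) for each OOM-killer invocation found."""
    events = []
    for line in lines:
        match = OOM_PATTERN.match(line)
        if match:
            events.append(
                (match.group("timestamp"), match.group("host"), match.group("process"))
            )
    return events


if __name__ == "__main__":
    # Sample lines copied from the log excerpt; in practice the lines
    # would be read from a file such as /var/log/messages.
    sample = [
        "Aug  1 01:53:13 G*******LNX01 ntpd[3555]: kernel time sync enabled 0001",
        "Aug  1 02:22:36 G*******LNX01 kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0",
        "Aug  1 02:22:37 G*******LNX01 kernel: hald cpuset=/ mems_allowed=0",
    ]
    for timestamp, host, process in find_oom_events(sample):
        print(f"{timestamp} {host}: OOM killer invoked by {process}")
```

Note that the process named on such a line is the one whose memory allocation triggered the OOM killer, which is not necessarily the process that was killed.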

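However, this happened shortly after 2 a.m., so I kept checking /var/log/messages and found that the following message appeared at system boot:

"Phoenix BIOS detected: BIOS may corrupt low RAM, working around it". I then searched Google for this message.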

Aug  1 08:51:04 G*******LNX01 syslogd 1.4.1: restart.
Aug  1 08:51:04 G*******LNX01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuset
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpu
Aug  1 08:51:04 G*******LNX01 kernel: Linux version 2.6.32-200.13.1.el5uek ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Wed Jul 27 21:02:33 EDT 2011
Aug  1 08:51:04 G*******LNX01 kernel: Command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
Aug  1 08:51:04 G*******LNX01 kernel: KERNEL supported cpus:
Aug  1 08:51:04 G*******LNX01 kernel:   Intel GenuineIntel
Aug  1 08:51:04 G*******LNX01 kernel:   AMD AuthenticAMD
Aug  1 08:51:04 G*******LNX01 kernel:   Centaur CentaurHauls
Aug  1 08:51:04 G*******LNX01 kernel: BIOS-provided physical RAM map:
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000000100000 - 00000000bfee0000 (usable)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bfee0000 - 00000000bfeff000 (ACPI data)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bfeff000 - 00000000bff00000 (ACPI NVS)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bff00000 - 00000000c0000000 (usable)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000100000000 - 0000000240000000 (usable)
Aug  1 08:51:04 G*******LNX01 kernel: DMI present.
Aug  1 08:51:04 G*******LNX01 kernel: Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
Aug  1 08:51:04 G*******LNX01 kernel: last_pfn = 0x240000 max_arch_pfn = 0x400000000
Aug  1 08:51:04 G*******LNX01 kernel: x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
Aug  1 08:51:04 G*******LNX01 kernel: total RAM covered: 8192M
Aug  1 08:51:04 G*******LNX01 kernel: Found optimal setting for mtrr clean up
Aug  1 08:51:04 G*******LNX01 kernel:  gran_size: 64K  chunk_size: 64K         num_reg: 4      lose cover RAM: 0G
Aug  1 08:51:04 G*******LNX01 kernel: x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
Aug  1 08:51:04 G*******LNX01 kernel: last_pfn = 0xc0000 max_arch_pfn = 0x400000000
Aug  1 08:51:04 G*******LNX01 kernel: init_memory_mapping: 0000000000000000-00000000c0000000
Aug  1 08:51:04 G*******LNX01 kernel: init_memory_mapping: 0000000100000000-0000000240000000
Aug  1 08:51:04 G*******LNX01 kernel: RAMDISK: 37c4d000 - 37fef894
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: RSDP 00000000000f6940 00024 (v02 PTLTD )
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: XSDT 00000000bfeefddc 0005C (v01 INTEL  440BX    06040000 VMW  01324272)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: FACP 00000000bfefee98 000F4 (v04 INTEL  440BX    06040000 PTL  000F4240)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: DSDT 00000000bfef0230 0EC68 (v01 PTLTD  Custom   06040000 MSFT 03000001)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: FACS 00000000bfefffc0 00040
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: BOOT 00000000bfef0208 00028 (v01 PTLTD  $SBFTBL$ 06040000  LTP 00000001)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: APIC 00000000bfef0156 000B2 (v01 PTLTD  ? APIC   06040000  LTP 00000000)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: MCFG 00000000bfef011a 0003C (v01 PTLTD  $PCITBL$ 06040000  LTP 00000001)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: SRAT 00000000bfeefed8 00128 (v02 VMWARE MEMPLUG  06040000 VMW  00000001)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: HPET 00000000bfeefea0 00038 (v01 VMWARE VMW HPET 06040000 VMW  00000001)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: WAET 00000000bfeefe78 00028 (v01 VMWARE VMW WAET 06040000 VMW  00000001)
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 0 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 1 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 2 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 3 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 4 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 5 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 6 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 7 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 0-a0000
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 100000-c0000000
Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 100000000-240000000
Aug  1 08:51:04 G*******LNX01 kernel: Bootmem setup node 0 0000000000000000-0000000240000000
Aug  1 08:51:04 G*******LNX01 kernel:   NODE_DATA [000000000001b840 - 000000000003183f]
Aug  1 08:51:04 G*******LNX01 kernel:   bootmap [0000000000032000 -  0000000000079fff] pages 48
Aug  1 08:51:04 G*******LNX01 kernel: (9 early reservations) ==> bootmem [0000000000 - 0240000000]
Aug  1 08:51:04 G*******LNX01 kernel:   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
Aug  1 08:51:04 G*******LNX01 kernel:   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
Aug  1 08:51:04 G*******LNX01 kernel:   #2 [0001000000 - 000224b73c]    TEXT DATA BSS ==> [0001000000 - 000224b73c]
Aug  1 08:51:04 G*******LNX01 kernel:   #3 [0037c4d000 - 0037fef894]          RAMDISK ==> [0037c4d000 - 0037fef894]
Aug  1 08:51:04 G*******LNX01 kernel:   #4 [000009f800 - 0000100000]    BIOS reserved ==> [000009f800 - 0000100000]
Aug  1 08:51:04 G*******LNX01 kernel:   #5 [000224c000 - 000224c1e8]              BRK ==> [000224c000 - 000224c1e8]
Aug  1 08:51:04 G*******LNX01 kernel:   #6 [0000010000 - 0000012000]          PGTABLE ==> [0000010000 - 0000012000]
Aug  1 08:51:04 G*******LNX01 kernel:   #7 [0000012000 - 0000017000]          PGTABLE ==> [0000012000 - 0000017000]
Aug  1 08:51:04 G*******LNX01 kernel:   #8 [0000017000 - 000001b840]       MEMNODEMAP ==> [0000017000 - 000001b840]
Aug  1 08:51:04 G*******LNX01 kernel: found SMP MP-table at [ffff8800000f69b0] f69b0
Aug  1 08:51:04 G*******LNX01 kernel: Zone PFN ranges:
Aug  1 08:51:04 G*******LNX01 kernel:   DMA      0x00000010 -> 0x00001000
Aug  1 08:51:04 G*******LNX01 kernel:   DMA32    0x00001000 -> 0x00100000
Aug  1 08:51:04 G*******LNX01 kernel:   Normal   0x00100000 -> 0x00240000
Aug  1 08:51:04 G*******LNX01 kernel: Movable zone start PFN for each node
Aug  1 08:51:04 G*******LNX01 kernel: early_node_map[4] active PFN ranges
Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00000010 -> 0x0000009f
Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00000100 -> 0x000bfee0
Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x000bff00 -> 0x000c0000
Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00100000 -> 0x00240000
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: PM-Timer IO Port: 0x1008
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
Aug  1 08:51:04 G*******LNX01 kernel: IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
Aug  1 08:51:04 G*******LNX01 kernel: Using ACPI (MADT) for SMP configuration information
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: HPET id: 0x8086af01 base: 0xfed00000
Aug  1 08:51:04 G*******LNX01 kernel: SMP: Allowing 8 CPUs, 0 hotplug CPUs
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000a0000 - 00000000000ca000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000ca000 - 00000000000cc000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000cc000 - 00000000000dc000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000dc000 - 0000000000100000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000bfee0000 - 00000000bfeff000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000bfeff000 - 00000000bff00000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000c0000000 - 00000000e0000000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fee01000 - 00000000fffe0000
Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fffe0000 - 0000000100000000
Aug  1 08:51:04 G*******LNX01 kernel: Allocating PCI resources starting at c0000000 (gap: c0000000:20000000)
Aug  1 08:51:04 G*******LNX01 kernel: Booting paravirtualized kernel on bare hardware
Aug  1 08:51:04 G*******LNX01 kernel: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:8 nr_node_ids:1
Aug  1 08:51:04 G*******LNX01 kernel: PERCPU: Embedded 29 pages/cpu @ffff880028200000 s88280 r8192 d22312 u262144
Aug  1 08:51:04 G*******LNX01 kernel: pcpu-alloc: s88280 r8192 d22312 u262144 alloc=1*2097152
Aug  1 08:51:04 G*******LNX01 kernel: pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
Aug  1 08:51:04 G*******LNX01 kernel: Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2064641
Aug  1 08:51:04 G*******LNX01 kernel: Policy zone: Normal
Aug  1 08:51:04 G*******LNX01 kernel: Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
Aug  1 08:51:04 G*******LNX01 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)
Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#0
Aug  1 08:51:04 G*******LNX01 kernel: xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
Aug  1 08:51:04 G*******LNX01 kernel: Checking aperture...
Aug  1 08:51:04 G*******LNX01 kernel: No AGP bridge found
Aug  1 08:51:04 G*******LNX01 kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Aug  1 08:51:04 G*******LNX01 kernel: Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000
Aug  1 08:51:04 G*******LNX01 kernel: software IO TLB at phys 0x20000000 - 0x24000000
Aug  1 08:51:04 G*******LNX01 kernel: Memory: 8183576k/9437184k available (4454k kernel code, 1049156k absent, 204452k reserved, 7191k data, 1720k init)
Aug  1 08:51:04 G*******LNX01 kernel: Hierarchical RCU implementation.
Aug  1 08:51:04 G*******LNX01 kernel: NR_IRQS:4352 nr_irqs:472
Aug  1 08:51:04 G*******LNX01 kernel: Extended CMOS year: 2000
Aug  1 08:51:04 G*******LNX01 kernel: Console: colour VGA+ 80x25
Aug  1 08:51:04 G*******LNX01 kernel: console [tty0] enabled
Aug  1 08:51:04 G*******LNX01 kernel: allocated 83886080 bytes of page_cgroup
Aug  1 08:51:04 G*******LNX01 kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups
Aug  1 08:51:04 G*******LNX01 kernel: TSC freq read from hypervisor : 2900.001 MHz
Aug  1 08:51:04 G*******LNX01 kernel: Detected 2900.001 MHz processor.
Aug  1 08:51:04 G*******LNX01 kernel: Calibrating delay loop (skipped) preset value.. 5800.00 BogoMIPS (lpj=2900001)
Aug  1 08:51:04 G*******LNX01 kernel: Security Framework initialized
Aug  1 08:51:04 G*******LNX01 kernel: SELinux:  Initializing.
Aug  1 08:51:04 G*******LNX01 kernel: Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Aug  1 08:51:04 G*******LNX01 kernel: Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Aug  1 08:51:04 G*******LNX01 kernel: Mount-cache hash table entries: 256
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys ns
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuacct
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys memory
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys devices
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys freezer
Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys net_cls
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 0
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K
Aug  1 08:51:04 G*******LNX01 kernel: CPU 0/0x0 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks
Aug  1 08:51:04 G*******LNX01 kernel: Performance Events: Nehalem/Corei7 events, Intel PMU driver.
Aug  1 08:51:04 G*******LNX01 kernel: ... version:                3
Aug  1 08:51:04 G*******LNX01 kernel: ... bit width:              48
Aug  1 08:51:04 G*******LNX01 kernel: ... generic registers:      4
Aug  1 08:51:04 G*******LNX01 kernel: ... value mask:             0000ffffffffffff
Aug  1 08:51:04 G*******LNX01 kernel: ... max period:             000000007fffffff
Aug  1 08:51:04 G*******LNX01 kernel: ... fixed-purpose events:   3
Aug  1 08:51:04 G*******LNX01 kernel: ... event mask:             000000070000000f
Aug  1 08:51:04 G*******LNX01 kernel: ACPI: Core revision 20090903
Aug  1 08:51:04 G*******LNX01 kernel: ftrace: converting mcount calls to 0f 1f 44 00 00
Aug  1 08:51:04 G*******LNX01 kernel: ftrace: allocating 26665 entries in 105 pages
Aug  1 08:51:04 G*******LNX01 kernel: Setting APIC routing to flat
Aug  1 08:51:04 G*******LNX01 kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Aug  1 08:51:04 G*******LNX01 kernel: CPU0: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07
Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 1 APIC 0x1 ip 0x6000
Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#1
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 0
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 1
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K
Aug  1 08:51:04 G*******LNX01 kernel: CPU 1/0x1 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks
Aug  1 08:51:04 G*******LNX01 kernel: CPU1: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07
Aug  1 08:51:04 G*******LNX01 kernel: Skipping synchronization checks as TSC is reliable.
Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 2 APIC 0x2 ip 0x6000
Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#2
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 1
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K
Aug  1 08:51:04 G*******LNX01 kernel: CPU 2/0x2 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks
Aug  1 08:51:04 G*******LNX01 kernel: CPU2: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07
Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 3 APIC 0x3 ip 0x6000
Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#3
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 1
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 1
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K
Aug  1 08:51:04 G*******LNX01 kernel: CPU 3/0x3 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks
Aug  1 08:51:04 G*******LNX01 kernel: CPU3: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07
Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 4 APIC 0x4 ip 0x6000
Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#4
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 2
Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K
Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K
Aug  1 08:51:04 G*******LNX01 kernel: CPU 4/0x4 -> Node 0
Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks
Aug  1 08:51:04 G*******LNX01 kernel: CPU4: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

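I searched through a lot of material, and while I was still digging through it, the server became unreachable again: it had crashed a second time. I had no permissions there, and the Lync shared desktop was extremely slow, so I asked the system administrator on our side (who has access to the virtual machine servers at that site) for help. When he connected with vSphere Client to check, he finally found a cause that was both frustrating and surprising.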

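The physical host, a DELL R720, has only 32 GB of memory and runs one Linux system and several Windows systems. Because the administrator there had added test servers, the memory allocated to all systems totaled 43 GB, exceeding the host's 32 GB of physical memory. This caused the systems to contend for resources and run short of memory, and in the end the Linux system crashed outright.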
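The arithmetic behind this finding can be sketched in a few lines of Python. The function name `memory_overcommit` is a hypothetical helper introduced only for illustration. It compares the two totals reported above (43 GB allocated, 32 GB physical) and does not model how the hypervisor actually manages memory.

```python
def memory_overcommit(allocated_gb, physical_gb):
    """Return (shortfall_gb, ratio) for total VM memory allocated vs. host RAM."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    # Shortfall is zero when the host has at least as much RAM as was allocated.
    shortfall_gb = max(0, allocated_gb - physical_gb)
    ratio = allocated_gb / physical_gb
    return shortfall_gb, ratio


if __name__ == "__main__":
    # Totals taken from the case: 43 GB allocated on a host with 32 GB of RAM.
    shortfall, ratio = memory_overcommit(allocated_gb=43, physical_gb=32)
    print(f"Shortfall: {shortfall} GB, allocation ratio: {ratio:.2f}")
```

With the figures from this case, the allocations exceed physical memory by 11 GB, an allocation ratio of roughly 1.34.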

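There were two reasons this happened. First, the administrator was careless and did not pay attention to how memory was actually allocated. Second, I lacked information: because of permission restrictions, I did not know the state of the physical host and its virtual machines. A system that had been running fine suddenly developed this problem, which led me to confine my search to the database, application, and operating system layers instead of stepping back to look at the architecture and resource layers. As a result, I went a long time without finding the root cause.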

Solution

Shut down the test servers to free up enough memory. That resolved the problem, and the system has run since August 1 without the issue recurring.

References:

http://dbanotes.net/database/linux_outofmemory_oom_killer.html
http://www.huomo.cn/os/article-16bb4.html
