Hardware tsc bug and core dump analyzation

原創

2018-08-27 07:30

Recently one of the exadata cell node experienced reboot multiple times without any load. This is blocking us releasing the env to customer.

Kernel is 4.1.12-61.47.1.el6uek.x86_64

I analyzed all 6 vmcores and found some common feature, expired timer jiffies are quite small than current jiffies, like below:

TVEC_BASES[2]: ffff881ffda8eb40
JIFFIES
15605264118
EXPIRES TIMER_LIST FUNCTION
4320897356 ffff88003786a408 ffffffff8157d810 <intel_pstate_timer_func>
4320898348 ffff8810af00fc48 ffffffff810ed490 <process_timeout>
...
4320898348 ffff8810c7e63c48 ffffffff810ed490 <process_timeout>
4321197994 ffff881ffda8cee0 ffffffff81045770 <mce_timer_fn>

At first, I can't think of how jiffies jump like this, even with nohz enabled, due to the limit of max idle time, jiffies delta is quite small.

Luckily, I found below log on 2 vmcores, it showed TSC has large jump from its last value and lead to current jiffies jumped large. This print isn't in the other 4 of vmcores, because watchdog timer of clocksource cycle through all cpus(40 cpus in this env), it may not always have opportunity to check the buggy cpu yet before the softlockup.

[11277703.341052] timekeeping watchdog: Marking clocksource 'tsc' as unstable, because the skew is too large:
[11277703.341054] 'hpet' wd_now: 63896768 wd_last: 636c76a3 mask: ffffffff
[11277703.341056] 'tsc' cs_now: 5a341a9f1c6ea0 cs_last: 2341a8dc79b74 mask: ffffffffffffffff

In __run_timers, there is a loop to process all the expired timers until current jiffies, with a delta of 0x5800001154D32C jiffies the loop time centainly exceed softlockup's timeout. So we see below output:

[ 5062.020370] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 205s! [swapper/22:0]
[ 5062.028896] task: ffff881ff17bd400 ti: ffff881ff17e8000 task.ti: ffff881ff17e8000
[ 5062.028897] RIP: 0010:[<ffffffff810ed290>] [<ffffffff810ed290>] run_timer_softirq+0x1f0/0x380
[ 5062.028923] Call Trace:
[ 5062.028924] <IRQ>
[ 5062.028930] [<ffffffff8157d810>] ? pid_param_set+0x120/0x120
[ 5062.028935] [<ffffffff8108871a>] __do_softirq+0x10a/0x350
[ 5062.028938] [<ffffffff81088ad5>] irq_exit+0x125/0x130
[ 5062.028941] [<ffffffff8105b9d4>] ? native_apic_msr_eoi_write+0x14/0x20
[ 5062.028946] [<ffffffff816a0af6>] smp_apic_timer_interrupt+0x46/0x60
[ 5062.028948] [<ffffffff8169ea3e>] apic_timer_interrupt+0x6e/0x80
[ 5062.028949] <EOI>
[ 5062.028952] [<ffffffff8157e695>] ? cpuidle_enter_state+0xc5/0x230
[ 5062.028954] [<ffffffff8157e665>] ? cpuidle_enter_state+0x95/0x230
[ 5062.028956] [<ffffffff8157e817>] cpuidle_enter+0x17/0x20
[ 5062.028959] [<ffffffff810c8eb4>] cpuidle_idle_call+0xe4/0x1e0
[ 5062.028961] [<ffffffff810c91a5>] cpu_idle_loop+0x1f5/0x220
[ 5062.028962] [<ffffffff810c91eb>] ? cpu_startup_entry+0x1b/0x70
[ 5062.028964] [<ffffffff810c922f>] cpu_startup_entry+0x5f/0x70
[ 5062.028966] [<ffffffff81052dab>] start_secondary+0xbb/0xc0

Finally issue disapper after they replace the buggy processor. This issue should not exist with 4.12 or above as there is a NOHZ optimized implementation for expired timer to survive and drop down TSC source. I also suggest they add 'notsc lapic=notscdeadline' to see if workaround the issue, unluckily they will not try it.

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

Hardware tsc bug and core dump analyzation

《日本蠟燭圖》讀書筆記 & 技術分析回測

一分鐘部署 Llama3 中文大模型，沒別的，就是快

Python多線程編程深度探索：從入門到實戰

《期貨-市場技術分析》讀書筆記

mongodb處理json數據很好

ffmpeg 百度雲盤

頂級 Javaer 都在用的 20 個類庫，真香！

[轉帖]cpupower

google瀏覽器插件開發

35K*14 薪，入職了！這公司只要不裁員，我能一直呆下去！

eventfd閱讀筆記

poll內核代碼閱讀筆記

Known attacks on x86 u-state

main_loop_wait vs aio_poll

pcid support in upstream kernel

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結