Kernel Crash

1. Problem description

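  When the machine is powered on with the screen attached, manually loading and unloading the driver both work normally. When it is powered on without the screen attached, manually unloading the driver crashes the kernel.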

2. Log excerpt

[  135.779814] Unable to handle kernel paging request at virtual address ffff000000dd36a8
[  135.780842] Mem abort info:
[  135.781203]   Exception class = IABT (current EL), IL = 32 bits
[  135.781954]   SET = 0, FnV = 0
[  135.782348]   EA = 0, S1PTW = 0
[  135.782750] ====>AEE dump_stack start
[  135.782766] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G         C      4.14.98-07190-gbcdaf61-dirty #5
[  135.784373] Hardware name: Freescale i.MX8QXP MEK (DT)
[  135.785030] Call trace:
[  135.785361] [<ffff00000808b198>] dump_backtrace+0x0/0x414
[  135.786059] [<ffff00000808b5c0>] show_stack+0x14/0x1c
[  135.786710] [<ffff000008adc008>] dump_stack+0x90/0xb0
[  135.787363] [<ffff00000809d1f4>] __do_kernel_fault+0x98/0x114
[  135.788098] [<ffff00000809d638>] do_translation_fault+0x48/0xa0
[  135.788857] [<ffff0000080812d0>] do_mem_abort+0x4c/0xcc
[  135.789526] Exception stack(0xffff00000800bc50 to 0xffff00000800bd90)
[  135.790350] bc40:                                   ffff800879194000 ffff000000dd36a8
[  135.791349] bc60: ffff800879194000 ffff800878108c80 0000000000000080 ffff800879194978
[  135.792346] bc80: 0010000000000000 4010000100000000 ffff80087ff33da8 0000000000000004
[  135.793343] bca0: 4000000100000000 ffff00000800be78 0000000000000001 0000000000007d00
[  135.794340] bcc0: 0000000000000001 0000000000000000 ffff0000081299dc 0000000000000000
[  135.795338] bce0: 0000000000000000 ffff800879194978 ffff800879194978 ffff000000dd36a8
[  135.796335] bd00: 0000000000000101 ffff800879194000 ffff000009189f38 ffff800878108c80
[  135.797333] bd20: ffff000008f98908 ffff000009189f38 ffff800878108c80 ffff00000800bd90
[  135.798330] bd40: ffff000008126b00 ffff00000800bd90 ffff000000dd36a8 0000000080000145
[  135.799328] bd60: ffff000009189f38 ffff000008f97018 0000ffffffffffff ffff000008f97018
[  135.800323] bd80: ffff00000800bd90 ffff000000dd36a8
[  135.800949] [<ffff000008083054>] el1_da+0x24/0x84
[  135.801556] [<ffff000000dd36a8>] 0xffff000000dd36a8
[  135.802184] [<ffff000008126d74>] expire_timers+0xdc/0x164
[  135.802874] [<ffff000008126e94>] run_timer_softirq+0x98/0x180
[  135.803609] [<ffff000008081a30>] __do_softirq+0x130/0x364
[  135.804304] [<ffff0000080b485c>] irq_exit+0xbc/0xec
[  135.804934] [<ffff00000810e268>] __handle_domain_irq+0x64/0xac
[  135.805678] [<ffff000008081858>] gic_handle_irq+0xd4/0x17c
[  135.806377] Exception stack(0xffff00000938bdd0 to 0xffff00000938bf10)
[  135.807199] bdc0:                                   ffff000008f97018 0000800876f99000
[  135.808197] bde0: 0000800876f99000 ffff00000938bf20 0000800876f99000 0038815500000000
[  135.809194] be00: 0000000043738680 ffff0000272f3850 ffff8008781095a0 ffff00000938be90
[  135.810192] be20: 00000000000008c0 00000000b3e6a559 0000000000000078 00000000f4b01930
[  135.811187] be40: 00000000b3e6a391 0000000000000000 ffff0000081299dc 0000000000000000
[  135.812184] be60: 0000000000000000 ffff000008f97000 ffff000009189000 0000000000000001
[  135.813182] be80: ffff000008fa1850 ffff000009189fdc ffff000009189000 0000000000000000
[  135.814179] bea0: 0000000000000000 0000000000000000 0000000000000000 ffff00000938bf10
[  135.815176] bec0: ffff000008085894 ffff00000938bf10 ffff000008085898 0000000060000145
[  135.816171] bee0: ffff80087ff36700 0000001f9ca44f00 ffffffffffffffff 0000000000000001
[  135.817166] bf00: ffff00000938bf10 ffff000008085898
[  135.817792] [<ffff000008083230>] el1_irq+0xb0/0x124
[  135.818421] [<ffff000008085898>] arch_cpu_idle+0x2c/0x1c0
[  135.819113] [<ffff000008af8e14>] default_idle_call+0x18/0x2c
[  135.819841] [<ffff0000080fa14c>] do_idle+0x1ac/0x26c
[  135.820477] [<ffff0000080fa3c4>] cpu_startup_entry+0x20/0x24
[  135.821203] [<ffff000008091900>] secondary_start_kernel+0x104/0x110
[8 79194040 ffff8008 79194050 ffff8008 79194050 ffff8008
[  135.853754] 4060  79194060 ffff8008 79194060 ffff8008 79194070 ffff8008 79194070 ffff8008
[  135.854817]
[  135.854817] X2: 0xffff800879193f80:
[  135.855447] 3f80  2159b880 ffff7e00 00001000 00000000 2159b8c0 ffff7e00 00001000 00000000
[  135.856505] 3fa0  2159b900 ffff7e00 00001000 00000000 2159b940 ffff7e00 00001000 00000000
[  135.857564] 3fc0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.858623] 3fe0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.85[  135.888074] 3dc8  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.889132] 3de8  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.890189] 3e08  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.891257]
[  135.891257] X19: 0xffff8008791948f8:
[  135.891898] 48f8  00000000 00000000 00000000 00000000 ffffffe0 0000000f 79194910 ffff8008
[  135.892958] 4918  79194910 ffff8008 00dd4168 ffff0000 00000040 00000000 79194930 ffff8008
[  135.894018] 4938  79194930 ffff8008 00dd8f08 ffff0000 00000000 00000000 00000000 00000000
[  135.895077] 4950000000 79194930 ffff8008
[  135.903135] 4938  79194930 ffff8008 00dd8f08 ffff0000 00000000 00000000 00000000 00000000
[  135.904194] 4958  00000000 00000000 00dd36f0 ffff0000 79194000 ffff8008 00000001 00000000
[  135.905253] 4978  00000200 dead0000 00000000 00000000 ffff5f9c 00000000 00dd36a8 ffff0000
[  135.906310] 4998  79194000 ffff8008 1d000001 00000000 00000000 00000000 00000000 00000000
[  135.907369] 49b8  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.908427] 49d8  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  135.909492]
[  135.:
[  135.938213] Exception stack(0xffff00000800bc50 to 0xffff00000800bd90)
[  135.939034] bc40:                                   ffff800879194000 ffff000000dd36a8
[  135.940032] bc60: ffff800879194000 ffff800878108c80 0000000000000080 ffff800879194978
[  135.941031] bc80: 0010000000000000 4010000100000000 ffff80087ff33da8 0000000000000004
[  135.942028] bca0: 4000000100000000 ffff00000800be78 0000000000000001 0000000000007d00
[  135.943025] bcc0: 0000000000000001 0000000000000000 ffff0000081299dc 0000000000000000
[  135.944023] bce0: 0000000000000000 ffff800879194978 ffff80080x130/0x364
[  135.952372] [<ffff0000080b485c>] irq_exit+0xbc/0xec
[  135.952999] [<ffff00000810e268>] __handle_domain_irq+0x64/0xac
[  135.953745] [<ffff000008081858>] gic_handle_irq+0xd4/0x17c
[  135.954442] Exception stack(0xffff00000938bdd0 to 0xffff00000938bf10)
[  135.955264] bdc0:                                   ffff000008f97018 0000800876f99000
[  135.956261] bde0: 0000800876f99000 ffff00000938bf20 0000800876f99000 0038815500000000
[  135.957259] be00: 0000000043738680 ffff0000272f3850 ffff8008781095a0 ffff00000938be90
[  135.958257] be20: 00000000000008c0 00000000b3e6a55

3. Root cause analysis

[  135.779814] Unable to handle kernel paging request at virtual address ffff000000dd36a8

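  The kernel could not handle a paging request at this virtual address. This kind of message usually has one of three causes.

1. Unable to handle kernel paging request at virtual address 00000000: a NULL pointer was dereferenced. (ARM and arm64 kernels word the message as "NULL pointer dereference" instead of "paging request" when the address falls inside the first page.)
2. Unable to handle kernel paging request at virtual address 20100110: an out-of-bounds write corrupted the memory that holds the pointer, so the pointer itself is garbage. The hard part is then finding where that memory was modified, and why.
3. Unable to handle kernel paging request at virtual address c074838c: an attempt to modify protected memory, for example a variable declared const.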

[  135.800949] [<ffff000008083054>] el1_da+0x24/0x84
[  135.801556] [<ffff000000dd36a8>] 0xffff000000dd36a8
[  135.802184] [<ffff000008126d74>] expire_timers+0xdc/0x164 (runs the callbacks of expired timers)
[  135.802874] [<ffff000008126e94>] run_timer_softirq+0x98/0x180
[  135.803609] [<ffff000008081a30>] __do_softirq+0x130/0x364
[  135.804304] [<ffff0000080b485c>] irq_exit+0xbc/0xec
[  135.804934] [<ffff00000810e268>] __handle_domain_irq+0x64/0xac
[  135.805678] [<ffff000008081858>] gic_handle_irq+0xd4/0x17c

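  The faulting address 0xffff000000dd36a8 is also the unresolved frame directly above expire_timers(), and the exception class is IABT (an instruction abort). The kernel therefore faulted while jumping to a timer callback whose code is no longer mapped. This suggests that some timer was never deleted before the driver was unloaded.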
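
  The register dump in section 2 can be cross-checked against this conclusion. The sketch below assumes that X19 (0xffff800879194978) points at a struct timer_list and that the structure has the 4.14 layout (hlist node, expires, function, data, flags). Both are assumptions; the log does not state them. The names timer_fields, join_words and decode_timer are made up for illustration. The code joins the little-endian 32-bit words printed at offsets 4978 to 49a0 of the X19 dump into 64-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED field order of the 4.14 struct timer_list (illustrative copy,
 * not the kernel header). */
struct timer_fields {
	uint64_t entry_next;   /* hlist_node.next  */
	uint64_t entry_pprev;  /* hlist_node.pprev */
	uint64_t expires;      /* expiry time in jiffies */
	uint64_t function;     /* address of the timer callback */
	uint64_t data;         /* argument handed to the callback */
	uint32_t flags;
};

/* 32-bit words copied from the X19 dump, offsets 0x4978 to 0x49a7. */
static const uint32_t x19_dump[12] = {
	0x00000200, 0xdead0000, 0x00000000, 0x00000000,
	0xffff5f9c, 0x00000000, 0x00dd36a8, 0xffff0000,
	0x79194000, 0xffff8008, 0x1d000001, 0x00000000,
};

/* Join the i-th pair of little-endian 32-bit words into one 64-bit value. */
static uint64_t join_words(const uint32_t *w, size_t i)
{
	return ((uint64_t)w[2 * i + 1] << 32) | w[2 * i];
}

/* Map the dumped words onto the assumed timer layout. */
static struct timer_fields decode_timer(const uint32_t *w, size_t n_words)
{
	struct timer_fields t;

	assert(w != NULL && n_words >= 11);
	t.entry_next  = join_words(w, 0);
	t.entry_pprev = join_words(w, 1);
	t.expires     = join_words(w, 2);
	t.function    = join_words(w, 3);
	t.data        = join_words(w, 4);
	t.flags       = w[10];
	return t;
}
```

  Under these assumptions, decode_timer(x19_dump, 12) yields function = 0xffff000000dd36a8, which is the faulting address, and data = 0xffff800879194000, which is the first register value saved on the exception stack. The entry_next value 0xdead000000000200 looks like the kernel's list-poison pattern, which would fit a timer that had just been detached to run its callback.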

4. Locating the code

int cyttsp6_release(struct cyttsp6_core_data *cd)
{
	struct device *dev = cd->dev;
	cyttsp6_proximity_release(dev);
	cyttsp6_btn_release(dev);
	cyttsp6_mt_release(dev);
	
#ifdef CONFIG_HAS_EARLYSUSPEND
	unregister_early_suspend(&cd->es);
#elif defined(CONFIG_FB)
	fb_unregister_client(&cd->fb_notifier);
#endif
	
#if NEED_SUSPEND_NOTIFIER
	unregister_pm_notifier(&cd->pm_notifier);
#endif

	/*
	 * Suspend the device before freeing the startup_work and stopping
	 * the watchdog since sleep function restarts watchdog on failure
	 */
	pm_runtime_suspend(dev);
	pm_runtime_disable(dev);
	cancel_work_sync(&cd->startup_work);
	cyttsp6_stop_wd_timer(cd);
	del_timer(&cd->cyttsp6_recovery_timer); /* the fix: this timer was never deleted before */
	device_init_wakeup(dev, 0);
	remove_sysfs_interfaces(cd, dev);
	free_irq(cd->irq, cd);
	if (cd->cpdata->init)
		cd->cpdata->init(cd->cpdata, 0, dev);
	dev_set_drvdata(dev, NULL);
	cyttsp6_del_core(dev);
	cyttsp6_free_si_ptrs(cd);
	kfree(cd);

	return 0;
}

5. Summary of the problem

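  The release function never deleted the timer. In theory, unloading the driver should then crash the kernel every time, yet no crash occurs when the machine is powered on with the screen attached. Reading the code explains why: when the driver is loaded and the screen initializes successfully, the driver deletes this timer itself, so loading and unloading work without any problem. With no screen attached, initialization never succeeds, so the timer is never deleted. Unloading the driver then leaves the timer armed, and the kernel crashes when it fires.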
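
  The two paths can be modelled in a few lines of user-space C. This is a toy sketch, not driver code: toy_timer, toy_driver, toy_probe, toy_release and toy_expire_would_crash are hypothetical stand-ins, and it assumes the recovery timer is armed when the driver is loaded. It never calls into freed memory; it only reports whether an expiry would do so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel timer; NOT the real struct timer_list. */
struct toy_timer {
	bool pending;              /* true while the timer is armed */
	void (*function)(void *);  /* callback living in the driver's code */
	void *data;
};

struct toy_driver {
	struct toy_timer recovery_timer;
	bool loaded;       /* false once the "module" has been unloaded */
	int recoveries;
};

static void recovery_cb(void *data)
{
	struct toy_driver *d = data;

	d->recoveries++;
}

/* Load: arm the recovery timer, then delete it only if init succeeds. */
static void toy_probe(struct toy_driver *d, bool screen_attached)
{
	assert(d != NULL);
	d->loaded = true;
	d->recovery_timer.function = recovery_cb;
	d->recovery_timer.data = d;
	d->recovery_timer.pending = true;
	if (screen_attached)
		d->recovery_timer.pending = false;  /* init done, timer deleted */
}

/* Unload: delete_timer == false models the original, buggy release path. */
static void toy_release(struct toy_driver *d, bool delete_timer)
{
	assert(d != NULL);
	if (delete_timer)
		d->recovery_timer.pending = false;
	d->loaded = false;  /* callback code and data are gone from here on */
}

/* True if an expiry would jump into an unloaded driver. */
static bool toy_expire_would_crash(const struct toy_driver *d)
{
	return d->recovery_timer.pending && !d->loaded;
}
```

  In this model only one combination reports a crash: probing without a screen followed by a release that skips the delete. In the real driver the delete step is the del_timer() call in the release function; del_timer_sync() is the kernel variant that additionally waits for a handler already running on another CPU.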

6. Lessons learned

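1. Kernel crashes are usually caused by an access to an invalid address or by resources that were not fully released.
2. Focus the search on your own driver; do not get lost in the kernel source.
3. Start from the release function and check that every resource the driver owns is actually released.
4. Read the call trace in the log carefully; it often holds the key to the problem.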
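
  Point 4 can be partly mechanized. The following is a minimal sketch, assuming call-trace lines use the "[<address>] symbol+0xoffset/0xsize" form seen in this log (other kernel versions may print frames differently). struct frame and parse_frame are illustrative names. The function flags frames that the kernel could not resolve to a symbol, which is how the 0xffff000000dd36a8 frame appears here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct frame {
	unsigned long long addr;  /* address printed inside [<...>] */
	char symbol[64];          /* empty when only a raw address was printed */
	bool resolved;            /* true when the line has symbol+0x... */
};

/*
 * Parse one call-trace line, for example
 *   "[  135.802184] [<ffff000008126d74>] expire_timers+0xdc/0x164"
 * Returns false when the line contains no "[<address>]" frame.
 */
static bool parse_frame(const char *line, struct frame *out)
{
	const char *p;
	char name[64];
	int n = 0;

	assert(line != NULL && out != NULL);

	p = strstr(line, "[<");
	if (p == NULL || sscanf(p, "[<%llx>]", &out->addr) != 1)
		return false;

	p = strstr(p, ">]");
	if (p == NULL)
		return false;
	p += 2;

	out->symbol[0] = '\0';
	out->resolved = false;
	/* A resolved frame continues with " name+0x..."; a raw one does not. */
	if (sscanf(p, " %63[^+ \n]%n", name, &n) == 1 && p[n] == '+') {
		strcpy(out->symbol, name);
		out->resolved = true;
	}
	return true;
}
```

  An unresolved frame sitting directly above a core kernel function such as expire_timers() is a hint that the callback belonged to code that is no longer loaded.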

  
