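Accessing a module's data (global variables, functions, and so on) after the module has been unloaded is an invalid memory access that triggers a panic. This is a common panic scenario, and this article walks through one way to track such a panic down.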
The panic log:
Unable to handle kernel paging request at virtual address 7fdbc8a8
pgd = 825dc000
[7fdbc8a8] *pgd=47fb2811, *pte=00000000, *ppte=00000000
Internal error: Oops: 7 [#1] PREEMPT SMP ARM
Modules linked in: ecm ath_pktlog(P) wifi_3_0 qca_ol umac asf(P) qdf mem_manager(P) ... [last unloaded: passthrough]
CPU: 1 PID: 2325 Comm: uci Tainted: P 4.4.60 #0
Hardware name: Generic DT based system
task: 83366d00 ti: 84ef4000 task.ti: 84ef4000
PC is at netif_skb_features+0xcc/0x214
LR is at validate_xmit_skb.part.23+0xc/0x278
pc : [<814dacec>] lr : [<814dae40>] psr: 80000113
sp : 84ef5780 ip : 866fa740 fp : 866fa000
r10: 00000000 r9 : 831a5c68 r8 : 00000000
r7 : 00000000 r6 : 825679c0 r5 : 00000000 r4 : 00004000
r3 : 7fdbc7c8 r2 : 0000a888 r1 : 84c8c000 r0 : 00000000
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c0383d Table: 425dc06a DAC: 00000055
Process uci (pid: 2325, stack limit = 0x84ef4210)
Stack: (0x84ef5780 to 0x84ef6000)
5780: 825679c0 84c8c000 00000301 825679c0 86b6d600 00000000 00000000 00000000
57a0: 831a5c68 814dae40 825679c0 84c8c000 00000301 00000000 86b6d600 00000000
57c0: 00000000 00000000 831a5c68 814db0d0 831a5c00 86b6d600 825679c0 84c8c000
...
[<814dacec>] (netif_skb_features) from [<814dae40>] (validate_xmit_skb.part.23+0xc/0x278)
[<814dae40>] (validate_xmit_skb.part.23) from [<814db0d0>] (validate_xmit_skb_list+0x24/0x4c)
[<814db0d0>] (validate_xmit_skb_list) from [<814f5000>] (sch_direct_xmit+0x3c/0x1d8)
[<814f5000>] (sch_direct_xmit) from [<814db704>] (__dev_queue_xmit+0x238/0x484)
[<814db704>] (__dev_queue_xmit) from [<815ade38>] (br_dev_queue_push_xmit+0x128/0x140)
[<815ade38>] (br_dev_queue_push_xmit) from [<815ba8e8>] (br_nf_post_routing+0x204/0x284)
[<815ba8e8>] (br_nf_post_routing) from [<81500c60>] (nf_iterate+0x54/0x78)
[<81500c60>] (nf_iterate) from [<81500cb4>] (nf_hook_slow+0x30/0xb4)
[<81500cb4>] (nf_hook_slow) from [<815adedc>] (br_forward_finish+0x8c/0xa0)
[<815adedc>] (br_forward_finish) from [<815baa80>] (br_nf_forward_finish+0x118/0x178)
[<815baa80>] (br_nf_forward_finish) from [<815bafac>] (br_nf_forward_ip+0x2f4/0x380)
[<815bafac>] (br_nf_forward_ip) from [<81500c60>] (nf_iterate+0x54/0x78)
[<81500c60>] (nf_iterate) from [<81500cb4>] (nf_hook_slow+0x30/0xb4)
[<81500cb4>] (nf_hook_slow) from [<815ae134>] (__br_forward+0xe4/0x100)
[<815ae134>] (__br_forward) from [<815ae044>] (deliver_clone+0x40/0x4c)
[<815ae044>] (deliver_clone) from [<815ae268>] (maybe_deliver+0x78/0x88)
[<815ae268>] (maybe_deliver) from [<815ae2fc>] (br_flood+0x84/0xdc)
[<815ae2fc>] (br_flood) from [<815ae5ac>] (br_flood_forward+0x14/0x20)
[<815ae5ac>] (br_flood_forward) from [<815af8ac>] (br_handle_frame_finish+0x57c/0x5b0)
[<815af8ac>] (br_handle_frame_finish) from [<815bb1f0>] (br_nf_pre_routing_finish+0x1b8/0x348)
[<815bb1f0>] (br_nf_pre_routing_finish) from [<815bb708>] (br_nf_pre_routing+0x2dc/0x354)
[<815bb708>] (br_nf_pre_routing) from [<81500c60>] (nf_iterate+0x54/0x78)
[<81500c60>] (nf_iterate) from [<81500cb4>] (nf_hook_slow+0x30/0xb4)
[<81500cb4>] (nf_hook_slow) from [<815afc58>] (br_handle_frame+0x378/0x3c0)
[<815afc58>] (br_handle_frame) from [<814d756c>] (__netif_receive_skb_core+0x414/0x7dc)
[<814d756c>] (__netif_receive_skb_core) from [<814d8e30>] (netif_receive_skb_internal+0x60/0xac)
[<814d8e30>] (netif_receive_skb_internal) from [<814d956c>] (napi_gro_receive+0x48/0xc4)
[<814d956c>] (napi_gro_receive) from [<7f20a768>] (nss_core_send_buffer+0x2528/0x27a8 [qca_nss_drv])
[<7f20a768>] (nss_core_send_buffer [qca_nss_drv]) from [<7f20aa04>] (nss_core_handle_napi_queue+0x1c/0x44 [qca_nss_drv])
[<7f20aa04>] (nss_core_handle_napi_queue [qca_nss_drv]) from [<814d9bf4>] (net_rx_action+0xe0/0x28c)
[<814d9bf4>] (net_rx_action) from [<812280c0>] (__do_softirq+0xd0/0x200)
[<812280c0>] (__do_softirq) from [<81228464>] (irq_exit+0x84/0xf4)
[<81228464>] (irq_exit) from [<8125ccdc>] (__handle_domain_irq+0x90/0xb4)
[<8125ccdc>] (__handle_domain_irq) from [<81209378>] (gic_handle_irq+0x50/0x94)
[<81209378>] (gic_handle_irq) from [<8120a480>] (__irq_svc+0x40/0x74)
Exception stack(0x84ef5da8 to 0x84ef5df0)
5da0: 8f82a000 00000000 0000003f 8fdf1740 8fdf1734 8f82a000
5dc0: 8fdf1734 76fbd000 84ec46f4 83b7f600 827484d0 76fb0000 8648f880 84ef5df8
5de0: 812c2638 812b6f80 20000113 ffffffff
[<8120a480>] (__irq_svc) from [<812b6f80>] (page_add_file_rmap+0x74/0x80)
[<812b6f80>] (page_add_file_rmap) from [<812afd48>] (do_set_pte+0x98/0xf0)
[<812afd48>] (do_set_pte) from [<81290ef0>] (filemap_map_pages+0x1d8/0x258)
[<81290ef0>] (filemap_map_pages) from [<812b02f8>] (handle_mm_fault+0x558/0xfbc)
[<812b02f8>] (handle_mm_fault) from [<8121ffcc>] (do_page_fault+0x12c/0x288)
[<8121ffcc>] (do_page_fault) from [<812092c4>] (do_PrefetchAbort+0x34/0x98)
[<812092c4>] (do_PrefetchAbort) from [<8120a95c>] (ret_from_exception+0x0/0x24)
Exception stack(0x84ef5fb0 to 0x84ef5ff8)
5fa0: 76f52000 7ee039c0 76fb15e8 76f52000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 76f52000 7ee039c0 76faf140 76fb15e8 60000010 ffffffff
Code: e3a03000 e1923003 13c44012 e591312c (e59370e0)
---[ end trace aed86ced24643538 ]---
Trying to store crash panic, reason = 2 vs 1
Kernel panic - not syncing: Fatal exception in interrupt
CPU0: stopping
CPU: 0 PID: 1599 Comm: syslog-ng Tainted: P D 4.4.60 #0
Hardware name: Generic DT based system
[<8121e540>] (unwind_backtrace) from [<8121b6a0>] (show_stack+0x10/0x14)
[<8121b6a0>] (show_stack) from [<81395638>] (dump_stack+0x7c/0x9c)
[<81395638>] (dump_stack) from [<8121d95c>] (handle_IPI+0xe8/0x180)
[<8121d95c>] (handle_IPI) from [<812093a0>] (gic_handle_irq+0x78/0x94)
[<812093a0>] (gic_handle_irq) from [<8120a744>] (__irq_usr+0x44/0x60)
Exception stack(0x832bbfb0 to 0x832bbff8)
bfa0: 00000000 555ea7b4 00000000 555ea6e0
bfc0: 555ea6b0 555ea6ec 76e84508 7ea8b970 00000001 00000000 7ea8b9b4 76e58070
bfe0: 00000018 7ea8b948 76e84448 76e84290 60000010 ffffffff
bt_driver 1943008.bt: IPQ crashed, inform BT to prepare dump
bt_driver 1943008.bt: BT IPC not initialized, no message sent
Trying to store crash syslog !
Trying to store crash panic, reason = 1 vs 1
get crash_buf len 15745 vs 524260, 524288
Rebooting in 3 seconds..
The first line shows that the kernel tried to access an unmapped virtual address:
Unable to handle kernel paging request at virtual address 7fdbc8a8
Next, disassemble the code at the faulting PC:
PC is at netif_skb_features+0xcc/0x214
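netif_skb_features() is defined in net/core/dev.c, so disassemble dev.o with the source interleaved and look around offset 0xcc: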
$ arm-openwrt-linux-objdump -S net/core/dev.o > dev.o.objdump
if (dev->netdev_ops->ndo_features_check)
c8: e591312c ldr r3, [r1, #300] ; 0x12c
cc: e59370e0 ldr r7, [r3, #224] ; 0xe0
d0: e3570000 cmp r7, #0
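The faulting instruction at offset 0xcc, ldr r7, [r3, #224], is the word in parentheses on the Code: line of the oops (e59370e0). The instruction before it loaded dev->netdev_ops from offset 0x12c (300) of dev (r1 = 84c8c000) into r3, and the register dump shows r3 = 7fdbc7c8. Adding the 0xe0 offset of ndo_features_check gives 7fdbc7c8 + 0xe0 = 7fdbc8a8, exactly the faulting address. So dev itself is readable, but dev->netdev_ops points to memory that is no longer mapped (*pte=00000000).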
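The arithmetic above can be checked mechanically by decoding the instruction word from the Code: line. The following is a minimal C sketch, assuming the A32 (ARM, not Thumb) encoding of a single-register load/store with a 12-bit immediate offset; struct ldst_imm, decode_ldst_imm() and effective_addr() are hypothetical helpers, not part of any kernel or binutils API, and PC-relative loads are deliberately not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an A32 LDR/STR/LDRB/STRB with immediate offset
 * (bits 27..25 == 0b010). Hypothetical helper type. */
struct ldst_imm {
    unsigned cond;  /* bits 31..28, 0xE = always */
    bool pre;       /* bit 24 (P): 1 = offset applied before the access */
    bool add;       /* bit 23 (U): 1 = Rn + imm12, 0 = Rn - imm12 */
    bool load;      /* bit 20 (L): 1 = load, 0 = store */
    unsigned rn;    /* bits 19..16: base register */
    unsigned rt;    /* bits 15..12: transfer register */
    uint32_t imm12; /* bits 11..0: unsigned byte offset */
};

/* Returns false if insn is not in this instruction class.
 * Condition 0xF encodes other instructions (e.g. PLD), so it is rejected. */
static bool decode_ldst_imm(uint32_t insn, struct ldst_imm *d)
{
    if ((insn >> 28) == 0xFu || ((insn >> 25) & 0x7u) != 0x2u)
        return false;
    d->cond  = insn >> 28;
    d->pre   = (insn >> 24) & 1u;
    d->add   = (insn >> 23) & 1u;
    d->load  = (insn >> 20) & 1u;
    d->rn    = (insn >> 16) & 0xFu;
    d->rt    = (insn >> 12) & 0xFu;
    d->imm12 = insn & 0xFFFu;
    return true;
}

/* Address actually accessed, given Rn's value from the register dump.
 * Post-indexed forms (P == 0) access Rn itself. Rn == r15 is not
 * handled (the CPU would read PC + 8). */
static uint32_t effective_addr(const struct ldst_imm *d, uint32_t rn_val)
{
    if (!d->pre)
        return rn_val;
    return d->add ? rn_val + d->imm12 : rn_val - d->imm12;
}
```

Decoding e59370e0 yields Rn = r3, Rt = r7 and imm12 = 0xe0; passing r3 = 0x7fdbc7c8 from the register dump to effective_addr() gives 0x7fdbc8a8, the address in the "Unable to handle kernel paging request" line. Decoding the preceding word e591312c the same way gives Rn = r1, Rt = r3, imm12 = 0x12c, the dev->netdev_ops load.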
The Modules linked in line of the oops carries another clue:
last unloaded: passthrough
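When many crash logs have to be triaged, this clue can be extracted automatically. The kernel appends the [last unloaded: ...] suffix only if some module has been unloaded since boot, and it remembers only the most recent one. A small C sketch, assuming the oops line is already available as a string; last_unloaded() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the module name that follows
 * "[last unloaded: " in an oops "Modules linked in:" line into out.
 * Returns 0 on success, -1 if the marker is absent or unterminated,
 * or if the name does not fit in out (outsz bytes including NUL). */
static int last_unloaded(const char *line, char *out, size_t outsz)
{
    static const char marker[] = "[last unloaded: ";
    const char *start = strstr(line, marker);
    if (start == NULL)
        return -1;
    start += sizeof(marker) - 1;  /* skip the marker, not its NUL */

    const char *end = strchr(start, ']');
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start);
    if (outsz == 0 || len >= outsz)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Applied to the Modules linked in line of the log above, it stores "passthrough" in out.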
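The address 7fdbc8a8 looks like it lies in the kernel module area: other modules in the same trace, such as qca_nss_drv at 0x7f20a768, are also mapped at 0x7fxxxxxx. To confirm, reproduce the crash once more, this time noting passthrough's load address first: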
$ cat /proc/modules
passthrough 7771 0 - Live 0x7fdbd000
Unable to handle kernel paging request at virtual address 7fdbe5e0
The size column in /proc/modules is a decimal byte count, so passthrough occupied 0x7fdbd000 up to 0x7fdbd000 + 7771 (0x1e5b) = 0x7fdbee5b.
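The new faulting address 7fdbe5e0 falls squarely within that range, so after passthrough was unloaded the kernel was still following a pointer into its former memory.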
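The range check can be scripted as well. A minimal C sketch, assuming lines in the format shown above (name, size in decimal bytes, reference count, dependencies, state, load address, optional taint flags); struct module_range, parse_module_line() and module_contains() are hypothetical. The address column is printed with %pK, so depending on kptr_restrict and the reader's privileges it may read as zero; the snapshot also has to be taken before the unload, as was done here, because an unloaded module no longer appears in /proc/modules.

```c
#include <stdbool.h>
#include <stdio.h>

/* One parsed /proc/modules entry (hypothetical helper type). */
struct module_range {
    char name[64];
    unsigned long long size;  /* column 2, decimal byte count */
    unsigned long long base;  /* column 6, load address in hex */
};

/* Parses "name size refcnt deps state 0xaddr [taints]".
 * The refcount, dependency and state columns are skipped; %llx
 * accepts the leading "0x" like strtoull with base 16. */
static bool parse_module_line(const char *line, struct module_range *m)
{
    return sscanf(line, "%63s %llu %*s %*s %*s %llx",
                  m->name, &m->size, &m->base) == 3;
}

/* True if addr lies in the half-open range [base, base + size). */
static bool module_contains(const struct module_range *m,
                            unsigned long long addr)
{
    return addr >= m->base && addr - m->base < m->size;
}
```

For the line "passthrough 7771 0 - Live 0x7fdbd000", module_contains() reports 0x7fdbe5e0 as inside passthrough. Looping over every line of a saved snapshot turns a bare faulting address into a module name.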
Finally, auditing the passthrough module's code turned up the bug, and the problem was fixed.