
Analysis of a kernel panic caused by rmmod
Author: Yuan Jianpeng  Email: yuanjp89@163.com
Published: 2023-11-17  Site: Inside Linux Development

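A common panic scenario: after a module is unloaded, something keeps accessing the module's data (global variables, functions, and so on), which causes an invalid memory access and triggers a panic. This article walks through how one such panic was tracked down.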


The panic log is as follows:

Unable to handle kernel paging request at virtual address 7fdbc8a8

pgd = 825dc000

[7fdbc8a8] *pgd=47fb2811, *pte=00000000, *ppte=00000000

Internal error: Oops: 7 [#1] PREEMPT SMP ARM

Modules linked in: ecm ath_pktlog(P) wifi_3_0 qca_ol umac asf(P) qdf mem_manager(P) ... [last unloaded: passthrough]

CPU: 1 PID: 2325 Comm: uci Tainted: P                4.4.60 #0

Hardware name: Generic DT based system

task: 83366d00 ti: 84ef4000 task.ti: 84ef4000

PC is at netif_skb_features+0xcc/0x214

LR is at validate_xmit_skb.part.23+0xc/0x278

pc : [<814dacec>]    lr : [<814dae40>]    psr: 80000113

sp : 84ef5780  ip : 866fa740  fp : 866fa000

r10: 00000000  r9 : 831a5c68  r8 : 00000000

r7 : 00000000  r6 : 825679c0  r5 : 00000000  r4 : 00004000

r3 : 7fdbc7c8  r2 : 0000a888  r1 : 84c8c000  r0 : 00000000

Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user

Control: 10c0383d  Table: 425dc06a  DAC: 00000055

Process uci (pid: 2325, stack limit = 0x84ef4210)

Stack: (0x84ef5780 to 0x84ef6000)

5780: 825679c0 84c8c000 00000301 825679c0 86b6d600 00000000 00000000 00000000

57a0: 831a5c68 814dae40 825679c0 84c8c000 00000301 00000000 86b6d600 00000000

57c0: 00000000 00000000 831a5c68 814db0d0 831a5c00 86b6d600 825679c0 84c8c000

...

[<814dacec>] (netif_skb_features) from [<814dae40>] (validate_xmit_skb.part.23+0xc/0x278)

[<814dae40>] (validate_xmit_skb.part.23) from [<814db0d0>] (validate_xmit_skb_list+0x24/0x4c)

[<814db0d0>] (validate_xmit_skb_list) from [<814f5000>] (sch_direct_xmit+0x3c/0x1d8)

[<814f5000>] (sch_direct_xmit) from [<814db704>] (__dev_queue_xmit+0x238/0x484)

[<814db704>] (__dev_queue_xmit) from [<815ade38>] (br_dev_queue_push_xmit+0x128/0x140)

[<815ade38>] (br_dev_queue_push_xmit) from [<815ba8e8>] (br_nf_post_routing+0x204/0x284)

[<815ba8e8>] (br_nf_post_routing) from [<81500c60>] (nf_iterate+0x54/0x78)

[<81500c60>] (nf_iterate) from [<81500cb4>] (nf_hook_slow+0x30/0xb4)

[<81500cb4>] (nf_hook_slow) from [<815adedc>] (br_forward_finish+0x8c/0xa0)

[<815adedc>] (br_forward_finish) from [<815baa80>] (br_nf_forward_finish+0x118/0x178)

[<815baa80>] (br_nf_forward_finish) from [<815bafac>] (br_nf_forward_ip+0x2f4/0x380)

[<815bafac>] (br_nf_forward_ip) from [<81500c60>] (nf_iterate+0x54/0x78)

[<81500c60>] (nf_iterate) from [<81500cb4>] (nf_hook_slow+0x30/0xb4)

[<81500cb4>] (nf_hook_slow) from [<815ae134>] (__br_forward+0xe4/0x100)

[<815ae134>] (__br_forward) from [<815ae044>] (deliver_clone+0x40/0x4c)

[<815ae044>] (deliver_clone) from [<815ae268>] (maybe_deliver+0x78/0x88)

[<815ae268>] (maybe_deliver) from [<815ae2fc>] (br_flood+0x84/0xdc)

[<815ae2fc>] (br_flood) from [<815ae5ac>] (br_flood_forward+0x14/0x20)

[<815ae5ac>] (br_flood_forward) from [<815af8ac>] (br_handle_frame_finish+0x57c/0x5b0)

[<815af8ac>] (br_handle_frame_finish) from [<815bb1f0>] (br_nf_pre_routing_finish+0x1b8/0x348)

[<815bb1f0>] (br_nf_pre_routing_finish) from [<815bb708>] (br_nf_pre_routing+0x2dc/0x354)

[<815bb708>] (br_nf_pre_routing) from [<81500c60>] (nf_iterate+0x54/0x78)

[<81500c60>] (nf_iterate) from [<81500cb4>] (nf_hook_slow+0x30/0xb4)

[<81500cb4>] (nf_hook_slow) from [<815afc58>] (br_handle_frame+0x378/0x3c0)

[<815afc58>] (br_handle_frame) from [<814d756c>] (__netif_receive_skb_core+0x414/0x7dc)

[<814d756c>] (__netif_receive_skb_core) from [<814d8e30>] (netif_receive_skb_internal+0x60/0xac)

[<814d8e30>] (netif_receive_skb_internal) from [<814d956c>] (napi_gro_receive+0x48/0xc4)

[<814d956c>] (napi_gro_receive) from [<7f20a768>] (nss_core_send_buffer+0x2528/0x27a8 [qca_nss_drv])

[<7f20a768>] (nss_core_send_buffer [qca_nss_drv]) from [<7f20aa04>] (nss_core_handle_napi_queue+0x1c/0x44 [qca_nss_drv])

[<7f20aa04>] (nss_core_handle_napi_queue [qca_nss_drv]) from [<814d9bf4>] (net_rx_action+0xe0/0x28c)

[<814d9bf4>] (net_rx_action) from [<812280c0>] (__do_softirq+0xd0/0x200)

[<812280c0>] (__do_softirq) from [<81228464>] (irq_exit+0x84/0xf4)

[<81228464>] (irq_exit) from [<8125ccdc>] (__handle_domain_irq+0x90/0xb4)

[<8125ccdc>] (__handle_domain_irq) from [<81209378>] (gic_handle_irq+0x50/0x94)

[<81209378>] (gic_handle_irq) from [<8120a480>] (__irq_svc+0x40/0x74)

Exception stack(0x84ef5da8 to 0x84ef5df0)

5da0:                   8f82a000 00000000 0000003f 8fdf1740 8fdf1734 8f82a000

5dc0: 8fdf1734 76fbd000 84ec46f4 83b7f600 827484d0 76fb0000 8648f880 84ef5df8

5de0: 812c2638 812b6f80 20000113 ffffffff

[<8120a480>] (__irq_svc) from [<812b6f80>] (page_add_file_rmap+0x74/0x80)

[<812b6f80>] (page_add_file_rmap) from [<812afd48>] (do_set_pte+0x98/0xf0)

[<812afd48>] (do_set_pte) from [<81290ef0>] (filemap_map_pages+0x1d8/0x258)

[<81290ef0>] (filemap_map_pages) from [<812b02f8>] (handle_mm_fault+0x558/0xfbc)

[<812b02f8>] (handle_mm_fault) from [<8121ffcc>] (do_page_fault+0x12c/0x288)

[<8121ffcc>] (do_page_fault) from [<812092c4>] (do_PrefetchAbort+0x34/0x98)

[<812092c4>] (do_PrefetchAbort) from [<8120a95c>] (ret_from_exception+0x0/0x24)

Exception stack(0x84ef5fb0 to 0x84ef5ff8)

5fa0:                                     76f52000 7ee039c0 76fb15e8 76f52000

5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

5fe0: 76f52000 7ee039c0 76faf140 76fb15e8 60000010 ffffffff

Code: e3a03000 e1923003 13c44012 e591312c (e59370e0) 

---[ end trace aed86ced24643538 ]---

Trying to store crash panic, reason = 2 vs 1 

Kernel panic - not syncing: Fatal exception in interrupt

CPU0: stopping

CPU: 0 PID: 1599 Comm: syslog-ng Tainted: P      D         4.4.60 #0

Hardware name: Generic DT based system

[<8121e540>] (unwind_backtrace) from [<8121b6a0>] (show_stack+0x10/0x14)

[<8121b6a0>] (show_stack) from [<81395638>] (dump_stack+0x7c/0x9c)

[<81395638>] (dump_stack) from [<8121d95c>] (handle_IPI+0xe8/0x180)

[<8121d95c>] (handle_IPI) from [<812093a0>] (gic_handle_irq+0x78/0x94)

[<812093a0>] (gic_handle_irq) from [<8120a744>] (__irq_usr+0x44/0x60)

Exception stack(0x832bbfb0 to 0x832bbff8)

bfa0:                                     00000000 555ea7b4 00000000 555ea6e0

bfc0: 555ea6b0 555ea6ec 76e84508 7ea8b970 00000001 00000000 7ea8b9b4 76e58070

bfe0: 00000018 7ea8b948 76e84448 76e84290 60000010 ffffffff

bt_driver 1943008.bt: IPQ crashed, inform BT to prepare dump

bt_driver 1943008.bt: BT IPC not initialized, no message sent

Trying to store crash syslog ! 

Trying to store crash panic, reason = 1 vs 1 

 get crash_buf len 15745 vs 524260, 524288  

Rebooting in 3 seconds..


The log shows an invalid memory access:

Unable to handle kernel paging request at virtual address 7fdbc8a8


Disassemble the code around the final PC address:

PC is at netif_skb_features+0xcc/0x214

This function lives in net/core/dev.c:
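The `symbol+offset/size` notation in the PC and LR lines can be split mechanically, which helps when looking up the faulting instruction in a disassembly listing. The following is a minimal sketch; the helper name `parse_symbol_offset` is hypothetical and not part of any kernel tooling.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation used in oops backtraces.
# Symbol names may contain dots (e.g. compiler-generated ".part.23" suffixes).
_SYMBOL_RE = re.compile(r"(\w[\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_symbol_offset(text):
    """Return (symbol, offset, size) for the first symbol+offset/size in text."""
    match = _SYMBOL_RE.search(text)
    if match is None:
        raise ValueError("no symbol+offset/size found in: %r" % text)
    symbol = match.group(1)
    offset = int(match.group(2), 16)  # bytes from the start of the function
    size = int(match.group(3), 16)    # total size of the function in bytes
    return symbol, offset, size


symbol, offset, size = parse_symbol_offset("PC is at netif_skb_features+0xcc/0x214")
print(symbol, hex(offset), hex(size))
```

The offset is relative to the start of the function, so it can only be compared directly with the addresses in an objdump listing when the function starts at address 0 there, as it appears to in the listing in this article.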

$ arm-openwrt-linux-objdump -S net/core/dev.o  > dev.o.objdump

if (dev->netdev_ops->ndo_features_check)

  c8: e591312c ldr r3, [r1, #300] ; 0x12c

  cc: e59370e0 ldr r7, [r3, #224] ; 0xe0

  d0: e3570000 cmp r7, #0

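From this it is fairly clear that the dev->netdev_ops pointer is bad. The first ldr loads dev->netdev_ops into r3, and the second ldr, which reads ndo_features_check at offset 0xe0 from r3, is the instruction that faults (it is the one shown in parentheses on the Code: line). The register dump shows r3 = 7fdbc7c8, and 7fdbc7c8 + 0xe0 = 7fdbc8a8, exactly the faulting address.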
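The link between the register dump and the faulting address can be checked with a few lines of arithmetic. This is only a sketch; the constants are copied from the oops above, and the function name `ldr_effective_address` is made up for illustration.

```python
# Values copied from the oops log and the disassembly in this article.
R3 = 0x7fdbc7c8                    # r3 in the register dump: dev->netdev_ops
NDO_FEATURES_CHECK_OFFSET = 0xe0   # immediate in `ldr r7, [r3, #224]`
FAULT_ADDRESS = 0x7fdbc8a8         # "Unable to handle kernel paging request at ..."


def ldr_effective_address(base, offset):
    """Address read by an ARM `ldr rX, [base, #offset]` on a 32-bit system."""
    return (base + offset) & 0xFFFFFFFF


computed = ldr_effective_address(R3, NDO_FEATURES_CHECK_OFFSET)
print(hex(computed), computed == FAULT_ADDRESS)
```

If the two values match, the faulting load is the read of ndo_features_check through the netdev_ops pointer, and the pointer itself (r3) is what needs explaining.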


There is one more clue in the log:

last unloaded: passthrough

The address 7fdbc8a8 looks like it falls in the region where kernel modules are loaded, so the crash was reproduced once more:

$ cat /proc/modules 

passthrough 7771 0 - Live 0x7fdbd000

Unable to handle kernel paging request at virtual address 7fdbe5e0

So the module occupies the address range 0x7fdbd000 to 0x7fdbd000 + 7771 = 0x7fdbee5b.

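The faulting address 7fdbe5e0 falls squarely inside passthrough's range, so after the module is unloaded the kernel is still dereferencing a pointer into memory that belonged to it.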
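The range check is simple enough to do by hand, but it is easy to slip on the hex arithmetic. Here is a small sketch that takes the base address and size from the /proc/modules line above; `module_contains` is a hypothetical helper name.

```python
# Values from `cat /proc/modules` and the second oops in this article.
MODULE_BASE = 0x7fdbd000   # load address of passthrough
MODULE_SIZE = 7771         # size in bytes, as reported by /proc/modules
FAULT_ADDRESS = 0x7fdbe5e0 # faulting address in the second reproduction


def module_contains(base, size, addr):
    """True if addr lies in the half-open range [base, base + size)."""
    return base <= addr < base + size


end = MODULE_BASE + MODULE_SIZE
inside = module_contains(MODULE_BASE, MODULE_SIZE, FAULT_ADDRESS)
print(hex(end), inside)
```

Only the second faulting address should be tested against this range. The first oops (7fdbc8a8) came from an earlier run, in which the module's load address was not recorded.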


Finally, auditing the passthrough module's code located the problem, and it was fixed.


Copyright © linuxdev.cc 2017-2024. Some Rights Reserved.