
Tracking down an rmmod failure
Author: Yuan Jianpeng  Email: yuanjp89@163.com
Published: 2019-4-20  Site: Inside Linux Development

I wanted to unload ath9k, but rmmod failed. Its refcnt was 1, yet lsmod listed no kernel module holding a reference to it:


root@OpenWrt:/# lsmod
ath                    18771  4 ath9k,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           284008  1 ath10k_pci
ath10k_pci             33859  0 
ath9k                  98779  1
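
The symptom in the listing above (a nonzero use count with an empty holder column) can be checked mechanically. The following is a minimal sketch, assuming lsmod rows of the form "name size count [holders]" as printed above; the names mod_row, parse_lsmod_row and has_unexplained_ref are hypothetical helpers introduced here, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of lsmod output: name, size, use count, optional holder list. */
struct mod_row {
    char name[64];
    unsigned long size;
    int refcnt;
    char holders[256];   /* stays empty when no holder module is listed */
};

/* Parse a single lsmod row. Returns 0 on success, -1 on a malformed row. */
int parse_lsmod_row(const char *line, struct mod_row *row)
{
    int n;

    assert(line != NULL && row != NULL);
    row->holders[0] = '\0';
    /* The holder column is optional, so three converted fields suffice. */
    n = sscanf(line, "%63s %lu %d %255s",
               row->name, &row->size, &row->refcnt, row->holders);
    return (n >= 3) ? 0 : -1;
}

/* True when the use count is nonzero but no holder module explains it. */
int has_unexplained_ref(const struct mod_row *row)
{
    return row->refcnt > 0 && row->holders[0] == '\0';
}
```

Feeding it the ath9k row from the listing flags the module, while the ath10k_core row (count 1, held by ath10k_pci) is not flagged. A flagged row suggests the reference was taken by built-in kernel code rather than by another module.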

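After some research I found that the reference was held by a function in the kernel itself rather than by another module. The interfaces for taking and dropping a reference to a kernel module are:

try_module_get and module_put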


Modify these two functions in kernel/module.c so that they print the call stack:

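bool try_module_get(struct module *module)
{
    bool ret = true;
    char symname[KSYM_NAME_LEN];

    if (module) {
        /* Debug hack: only instrument references taken on ath9k. */
        if (!strcmp(module->name, "ath9k")) {
            /* _RET_IP_ is our return address; resolve it to the caller's name. */
            lookup_symbol_name((unsigned long)_RET_IP_, symname);
            dump_stack();
            /* Report success to gpiod_request without taking a reference. */
            if (!strcmp(symname, "gpiod_request"))
                return true;
        }
        /* ... */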


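In the code above, _RET_IP_ is the return address of the current function, that is, an address inside its caller. lookup_symbol_name() resolves that address to the caller's name, and dump_stack() prints the call stack: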

[   12.963471] CPU: 0 PID: 422 Comm: kmodloader Not tainted 4.9.123 #22
[   12.970091] Stack : 804e7672 00000038 00000000 00000000 83891c7c 8046f247 80422cd4 000001a6
[   12.978781]         804e37c0 00000000 80470000 00000004 839cc010 800ae18c 83bbd8f8 83bbd8f8
[   12.987489]         8043750c 00000800 80426ad4 83bbd93c 80441a7c 800e0ed8 00000000 801e6bb8
[   12.996190]         82de80ff 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   13.004890]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   13.013590]         ...
[   13.016127] Call Trace:
[   13.018656] [<8006bd78>] show_stack+0x54/0x88
[   13.023176] [<800ce3a4>] try_module_get+0x88/0x120
[   13.028145] [<80204e44>] gpiod_request+0x98/0xf8
[   13.033032] [<82c211c8>] ath9k_beacon_config+0x3b0/0x648 [ath9k]
[   13.039281] [<801e4204>] snprintf+0x1c/0x28
[   13.043661] [<82c21910>] ath_init_leds+0x2c8/0x30c [ath9k]
[   13.049405] [<82c22a64>] ath9k_init_device+0x9a4/0xa3c [ath9k]
[   13.055512] [<82c2ee98>] ath_pci_exit+0x25c/0x30c [ath9k]
[   13.061144] try_module_get ath9k gpiod_request
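
Conceptually, resolving a return address to a name means finding the symbol with the greatest start address that does not exceed it. The following userspace sketch illustrates that idea only; it is not how the kernel's kallsyms tables are actually stored, and it ignores symbol sizes. The names sym, sym_lookup, caller_is and demo_syms are hypothetical, and the start addresses are derived from the trace above by subtracting each printed offset from the printed address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sym {
    unsigned long addr;   /* start address of the function */
    const char *name;
};

/*
 * Return the name of the symbol with the greatest start address <= ip,
 * or NULL when ip lies below every entry. The table must be sorted by addr.
 */
const char *sym_lookup(const struct sym *table, size_t n, unsigned long ip)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);
    while (lo < hi) {                 /* binary search: first entry > ip */
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].addr <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? table[lo - 1].name : NULL;
}

/* Mirrors the !strcmp(symname, "gpiod_request") check in the patched code. */
int caller_is(const struct sym *table, size_t n, unsigned long ip,
              const char *name)
{
    const char *found = sym_lookup(table, n, ip);

    return found != NULL && strcmp(found, name) == 0;
}

/* Start addresses computed from the trace: printed address minus offset. */
const struct sym demo_syms[] = {
    { 0x8006bd24UL, "show_stack" },      /* 0x8006bd78 - 0x54 */
    { 0x800ce31cUL, "try_module_get" },  /* 0x800ce3a4 - 0x88 */
    { 0x80204dacUL, "gpiod_request" },   /* 0x80204e44 - 0x98 */
};
```

With this table, the address 0x80204e44 from the trace resolves to gpiod_request, which is the comparison the patched try_module_get relies on.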


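It turned out that the reference taken through gpiod_request was never released. The driver source shows that the release only happens when the driver is unloaded, but the driver cannot be unloaded while its reference count is nonzero, so the two conditions deadlock each other.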


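With the modification shown earlier (returning directly for gpiod_request without incrementing the count), I rebuilt the kernel, flashed it to the board, and rebooted. The count is now 0 and the module can be unloaded: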

root@OpenWrt:/# lsmod
ath                    18771  4 ath9k,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           284008  1 ath10k_pci
ath10k_pci             33859  0 
ath9k                  98779  0


However, the kernel now emits a warning, because the count was never incremented but is still decremented on unload:

root@OpenWrt:/# rmmod ath9k
[   53.480231] ------------[ cut here ]------------
[   53.485026] WARNING: CPU: 0 PID: 1450 at /work/k2t/k2t-mesh/linux-4.9.123/kernel/module.c:1120 module_put+0x64/0xec
[   53.495846] Modules linked in: ath9k(-) ath9k_common pppoe ppp_async ath9k_hw ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables gpio_button_hotplug
[   53.557476] CPU: 0 PID: 1450 Comm: rmmod Not tainted 4.9.123 #24
[   53.563692] Stack : 804e7672 00000034 00000000 00000000 82ae07bc 8046f247 80422cd4 000005aa
[   53.572392]         804e37c0 00000460 7701a000 00000000 00000000 800ae18c 00000003 80470000
[   53.581092]         80428bb8 00000460 80426ad4 82a4bc64 00000000 800e0ef8 804e7672 00000067
[   53.589792]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   53.598483]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   53.607184]         ...
[   53.609729] Call Trace:
[   53.612257] [<8006bd78>] show_stack+0x54/0x88
[   53.616772] [<80081bb8>] __warn+0xe4/0x118
[   53.621010] [<80081c7c>] warn_slowpath_null+0x1c/0x30
[   53.626234] [<800ce4bc>] module_put+0x64/0xec
[   53.630757] [<80204f00>] gpiod_free+0x3c/0x68
[   53.635275] [<833e15b0>] ath_deinit_leds+0x74/0x10c [ath9k]
[   53.641042] [<833e2b2c>] ath9k_deinit_device+0x30/0x990 [ath9k]
[   53.647158] [<833eec80>] ath_pci_exit+0x44/0x30c [ath9k]
[   53.652663] ---[ end trace c16c94bd8f41a14b ]---
[   53.657428] module_Put ath9k gpiod_free
[   53.682849] ath9k: ath9k: Driver unloaded
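
The imbalance behind this warning can be reproduced in a deliberately simplified userspace model. This is a sketch under stated assumptions, not the kernel's implementation: it uses a plain integer instead of the kernel's atomic counter and ignores the kernel's internal bookkeeping. The names model_module, model_get and model_put are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_module {
    const char *name;
    int refcnt;      /* number of outstanding references */
    int warnings;    /* how many unbalanced puts were seen */
};

/* Take a reference, unless the hack skips it for gpiod_request. */
bool model_get(struct model_module *m, const char *caller, bool hack)
{
    assert(m != NULL && caller != NULL);
    if (hack && strcmp(caller, "gpiod_request") == 0)
        return true;          /* claims success without counting */
    m->refcnt++;
    return true;
}

/* Drop a reference; a put with no matching get is recorded as a warning. */
void model_put(struct model_module *m)
{
    assert(m != NULL);
    if (m->refcnt > 0)
        m->refcnt--;
    else
        m->warnings++;        /* stands in for the kernel's WARNING */
}
```

Without the hack, one get followed by one put returns the count to 0 with no warning. With the hack enabled, the get leaves the count at 0, so the later put (the gpiod_free path during unload) finds nothing to release and is counted as a warning.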


References:

http://haneensa.github.io/2018/06/13/kerneldebug2/

https://stackoverflow.com/questions/448999/is-there-a-way-to-figure-out-what-is-using-a-linux-kernel-module/449856
