During a recent stability soak test, a single device with 70+ wireless clients attached, all playing a live stream, rebooted via the watchdog after running for two days.
[20230826_20:52:09_155]: [263114.367975] Watchdog bark! Now = 263114.367962
[20230826_20:52:09_155]: [263114.367999] Causing a watchdog bite!
[20230826_20:52:09_162]: [263114.371303] Configuring Watchdog Timer
[20230826_20:52:09_163]: [263114.375118] Waiting for Reboot
[20230826_20:52:10_476]: [263114.379769] Wdog - CTL: 0x1, BARK TIME: 0xe2900, BITE TIME: 0x1
On the OpenWrt platform, the watchdog is maintained by PID 1, the procd process. Source: procd/watchdog.c
Through ubus you can adjust the watchdog parameters, disable the watchdog, and so on.
# ubus -vv list system
'system' @e32153d7
"board":{}
"info":{}
"reboot":{}
"watchdog":{"frequency":"Integer","timeout":"Integer","magicclose":"Boolean","stop":"Boolean"}
"signal":{"pid":"Integer","signum":"Integer"}
"sysupgrade":{"path":"String","prefix":"String","command":"String"}
# ubus call system watchdog
{
    "status": "running",
    "timeout": 30,
    "frequency": 5,
    "magicclose": false
}
As shown, the watchdog timeout is 30 s and the feed interval is 5 s.
The procd package builds several binaries: init, procd, askfirst, reload_config, and others.
/sbin/init
Sources:
initd/init.c
initd/early.c
initd/preinit.c
initd/mkdev.c
sysupgrade.c
watchdog.c
utils/utils.c
procd sources:
procd.c
signal.c
state.c
inittab.c
rcS.c
ubus.c
system.c
sysupgrade.c
service/service.c
service/instance.c
service/validate.c
service/trigger.c
service/watch.c
utils/utils.c
watchdog.c
plug/coldplug.c
plug/hotplug.c
init's tasks
The first process the kernel starts is /sbin/init; its main() is in initd/init.c:
-> early(): mounts the required filesystems: /dev, /tmp, /sys, /proc, and so on
-> watchdog_init(1): initializes the watchdog
Runs kmodloader /etc/modules-boot.d/ to load the boot-stage kernel modules
-> preinit()
forks procd -h /etc/hotplug-preinit.json into the background
forks /etc/preinit into the background; when it finishes, calls spawn_procd(), which replaces init with procd
spawn_procd()
fetches the watchdog fd and saves it into the WDTFD environment variable, so the fd is inherited by procd
procd's tasks
procd runs a state machine: when one state's work is done, it calls procd_state_next() to advance to the next.
enum {
    STATE_NONE = 0,
    STATE_EARLY,
    STATE_UBUS,
    STATE_INIT,
    STATE_RUNNING,
    STATE_SHUTDOWN,
    STATE_HALT,
    __STATE_MAX,
};
STATE_EARLY
Initializes the watchdog.
Initializes hotplug via hotplug("/etc/hotplug.json"): creates a netlink socket to receive kernel uevents, then applies the rules from /etc/hotplug.json.
Calls procd_coldplug(), which runs "udevtrigger"; once that completes, procd advances to the next state.
STATE_UBUS
Connects to ubus; procd only moves to the next state once the connection succeeds. It registers two objects, system and service.
Enabling procd debug output:
procd reads the DBGLVL environment variable and the -d command-line option.
If init's debug level is greater than 0, it exports DBGLVL for procd.
check_dbglvl() reads /tmp/debug_level and sets the debug level from it.
lib/preinit/30_failsafe_wait
During boot, pressing 1-4 writes the chosen debug level to /tmp/debug_level:
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
procd's log lines then show up in syslog:
2023-05-05T10:30:34+08:00 XiaoQiang procd: Launched hotplug exec instance, pid=1027
2023-05-05T10:30:34+08:00 XiaoQiang procd: Command: exec
2023-05-05T10:30:34+08:00 XiaoQiang procd: /sbin/hotplug-call
2023-05-05T10:30:34+08:00 XiaoQiang procd: button
2023-05-05T10:30:34+08:00 XiaoQiang procd: Message:
2023-05-05T10:30:34+08:00 XiaoQiang procd: PATH=/sbin:/bin:/usr/sbin:/usr/bin
2023-05-05T10:30:34+08:00 XiaoQiang procd: SUBSYSTEM=button
2023-05-05T10:30:34+08:00 XiaoQiang procd: ACTION=released
2023-05-05T10:30:34+08:00 XiaoQiang procd: BUTTON=mesh
2023-05-05T10:30:34+08:00 XiaoQiang procd: SEEN=0
2023-05-05T10:30:34+08:00 XiaoQiang procd: SEQNUM=1011
2023-05-05T10:30:34+08:00 XiaoQiang procd: Finished hotplug exec instance, pid=1027
2023-05-05T10:30:34+08:00 XiaoQiang procd: Launched hotplug exec instance, pid=1028
2023-05-05T10:30:34+08:00 XiaoQiang procd: Finished hotplug exec instance, pid=1028
2023-05-05T10:30:35+08:00 XiaoQiang procd: Ping
2023-05-05T10:30:40+08:00 XiaoQiang procd: Ping
The Ping lines are the watchdog feeds.
My first suspicion was high CPU load preventing procd from being scheduled. But even with several busy loops spawned in the background:
$ while true; do : ; done &
to eat all the CPU, procd's log showed it still feeding the dog steadily and punctually. The kernel scheduler divides CPU time fairly: even on a fully loaded system no process starves, let alone for 30 seconds. The scheduler was not the problem.
Next suspicion: a bug in procd itself, with some other task blocking the whole process and therefore the feed loop. But procd's recent commit history shows no fix of that kind, and procd forks before executing hotplug handlers and similar work, so this was also unlikely.
Then I looked at an earlier, similar bug: after about 5 hours of soak the controller LED turned orange, the serial console jammed (output still flowing, input dead), and the watchdog then rebooted the box. That was the first ray of light: a notorious kernel problem was surfacing. This is a bug known since 2007 - see "System freeze on high memory usage".
When free memory drops below the watermark, the kernel effectively freezes: the serial console responds extremely slowly and processes barely get scheduled, because the kernel is busy reclaiming buffer and page-cache memory. The OOM killer triggers very late; before it fires, the watchdog barks and the device reboots.
So I tried to manufacture memory pressure:
1. Raise the memory watermark:
echo 16000 > /proc/sys/vm/min_free_kbytes
(raised to 16 MB; the previous value was 4 MB)
2. Write large files into /tmp:
dd if=/dev/zero of=8M bs=1024 count=8192
Keep generating large files in /tmp until the serial console starts stuttering. The problem reproduced quickly once memory got tight: watchdog bark, reboot.
After the reboot, I tried disabling the watchdog:
ubus call system watchdog '{"stop":true,"magicclose":true}'
then raised the watermark again and recreated the memory pressure. The serial console became extremely sluggish, but the system no longer rebooted; only much later did the OOM killer finally fire:
[ 2054.807162] startscene_cron invoked oom-killer: gfp_mask=0x2420848, order=0, oom_score_adj=0
[ 2054.807195] startscene_cron cpuset=/ mems_allowed=0
[ 2054.818481] CPU: 0 PID: 3574 Comm: startscene_cron Tainted: P 4.4.60 #0
[ 2054.819522] Hardware name: Generic DT based system
[ 2054.827268] [<8121e540>] (unwind_backtrace) from [<8121b6a0>] (show_stack+0x10/0x14)
[ 2054.832035] [<8121b6a0>] (show_stack) from [<81394920>] (dump_stack+0x7c/0x9c)
[ 2054.839933] [<81394920>] (dump_stack) from [<81293578>] (dump_header+0x4c/0x19c)
[ 2054.846962] [<81293578>] (dump_header) from [<81293a78>] (oom_kill_process+0x80/0x41c)
[ 2054.854510] [<81293a78>] (oom_kill_process) from [<812940e8>] (out_of_memory+0x26c/0x300)
[ 2054.862237] [<812940e8>] (out_of_memory) from [<81297af4>] (__alloc_pages_nodemask+0x6bc/0x774)
[ 2054.870484] [<81297af4>] (__alloc_pages_nodemask) from [<81290a54>] (pagecache_get_page+0x120/0x1c0)
[ 2054.878994] [<81290a54>] (pagecache_get_page) from [<812f3040>] (__getblk_slow+0xac/0x24c)
[ 2054.888372] [<812f3040>] (__getblk_slow) from [<813164e4>] (squashfs_read_data+0x174/0xb9c)
[ 2054.896441] [<813164e4>] (squashfs_read_data) from [<813170b0>] (squashfs_cache_get+0x1a4/0x34c)
[ 2054.904687] [<813170b0>] (squashfs_cache_get) from [<81318810>] (squashfs_readpage+0x700/0x950)
[ 2054.913718] [<81318810>] (squashfs_readpage) from [<8129b7b0>] (__do_page_cache_readahead+0x1bc/0x1fc)
[ 2054.922134] [<8129b7b0>] (__do_page_cache_readahead) from [<81292a84>] (filemap_fault+0x194/0x440)
[ 2054.931508] [<81292a84>] (filemap_fault) from [<812ad9d8>] (__do_fault+0x3c/0x94)
[ 2054.940450] [<812ad9d8>] (__do_fault) from [<812b0328>] (handle_mm_fault+0x598/0xfbc)
[ 2054.948001] [<812b0328>] (handle_mm_fault) from [<8121ffcc>] (do_page_fault+0x12c/0x288)
[ 2054.955813] [<8121ffcc>] (do_page_fault) from [<812092c4>] (do_PrefetchAbort+0x34/0x98)
[ 2054.963972] [<812092c4>] (do_PrefetchAbort) from [<8120a95c>] (ret_from_exception+0x0/0x24)
[ 2054.971693] Exception stack(0x853cdfb0 to 0x853cdff8)
[ 2054.980024] dfa0: 00000001 7ecf9f65 00000000 7ecf9f64
[ 2054.985238] dfc0: 7ecf9f63 76f331e0 00000002 7ecf9e44 00000003 00000001 7ecf9df4 00000002
[ 2054.993397] dfe0: 7ecf9f68 7ecf9b60 76e89c68 76e89c68 60000010 ffffffff
[ 2055.015883] Mem-Info:
[ 2055.015932] active_anon:2533 inactive_anon:12553 isolated_anon:0
[ 2055.015932] active_file:336 inactive_file:343 isolated_file:0
[ 2055.015932] unevictable:721 dirty:0 writeback:0 unstable:0
[ 2055.015932] slab_reclaimable:669 slab_unreclaimable:13325
[ 2055.015932] mapped:86 shmem:12557 pagetables:329 bounce:0
[ 2055.015932] free:5098 free_pcp:97 free_cma:0
[ 2055.028813] Normal free:20392kB min:16000kB low:20000kB high:24000kB active_anon:10132kB inactive_anon:50212kB active_file:1380kB inactive_file:1408kB unevictable:2884kB isolated(anon):0kB isolated(file):0kB present:196608kB managed:187952kB mlocked:0kB dirty:0kB writeback:0kB mapped:344kB shmem:50228kB slab_reclaimable:2676kB slab_unreclaimable:53300kB kernel_stack:1464kB pagetables:1316kB unstable:0kB bounce:0kB free_pcp:328kB local_pcp:264kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2055.074293] lowmem_reserve[]: 0 0
[ 2055.098642] Normal: 146*4kB (UMEH) 85*8kB (UME) 50*16kB (UMEH) 24*32kB (UMEH) 6*64kB (UME) 39*128kB (UMH) 2*256kB (M) 3*512kB (UME) 2*1024kB (UE) 2*2048kB (ME) 1*4096kB (M) = 20496kB
[ 2055.113490] 13991 total pagecache pages
[ 2055.114545] 49152 pages RAM
[ 2055.118777] 0 pages HighMem/MovableOnly
[ 2055.121138] 2164 pages reserved
[ 2055.125131] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 2055.128681] [ 350] 0 350 261 19 4 0 0 0 ubusd
[ 2055.137793] [ 351] 0 351 321 12 3 0 0 0 ash
[ 2055.145934] [ 389] 0 389 182 9 3 0 0 -1000 qrtr-ns
[ 2055.154450] [ 660] 0 660 296 10 5 0 0 0 led_srv
[ 2055.163069] [ 719] 0 719 327 12 4 0 0 0 rpcd
[ 2055.875997] Out of memory: Kill process 1777 (messagingagent) score 13 or sacrifice child
[ 2055.884469] Killed process 1777 (messagingagent) total-vm:8044kB, anon-rss:2544kB, file-rss:0kB
Yet the system was still running. Struggling through the laggy console, I deleted the large files and the system recovered.
Reference [5] explains why. A process's code is loaded via file-backed mappings:
Active(file): 13896 kB
Inactive(file): 1552 kB
When memory is tight, the kernel reclaims active file pages, evicting the process's executable code pages, which then have to fault back in constantly, so the process runs extremely slowly; the kernel itself is unaffected. Hence the symptom: the kernel's wifi printks and the OOM dump scroll by smoothly on the serial console, but interactive input is nearly frozen, because the interactive shell is a user-space process.
Reference [5] also provides a patch that exempts executable code pages from reclaim; it works reasonably well.
[1] https://superuser.com/questions/1281118/linux-hangs-when-it-runs-on-out-of-memory
[2] https://github.com/hakavlad/nohang
[3] https://github.com/rfjakob/earlyoom
[4] https://serverfault.com/questions/390623/how-do-i-prevent-linux-from-freezing-when-out-of-memory