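Linux Socket Filtering (LSF) is derived from the Berkeley Packet Filter (BPF).
BPF allows a user-space program to attach a filter to any socket in order to allow or reject certain types of packets passing through that socket. LSF uses exactly the same filter code structure as BSD's BPF.
On Linux, BPF is simpler than on BSD: you do not have to worry about devices or anything similar. You just create your filter code and pass it to the kernel with the SO_ATTACH_FILTER socket option. If the code passes the kernel's checks, the kernel immediately starts filtering on that socket.
You can also detach a filter with the SO_DETACH_FILTER option. This is probably rarely needed, because a socket's filter is removed automatically when the socket is closed. Another less common case is attaching a different filter to a socket that already has one. The kernel removes the old filter and puts the new one in its place, provided the new one passes the checks. If the new filter fails them, the old filter stays attached.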
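The SO_LOCK_FILTER option locks the filter attached to a socket. Once it is set, the filter can no longer be removed or changed.
Although we only talk about sockets here, BPF is used in many more places in Linux, such as xt_bpf (netfilter) and cls_bpf (the traffic-control classifier).
User-space programs include <linux/filter.h>, which defines the following structures: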
```c
struct sock_filter {	/* Filter block */
	__u16	code;	/* Actual filter code */
	__u8	jt;	/* Jump true */
	__u8	jf;	/* Jump false */
	__u32	k;	/* Generic multiuse field */
};
```
This structure is a 4-tuple consisting of code, jt, jf and k. jt and jf are jump offsets, and k is a generic value whose meaning depends on code.
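<linux/filter.h> also provides the BPF_STMT and BPF_JUMP macros. They fill in the same four fields, so programs do not have to be written as raw numbers.

The sketch below builds two instructions both ways and compares them field by field. The numeric values are taken from the tcpdump-generated program in this article: 0x28 is a half-word load and 0x15 is jeq. The helper same_insn is hypothetical.

```c
#include <linux/filter.h>

/* Hypothetical helper: compare two instructions field by field. */
static int same_insn(struct sock_filter a, struct sock_filter b)
{
    return a.code == b.code && a.jt == b.jt && a.jf == b.jf && a.k == b.k;
}

/* ldh [12]: load the half-word at byte offset 12 (the EtherType) into A. */
static const struct sock_filter ldh_raw   = { 0x28, 0, 0, 0x0000000c };
static const struct sock_filter ldh_macro = BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12);

/* jeq #0x86dd with jt = 0, jf = 8: fall through if A == 0x86dd, else skip 8 instructions. */
static const struct sock_filter jeq_raw   = { 0x15, 0, 8, 0x000086dd };
static const struct sock_filter jeq_macro = BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x86dd, 0, 8);
```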
```c
struct sock_fprog {			/* Required for SO_ATTACH_FILTER. */
	unsigned short		   len;	/* Number of filter blocks */
	struct sock_filter __user *filter;
};
```
A filter program is an array of these blocks: filter points to the array and len holds the number of blocks in it.
Example:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/filter.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
	/* Generated by: tcpdump -i em1 port 22 -dd */
	struct sock_filter code[] = {
		{ 0x28,  0,  0, 0x0000000c },
		{ 0x15,  0,  8, 0x000086dd },
		{ 0x30,  0,  0, 0x00000014 },
		{ 0x15,  2,  0, 0x00000084 },
		{ 0x15,  1,  0, 0x00000006 },
		{ 0x15,  0, 17, 0x00000011 },
		{ 0x28,  0,  0, 0x00000036 },
		{ 0x15, 14,  0, 0x00000016 },
		{ 0x28,  0,  0, 0x00000038 },
		{ 0x15, 12, 13, 0x00000016 },
		{ 0x15,  0, 12, 0x00000800 },
		{ 0x30,  0,  0, 0x00000017 },
		{ 0x15,  2,  0, 0x00000084 },
		{ 0x15,  1,  0, 0x00000006 },
		{ 0x15,  0,  8, 0x00000011 },
		{ 0x28,  0,  0, 0x00000014 },
		{ 0x45,  6,  0, 0x00001fff },
		{ 0xb1,  0,  0, 0x0000000e },
		{ 0x48,  0,  0, 0x0000000e },
		{ 0x15,  2,  0, 0x00000016 },
		{ 0x48,  0,  0, 0x00000010 },
		{ 0x15,  0,  1, 0x00000016 },
		{ 0x06,  0,  0, 0x0000ffff },
		{ 0x06,  0,  0, 0x00000000 },
	};

	struct sock_fprog bpf = {
		.len = ARRAY_SIZE(code),
		.filter = code,
	};
	int sock, ret;

	/* PF_PACKET sockets need CAP_NET_RAW (usually root). */
	sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (sock < 0) {
		perror("socket");
		return 1;
	}

	ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
	if (ret < 0) {
		perror("setsockopt");
		close(sock);
		return 1;
	}

	/* ... read packets: only traffic to or from port 22 gets through ... */

	close(sock);
	return 0;
}
```
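SO_DETACH_FILTER does not need an argument (the value passed is ignored). SO_LOCK_FILTER takes an integer value of 0 or 1. The three options are used as follows:

```c
/* Attach: val is a struct sock_fprog describing the filter program */
setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_FILTER, &val, sizeof(val));
/* Detach: the value of val is not used */
setsockopt(sockfd, SOL_SOCKET, SO_DETACH_FILTER, &val, sizeof(val));
/* Lock: val is an int, 1 locks the attached filter */
setsockopt(sockfd, SOL_SOCKET, SO_LOCK_FILTER, &val, sizeof(val));
```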
The BPF architecture consists of the following basic elements:

| Element | Description |
|---------|-------------|
| A | 32 bit wide accumulator |
| X | 32 bit wide X register |
| M[] | 16 x 32 bit wide misc registers, also known as the scratch memory store, addressable from 0 to 15 |
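A program, translated by bpf_asm into an array of opcodes, consists of elements with the following layout (field widths in bits):

op:16, jt:8, jf:8, k:32

op is a 16-bit opcode that encodes a particular instruction.
jt and jf are 8-bit jump targets: one is used when the condition is true ("jump if true"), the other when it is false ("jump if false").
k contains a miscellaneous argument whose interpretation depends on the instruction in op.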
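Note that jt and jf are offsets relative to the next instruction rather than absolute addresses. After the instruction at pc, execution continues at pc + 1 + jt or pc + 1 + jf.

The toy interpreter below is a hypothetical user-space sketch, not the kernel's implementation. It supports only the few opcodes needed by the ARP filter shown later in this article: ldh [k], ldb [k], ja, jeq #k and ret #k. Like classic BPF, it treats an out-of-range packet read as "drop".

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>

/* Hypothetical toy interpreter for a tiny subset of classic BPF.
 * Returns the value of the executed ret instruction (0 means drop). */
static uint32_t toy_run(const struct sock_filter *prog, size_t len,
                        const uint8_t *pkt, size_t pkt_len)
{
    uint32_t A = 0;     /* accumulator */
    size_t pc = 0;      /* index of the current instruction */

    while (pc < len) {
        const struct sock_filter *f = &prog[pc];

        switch (f->code) {
        case BPF_LD | BPF_H | BPF_ABS:          /* ldh [k] */
            if (f->k > pkt_len || pkt_len - f->k < 2)
                return 0;                       /* out of bounds: drop */
            A = (uint32_t)pkt[f->k] << 8 | pkt[f->k + 1];  /* network byte order */
            pc++;
            break;
        case BPF_LD | BPF_B | BPF_ABS:          /* ldb [k] */
            if (f->k >= pkt_len)
                return 0;
            A = pkt[f->k];
            pc++;
            break;
        case BPF_JMP | BPF_JA:                  /* ja: k is the relative offset */
            pc += 1 + f->k;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:         /* jeq #k, relative jt/jf */
            pc += 1 + (A == f->k ? f->jt : f->jf);
            break;
        case BPF_RET | BPF_K:                   /* ret #k */
            return f->k;
        default:
            return 0;                           /* unsupported opcode: drop */
        }
    }
    return 0;                                   /* fell off the end: drop */
}
```

For the ARP program, jne #0x806, drop is encoded as a jeq with jt = 0 and jf = 1. A non-ARP EtherType therefore skips exactly one instruction and lands on ret #0.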
The instruction set consists of load, store, branch, ALU, miscellaneous and return instructions. The corresponding opcode macros are available through <linux/filter.h>.

```
Instruction      Addressing mode      Description

ld               1, 2, 3, 4, 10       Load word into A
ldi              4                    Load word into A
ldh              1, 2                 Load half-word into A
ldb              1, 2                 Load byte into A
ldx              3, 4, 5, 10          Load word into X
ldxi             4                    Load word into X
ldxb             5                    Load byte into X

st               3                    Store A into M[]
stx              3                    Store X into M[]

jmp              6                    Jump to label
ja               6                    Jump to label
jeq              7, 8                 Jump on A == k
jneq             8                    Jump on A != k
jne              8                    Jump on A != k
jlt              8                    Jump on A <  k
jle              8                    Jump on A <= k
jgt              7, 8                 Jump on A >  k
jge              7, 8                 Jump on A >= k
jset             7, 8                 Jump on A &  k

add              0, 4                 A + <x>
sub              0, 4                 A - <x>
mul              0, 4                 A * <x>
div              0, 4                 A / <x>
mod              0, 4                 A % <x>
neg                                   -A
and              0, 4                 A & <x>
or               0, 4                 A | <x>
xor              0, 4                 A ^ <x>
lsh              0, 4                 A << <x>
rsh              0, 4                 A >> <x>

tax                                   Copy A into X
txa                                   Copy X into A

ret              4, 9                 Return
```
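Each opcode is a bitwise OR of an instruction class plus class-specific fields: size and mode for loads, operation and source for jumps. The macros BPF_CLASS, BPF_SIZE, BPF_MODE, BPF_OP, BPF_SRC and BPF_RVAL, reachable through <linux/filter.h>, extract these fields.

The hypothetical decoder below names the opcodes used by the tcpdump-generated program earlier in this article. It handles only that subset and returns "?" for anything else.

```c
#include <stdint.h>
#include <linux/filter.h>

/* Hypothetical decoder for a small subset of classic BPF opcodes. */
static const char *mnemonic(uint16_t code)
{
    switch (BPF_CLASS(code)) {
    case BPF_LD:
        if (BPF_MODE(code) == BPF_ABS)          /* mode 1: [k] */
            return BPF_SIZE(code) == BPF_B ? "ldb [k]" :
                   BPF_SIZE(code) == BPF_H ? "ldh [k]" : "ld [k]";
        if (BPF_MODE(code) == BPF_IND)          /* mode 2: [x + k] */
            return BPF_SIZE(code) == BPF_B ? "ldb [x + k]" :
                   BPF_SIZE(code) == BPF_H ? "ldh [x + k]" : "ld [x + k]";
        return "?";
    case BPF_LDX:
        if (BPF_MODE(code) == BPF_MSH && BPF_SIZE(code) == BPF_B)
            return "ldxb 4*([k]&0xf)";          /* mode 5 */
        return "?";
    case BPF_JMP:
        if (BPF_SRC(code) != BPF_K)
            return "?";
        switch (BPF_OP(code)) {
        case BPF_JA:   return "ja";
        case BPF_JEQ:  return "jeq #k";
        case BPF_JGT:  return "jgt #k";
        case BPF_JGE:  return "jge #k";
        case BPF_JSET: return "jset #k";
        }
        return "?";
    case BPF_RET:
        return BPF_RVAL(code) == BPF_K ? "ret #k" :
               BPF_RVAL(code) == BPF_A ? "ret a" : "?";
    }
    return "?";
}
```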
Addressing modes (BHW means byte, half-word or word, depending on the load instruction):

```
Addressing mode  Syntax               Description

 0               x/%x                 Register X
 1               [k]                  BHW at byte offset k in the packet
 2               [x + k]              BHW at the offset X + k in the packet
 3               M[k]                 Word at offset k in M[]
 4               #k                   Literal value stored in k
 5               4*([k]&0xf)          Lower nibble * 4 at byte offset k in the packet
 6               L                    Jump label L
 7               #k,Lt,Lf             Jump to Lt if true, otherwise jump to Lf
 8               #k,Lt                Jump to Lt if predicate is true
 9               a/%a                 Accumulator A
10               extension            BPF extension
```
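Mode 5 exists to handle the variable-length IPv4 header. The low nibble of the first IPv4 byte is the IHL in 32-bit words, so on an Ethernet frame 4*([14]&0xf) yields the IP header length in bytes.

The tcpdump program earlier in this article uses exactly this pattern: ldxb 4*([14]&0xf), followed by ldh [x + 14] and ldh [x + 16] to read the transport source and destination ports. The hypothetical C function below performs the same two steps by hand on a byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring "ldxb 4*([14]&0xf); ldh [x + off]".
 * Stores the half-word in *port and returns 0, or returns -1 if the frame is too short. */
static int l4_port(const uint8_t *frame, size_t len, unsigned off, uint16_t *port)
{
    uint32_t x;

    if (len <= 14)
        return -1;
    x = 4u * (frame[14] & 0x0f);            /* X = IPv4 header length in bytes */
    if ((size_t)x + off + 2 > len)
        return -1;
    /* A = half-word at X + off, in network byte order */
    *port = (uint16_t)(frame[x + off] << 8 | frame[x + off + 1]);
    return 0;
}
```

With IHL = 5 (a 20-byte header), off = 14 reads bytes 34 and 35 of the frame, which is where the source port of a TCP or UDP header starts.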
Linux also provides a number of BPF extensions. They are used with the load instructions and load the corresponding kernel value (for example a field of the skb) into A.

```
Extension                             Description

len                                   skb->len
proto                                 skb->protocol
type                                  skb->pkt_type
poff                                  Payload start offset
ifidx                                 skb->dev->ifindex
nla                                   Netlink attribute of type X with offset A
nlan                                  Nested Netlink attribute of type X with offset A
mark                                  skb->mark
queue                                 skb->queue_mapping
hatype                                skb->dev->type
rxhash                                skb->hash
cpu                                   raw_smp_processor_id()
vlan_tci                              skb_vlan_tag_get(skb)
vlan_avail                            skb_vlan_tag_present(skb)
vlan_tpid                             skb->vlan_proto
rand                                  prandom_u32()
```
For example:
ld vlan_tci
loads the value of skb_vlan_tag_get(skb) into register A.
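Example: a filter that accepts only ARP packets:

```
ldh [12]          /* load the half-word (2 bytes) at byte offset 12, the EtherType, into A */
jne #0x806, drop  /* if A != 0x0806 (ARP), jump to drop; otherwise continue */
ret #-1           /* accept: keep the whole packet */
drop: ret #0      /* drop the packet */
```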
Save this to a file (here foo) and assemble it with bpf_asm:

```
$ ./bpf_asm foo
4,40 0 0 12,21 0 1 2054,6 0 0 4294967295,6 0 0 0,
```
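In this default output, the first number is the instruction count. It is followed by one comma-separated "code jt jf k" tuple per instruction, in decimal (40 = 0x28, 21 = 0x15, 2054 = 0x806, 6 = 0x06, 4294967295 = 0xffffffff). The iptables bpf match (xt_bpf) takes the same format in its --bytecode option.

A hypothetical parser that turns such a string back into a struct sock_filter array could look like this sketch. It assumes the input advertises the correct count and returns -1 on anything malformed.

```c
#include <stdio.h>
#include <stddef.h>
#include <linux/filter.h>

/* Hypothetical parser for "N,code jt jf k,code jt jf k,...".
 * Returns the number of instructions stored in out, or -1 on error. */
static int parse_bytecode(const char *s, struct sock_filter *out, size_t max)
{
    unsigned int n, code, jt, jf, k;
    int used;

    /* Leading instruction count, followed by a comma. */
    if (sscanf(s, "%u%n", &n, &used) != 1 || s[used] != ',' || n == 0 || n > max)
        return -1;
    s += used + 1;

    for (unsigned int i = 0; i < n; i++) {
        if (sscanf(s, "%u %u %u %u%n", &code, &jt, &jf, &k, &used) != 4)
            return -1;
        if (code > 0xffff || jt > 0xff || jf > 0xff)
            return -1;                  /* does not fit the field widths */
        out[i].code = (__u16)code;
        out[i].jt = (__u8)jt;
        out[i].jf = (__u8)jf;
        out[i].k = k;
        s += used;
        if (*s == ',')
            s++;                        /* separator; a trailing one is allowed */
    }
    return (int)n;
}
```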
bpf_asm can also emit the result in C format:

```
$ ./bpf_asm -c foo
{ 0x28,  0,  0, 0x0000000c },
{ 0x15,  0,  1, 0x00000806 },
{ 0x06,  0,  0, 0xffffffff },
{ 0x06,  0,  0, 0000000000 },
```
The bpf_dbg tool can be used to debug filters; it is not covered in these notes.
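Internally, the kernel uses a different instruction set that follows the same principles as BPF but is modelled more closely on the underlying CPU architectures, for better performance. This new instruction set is called eBPF or internal BPF.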
References
https://www.kernel.org/doc/Documentation/networking/filter.txt
https://github.com/cloudflare/bpftools