A production incident caused by a modified net.ipv4.ip_forward setting

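Kernel parameters on the nodes of a production cluster were changed unexpectedly. This broke network communication between services in the cluster and caused a fairly large production incident.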

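Two main kernel parameters were modified in this incident. The one examined here is:

  1. net.ipv4.ip_forward: controls IP forwarding. When it is set to 1, the Linux kernel forwards packets that are not addressed to the host itself, so traffic can be routed on to other networks. When it is set to 0, such packets are dropped.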
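As a quick way to see the current state of this parameter on a node, the following is a minimal sketch that reads the procfs entry behind net.ipv4.ip_forward. The function name ip_forward_state is hypothetical and only used for illustration; the default path is the standard procfs location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "enabled" or "disabled" for IPv4 forwarding.
# The optional argument is the file to read; it defaults to the procfs entry
# that backs the net.ipv4.ip_forward sysctl.
ip_forward_state() {
    local path="${1:-/proc/sys/net/ipv4/ip_forward}"
    local value
    # Return a distinct status if the file cannot be read.
    value="$(cat "$path")" || return 2
    if [ "$value" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

Calling ip_forward_state with no argument on a node reports the value the kernel is using right now, which may differ from what the files under /etc/sysctl.d say.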

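Reproducing the issue

Start two Pods in a cluster and use them to simulate the business workload.
[Image: the Pods used for the connectivity test]

On the controller-node-2 node, edit /etc/sysctl.d/99-sysctl.conf and load it (sysctl -p).
[Image: modifying the node's kernel configuration]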

net.ipv4.ip_forward=0

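Testing connectivity again at this point fails.
[Image: connectivity test after modifying the node's kernel parameter]

Even Pods on the same host cannot communicate with each other.
[Image: communication between Pods on the same host]

The nodes themselves can still communicate normally.
[Image: communication between nodes]

The calico mesh status still shows as healthy.
[Image: calico mesh status after modifying the node's kernel parameter]

Ping a Pod on the abnormal node from a Pod on the healthy node and capture packets on the healthy node: there are no reply packets.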

[root@controller-node-1 ~]# tcpdump -i any host 10.233.74.83 or 10.233.76.142 -nnvvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
16:14:10.762727 IP (tos 0x0, ttl 64, id 37977, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 265, seq 0, length 64
16:14:10.762787 IP (tos 0x0, ttl 63, id 37977, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 265, seq 0, length 64
16:14:11.762953 IP (tos 0x0, ttl 64, id 38291, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 265, seq 1, length 64
16:14:11.763002 IP (tos 0x0, ttl 63, id 38291, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 265, seq 1, length 64
16:14:12.763208 IP (tos 0x0, ttl 64, id 38345, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 265, seq 2, length 64
16:14:12.763253 IP (tos 0x0, ttl 63, id 38345, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 265, seq 2, length 64
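To make the "requests but no replies" observation easier to spot in a long capture, here is a small sketch that counts ICMP echo requests and echo replies in saved tcpdump text output. The function name icmp_summary is hypothetical, and it assumes the capture was written as text in the same format as the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tcpdump text output on stdin and print how many
# ICMP echo requests and echo replies it contains.
icmp_summary() {
    local capture requests replies
    capture="$(cat)"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    requests="$(printf '%s\n' "$capture" | grep -c 'ICMP echo request' || true)"
    replies="$(printf '%s\n' "$capture" | grep -c 'ICMP echo reply' || true)"
    echo "requests=$requests replies=$replies"
}
```

For example, saving the capture to a file and running icmp_summary < capture.txt gives one line to compare across nodes. A non-zero request count with zero replies matches what was seen on the healthy node.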

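Capturing on the abnormal node shows that the ICMP packets do reach the node, but the destination address never responds. This indicates that the traffic is not delivered to the destination Pod.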

[root@controller-node-2 ~]# tcpdump -i any host 10.233.74.83 or 10.233.76.142 -nnvvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
16:15:00.019656 IP (tos 0x0, ttl 63, id 42391, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 271, seq 0, length 64
16:15:01.019960 IP (tos 0x0, ttl 63, id 43082, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 271, seq 1, length 64
16:15:02.020072 IP (tos 0x0, ttl 63, id 43594, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.74.83 > 10.233.76.142: ICMP echo request, id 271, seq 2, length 64

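Ping a Pod on the healthy node from a Pod on the abnormal node and capture packets on the healthy node: no packets arrive at all, which indicates that the traffic is never forwarded out of the abnormal node.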

[root@controller-node-1 ~]# tcpdump -i any host 10.233.74.83 or 10.233.76.142 -nnvvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes

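On the abnormal node, enter the network namespace of the Pod and capture packets there. The ICMP echo requests are visible, but still no replies are received.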

[root@controller-node-2 ~]# nsenter -n -t 23056
[root@controller-node-2 ~]# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
4: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default qlen 1000 link-netnsid 0
inet 10.233.76.142/32 scope global eth0
valid_lft forever preferred_lft forever
[root@controller-node-2 ~]# tcpdump -i any host 10.233.74.83 or 10.233.76.142 -nnvvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
16:22:42.911202 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 10.233.76.142, length 28
16:22:43.913783 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 10.233.76.142, length 28
16:22:44.915789 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 169.254.1.1 tell 10.233.76.142, length 28
16:22:45.917834 IP (tos 0xc0, ttl 64, id 52805, offset 0, flags [none], proto ICMP (1), length 112)
10.233.76.142 > 10.233.76.142: ICMP host 10.233.74.83 unreachable, length 92
IP (tos 0x0, ttl 64, id 26248, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.76.142 > 10.233.74.83: ICMP echo request, id 90, seq 0, length 64
16:22:45.917839 IP (tos 0xc0, ttl 64, id 52806, offset 0, flags [none], proto ICMP (1), length 112)
10.233.76.142 > 10.233.76.142: ICMP host 10.233.74.83 unreachable, length 92
IP (tos 0x0, ttl 64, id 26601, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.76.142 > 10.233.74.83: ICMP echo request, id 90, seq 1, length 64
16:22:45.917841 IP (tos 0xc0, ttl 64, id 52807, offset 0, flags [none], proto ICMP (1), length 112)
10.233.76.142 > 10.233.76.142: ICMP host 10.233.74.83 unreachable, length 92
IP (tos 0x0, ttl 64, id 26809, offset 0, flags [DF], proto ICMP (1), length 84)
10.233.76.142 > 10.233.74.83: ICMP echo request, id 90, seq 2, length 64

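Restarting calico-node on that node makes calico-node set the kernel parameter back, and the network recovers. However, net.ipv4.ip_forward in /etc/sysctl.d/99-sysctl.conf is still 0, so the next reload (sysctl -p) disables forwarding again.
[Image: connectivity test after restarting calico-node]
[Image: kernel parameter after restarting calico-node]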
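The gap between the running value and the persisted value is what makes this failure come back later. The following is a minimal sketch of a check for that kind of drift. The function names persisted_ip_forward and check_ip_forward_drift are hypothetical, and the parsing only handles the plain key=value form of net.ipv4.ip_forward with a value of 0 or 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value (0 or 1) assigned to
# net.ipv4.ip_forward in a sysctl config file. Commented-out lines are ignored
# because the pattern only allows whitespace before the key.
persisted_ip_forward() {
    local conf="$1"
    sed -n 's/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*\([01]\)[[:space:]]*$/\1/p' "$conf" | tail -n 1
}

# Hypothetical helper: compare the runtime value with the persisted one.
#   $1: file holding the runtime value (normally /proc/sys/net/ipv4/ip_forward)
#   $2: sysctl config file to inspect
check_ip_forward_drift() {
    local runtime_path="$1" conf="$2"
    local runtime persisted
    runtime="$(cat "$runtime_path")"
    persisted="$(persisted_ip_forward "$conf")"
    if [ -z "$persisted" ]; then
        echo "not-persisted"
    elif [ "$runtime" = "$persisted" ]; then
        echo "consistent"
    else
        echo "drift runtime=$runtime persisted=$persisted"
    fi
}
```

On the node from this incident, check_ip_forward_drift /proc/sys/net/ipv4/ip_forward /etc/sysctl.d/99-sysctl.conf would be expected to report drift right after the calico-node restart, since the runtime value is 1 while the file still says 0.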

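Incident summary

The troubleshooting approach for this incident was:

  1. Test cross-node connectivity with two Pods on different hosts: it fails.
  2. Test connectivity between the nodes themselves: they communicate normally.
  3. On each of the two hosts, test connectivity between different Pods on the same host: it works on one host and fails on the other, which identifies the problem node.
  4. Check the mesh status with calicoctl node status: it shows as healthy, so calico is tentatively ruled out.
  5. Use tcpdump to capture packets sent from the Pod on the healthy node to the Pod on the abnormal node: the ICMP packets reach the abnormal node, but the Pod on that node does not respond.
  6. Use tcpdump to capture packets sent from the Pod on the abnormal node to the Pod on the healthy node: no ICMP packets can be captured at the host level on the abnormal node. Entering the Pod's network namespace with nsenter shows that the ICMP requests are sent but get no response, and a capture on the healthy node shows that no packets ever arrive there.
  7. Restart calico-node on the abnormal node: the network recovers. calico-node changes the kernel parameter at startup but does not persist it to /etc/sysctl.d/99-sysctl.conf.
  8. Inspect /etc/sysctl.d/99-sysctl.conf: net.ipv4.ip_forward has been set to 0.
  9. Use ansible to check /etc/sysctl.d/99-sysctl.conf on all nodes.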
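The per-node check in the last step can be sketched as a small shell function that reports which sysctl config files explicitly disable forwarding. This is only an illustration: the function name find_ip_forward_disabled is hypothetical, and how it gets run on every node (for example through ansible) is left to the existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a list of sysctl config files, print each file
# that contains an active (not commented out) net.ipv4.ip_forward=0 line.
find_ip_forward_disabled() {
    local f
    for f in "$@"; do
        # Skip arguments that are not regular files, e.g. an unmatched glob.
        [ -f "$f" ] || continue
        if grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*0[[:space:]]*$' "$f"; then
            echo "$f"
        fi
    done
}
```

A typical call on a node would be find_ip_forward_disabled /etc/sysctl.conf /etc/sysctl.d/*.conf. Empty output means no file in that list turns forwarding off.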
Author

Warner Chen

Posted on

2024-03-31

Updated on

2024-03-31
