Notes on Using kubeadm

Node Planning

| Role | Hostname | IP | OS | Kernel |
| --- | --- | --- | --- | --- |
| Control Plane + ETCD | test-0 | 172.16.16.140 | Ubuntu 22.04.5 LTS | 5.15.0-125-generic |
| Control Plane + ETCD | test-1 | 172.16.16.141 | Ubuntu 22.04.5 LTS | 5.15.0-125-generic |
| Control Plane + ETCD | test-2 | 172.16.16.142 | Ubuntu 22.04.5 LTS | 5.15.0-125-generic |
| Worker | test-3 | 172.16.16.143 | Ubuntu 22.04.5 LTS | 5.15.0-125-generic |

Environment Planning

| Component | Version |
| --- | --- |
| Kubernetes | v1.32.13 |
| Containerd | v2.1.7 |
| Runc | v1.4.2 |
| Crictl | v1.32.0 |
| CNI Plugins | v1.9.1 |
| Nerdctl | v2.2.2 |
| Calico | v3.27.0 |

| Service CIDR | Pod CIDR |
| --- | --- |
| 10.43.0.0/16 | 10.42.0.0/16 |

Node Initialization

Set the time zone:

```shell
timedatectl set-timezone Asia/Shanghai
```

Set up time synchronization:

```shell
apt -y install chrony

vim /etc/chrony/chrony.conf

# Comment out the existing NTP server entries and add the following line
server ntp.aliyun.com minpoll 4 maxpoll 10 iburst

systemctl enable chronyd.service --now
```

Verify that the clock is in sync:

```shell
root@test-0:~# chronyc tracking
Reference ID    : CB6B0658 (203.107.6.88)
Stratum         : 3
Ref time (UTC)  : Wed Apr 15 03:41:43 2026
System time     : 0.000027167 seconds fast of NTP time
Last offset     : +0.000345476 seconds
RMS offset      : 0.001053186 seconds
Frequency       : 13.992 ppm fast
Residual freq   : +0.245 ppm
Skew            : 35.316 ppm
Root delay      : 0.053283855 seconds
Root dispersion : 0.002438692 seconds
Update interval : 16.4 seconds
Leap status     : Normal

root@test-0:~# chronyc sources -v

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current best, '+' = combined, '-' = not combined,
| /             'x' = may be in error, '~' = too variable, '?' = unusable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 203.107.6.88                  2   4   377     1   +359us[ +412us] +/-   34ms
```

Disable swap:

```shell
swapoff -a
sed -i '/swap/s/^/#/' /etc/fstab
```
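The `sed` expression comments out every `/etc/fstab` line containing `swap`; a dry run on a sample fstab (hypothetical contents) shows the effect before touching the real file:

```shell
# Dry run of the same sed expression on a sample fstab (hypothetical contents)
cat > /tmp/fstab.sample <<'EOF'
UUID=1234-abcd / ext4 defaults 0 1
/swap.img none swap sw 0 0
EOF

sed '/swap/s/^/#/' /tmp/fstab.sample
# Only the line containing "swap" gains a leading '#'
```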

Configure kernel parameters:

```shell
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
EOF

sysctl --system
```
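A quick way to confirm the settings took effect is to read them back from `/proc` (each should print 1; the bridge entries only exist once `br_netfilter` is loaded):

```shell
# Each should print 1; the bridge entries appear only after br_netfilter is loaded
cat /proc/sys/net/ipv4/ip_forward
cat /proc/sys/net/bridge/bridge-nf-call-iptables
cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
```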

Container Runtime Setup

Install Containerd

Install a containerd version compatible with the target Kubernetes release, per the support matrix: https://containerd.io/releases/#kubernetes-support

```shell
export CONTAINERD_VERSION="2.1.7"

wget "https://github.com/containerd/containerd/releases/download/v$CONTAINERD_VERSION/containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz"

tar Czvxf /usr/local containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz

mkdir -pv /usr/local/lib/systemd/system

wget -P /usr/local/lib/systemd/system "https://raw.githubusercontent.com/containerd/containerd/v$CONTAINERD_VERSION/containerd.service"

systemctl daemon-reload

systemctl enable containerd --now
```

Install Runc

```shell
export RUNC_VERSION="v1.4.2"

wget -O /usr/local/bin/runc "https://github.com/opencontainers/runc/releases/download/$RUNC_VERSION/runc.amd64"

chmod 755 /usr/local/bin/runc
```

Install the CNI Plugins

```shell
export CNI_PLUGIN_VERSION="v1.9.1"

wget "https://github.com/containernetworking/plugins/releases/download/$CNI_PLUGIN_VERSION/cni-plugins-linux-amd64-$CNI_PLUGIN_VERSION.tgz"

mkdir -pv /opt/cni/bin

tar Czvxf /opt/cni/bin cni-plugins-linux-amd64-$CNI_PLUGIN_VERSION.tgz
```

Configure Containerd

```shell
mkdir -pv /etc/containerd

containerd config default > /etc/containerd/config.toml

# Set SystemdCgroup to true
sed -i 's/^\s*SystemdCgroup\s*=\s*false/SystemdCgroup = true/' /etc/containerd/config.toml

systemctl daemon-reload

systemctl restart containerd
```
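The `sed` above only flips an existing `SystemdCgroup = false` line; a dry run on a hypothetical excerpt of the generated config confirms the pattern matches:

```shell
# Dry run of the SystemdCgroup substitution on a hypothetical config excerpt
cat > /tmp/config.sample <<'EOF'
      [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
        SystemdCgroup = false
EOF

sed 's/^\s*SystemdCgroup\s*=\s*false/SystemdCgroup = true/' /tmp/config.sample
# The flag line becomes "SystemdCgroup = true"; the pattern consumes the
# leading whitespace, which is harmless in TOML
```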

Configure a Proxy for Containerd

If the nodes can only pull images through a proxy, configure one for containerd:

```shell
mkdir -pv /etc/systemd/system/containerd.service.d

cat <<EOF | tee /etc/systemd/system/containerd.service.d/proxy.conf
[Service]
Environment="HTTP_PROXY=http://172.16.16.12:10808"
Environment="HTTPS_PROXY=http://172.16.16.12:10808"
Environment="NO_PROXY=localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,10.42.0.0/16,10.43.0.0/16,.svc,.cluster.local,.cn,warnerchen.com"
EOF

systemctl daemon-reload

systemctl restart containerd
```

Install Crictl

```shell
# Pick the release that matches the Kubernetes version
export CRICTL_VERSION="v1.32.0"

wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$CRICTL_VERSION/crictl-$CRICTL_VERSION-linux-amd64.tar.gz

tar Czxvf /usr/local/bin crictl-$CRICTL_VERSION-linux-amd64.tar.gz
```
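Without a config file, crictl probes a list of default endpoints (including deprecated ones) and prints warnings; pointing it explicitly at the containerd socket is the usual fix. A minimal sketch:

```shell
# Point crictl at the containerd CRI socket so it stops probing default endpoints
cat <<EOF | tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
```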

Install Nerdctl (Optional)

```shell
export NERDCTL_VERSION="2.2.2"

wget "https://github.com/containerd/nerdctl/releases/download/v$NERDCTL_VERSION/nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz"

tar Czvxf /usr/local/bin nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
```

Install Kubernetes

Install kubeadm / kubelet / kubectl on every node (kubectl is strictly needed only on the control-plane nodes)

```shell
apt update

apt install -y apt-transport-https ca-certificates curl

# Ensure the keyring directory exists (only missing on releases older than Ubuntu 22.04 / Debian 12)
mkdir -p -m 755 /etc/apt/keyrings

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes.gpg

echo 'deb [signed-by=/etc/apt/keyrings/kubernetes.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | tee /etc/apt/sources.list.d/kubernetes.list

apt update

# Check the available versions
apt-cache madison kubelet kubeadm kubectl

apt install -y kubelet kubeadm kubectl

# Hold the packages so they are not upgraded automatically
apt-mark hold kubelet kubeadm kubectl
```

Bootstrap the First Control Plane Node

```shell
# --control-plane-endpoint is the address the remaining nodes register through; a DNS name or
# load balancer is recommended, but a node IP is used here for testing
kubeadm init \
  --control-plane-endpoint "172.16.16.140:6443" \
  --upload-certs \
  --pod-network-cidr=10.42.0.0/16 \
  --service-cidr=10.43.0.0/16
```
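Equivalently, the flags can be captured in a kubeadm configuration file (a sketch, assuming the v1beta4 kubeadm API shipped with v1.32; the file name is arbitrary):

```shell
# Sketch: declarative equivalent of the init flags above
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.32.13
controlPlaneEndpoint: "172.16.16.140:6443"
networking:
  podSubnet: 10.42.0.0/16
  serviceSubnet: 10.43.0.0/16
EOF

# --upload-certs is still passed on the command line
kubeadm init --config kubeadm-config.yaml --upload-certs
```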

Initialize the kubeconfig:

```shell
mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

echo "source <(kubectl completion bash)" >> ~/.bashrc

source ~/.bashrc
```

Install the CNI

Calico is used here; install a version compatible with the cluster, per the support matrix: https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements#supported-versions

```shell
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
```

Join the Second and Third Control Plane Nodes

Use the token, CA cert hash, and certificate key printed by kubeadm init (placeholders shown here):

```shell
kubeadm join 172.16.16.140:6443 --token aaa \
  --discovery-token-ca-cert-hash sha256:bbb \
  --control-plane --certificate-key ccc
```
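Bootstrap tokens expire after 24 hours and the uploaded certificate key after 2 hours; fresh values can be generated on an existing control-plane node with standard kubeadm commands (shown without output, since they need a running cluster):

```shell
# Print a complete worker join command with a fresh token and CA cert hash
kubeadm token create --print-join-command

# Re-upload the control-plane certificates and print a new certificate key
kubeadm init phase upload-certs --upload-certs
```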

Join the Worker Node

```shell
kubeadm join 172.16.16.140:6443 --token aaa \
  --discovery-token-ca-cert-hash sha256:bbb
```

Once all worker nodes have joined, label them with the worker role:

```shell
kubectl label node test-3 node-role.kubernetes.io/worker=''

kubectl get nodes
NAME     STATUS   ROLES           AGE     VERSION
test-0   Ready    control-plane   23m     v1.32.13
test-1   Ready    control-plane   9m6s    v1.32.13
test-2   Ready    control-plane   4m10s   v1.32.13
test-3   Ready    worker          3m8s    v1.32.13
```

ETCD Snapshot Backup and Restore

Setting Up the ETCD CLI Tools

ETCD operations use etcdctl / etcdutl, which can be copied out of the running etcd container's filesystem on a control-plane node:

```shell
cp "$(find / -name etcdctl 2>/dev/null | grep -v snapshot | head -n1)" /usr/local/bin/
cp "$(find / -name etcdutl 2>/dev/null | grep -v snapshot | head -n1)" /usr/local/bin/
```

etcdctl needs the certificate paths and endpoints passed as parameters:

```shell
ETCDCTL_API=3 etcdctl member list -w table \
  --endpoints=xxx \
  --cacert=xxx \
  --cert=xxx \
  --key=xxx
```

These values can be obtained via kubectl or from /etc/kubernetes/manifests/etcd.yaml:

```shell
grep 'file' /etc/kubernetes/manifests/etcd.yaml | grep -v 'peer'
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=$(kubectl get nodes -o wide \
  | awk '/etcd|control-plane/ {ips[$6]=1} END {for (ip in ips) printf "https://%s:2379,", ip}' \
  | sed 's/,$//')
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
```
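The awk pipeline keys on column 6 (INTERNAL-IP) of `kubectl get nodes -o wide` and keeps only rows whose roles mention etcd or control-plane; replaying it on captured sample output shows the endpoint list it produces:

```shell
# Replay of the pipeline on sample `kubectl get nodes -o wide` output (abbreviated)
sample='NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE
test-0 Ready control-plane 23m v1.32.13 172.16.16.140 <none> Ubuntu
test-3 Ready worker 3m8s v1.32.13 172.16.16.143 <none> Ubuntu'

echo "$sample" \
  | awk '/etcd|control-plane/ {ips[$6]=1} END {for (ip in ips) printf "https://%s:2379,", ip}' \
  | sed 's/,$//'
# -> https://172.16.16.140:2379  (the worker row does not match the pattern)
```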

ETCD Snapshot Backup

```shell
mkdir -pv ~/etcd-backup

export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# A snapshot is taken against a single endpoint
etcdctl snapshot save ~/etcd-backup/etcd-backup-0.db \
  --endpoints=https://172.16.16.140:2379

ls -l ~/etcd-backup
total 5968
-rw------- 1 root root 6107168 Apr 15 16:48 etcd-backup-0.db
```
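For recurring backups it helps to timestamp the snapshot files so older ones are not overwritten; a hypothetical wrapper (the file-name pattern is an assumption, and the ETCDCTL_* variables above must already be exported) could look like:

```shell
# Hypothetical backup helper: timestamped snapshot file names
SNAP="$HOME/etcd-backup/etcd-backup-$(date +%Y%m%d-%H%M%S).db"
mkdir -p "$HOME/etcd-backup"
etcdctl snapshot save "$SNAP" --endpoints=https://172.16.16.140:2379
```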

Check the ETCD Snapshot Status

```shell
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# Since etcd v3.5.x, `etcdctl snapshot status` is deprecated and slated for removal in v3.6; use etcdutl instead
etcdutl snapshot status /root/etcd-backup/etcd-backup-0.db -w table
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 348ec281 |    13267 |       2285 |     6.1 MB |
+----------+----------+------------+------------+
```

ETCD Snapshot Restore

Note that in an HA cluster every etcd member has to be restored from the same snapshot (or re-joined to a restored cluster); the steps below illustrate the procedure on a single member.

```shell
kubectl -n kube-system delete pod etcd-test-0

systemctl stop kubelet

export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

mkdir -pv /var/lib/etcd-restore

etcdutl --data-dir="/var/lib/etcd-restore" snapshot restore ~/etcd-backup/etcd-backup-0.db

# Change every host mount of /var/lib/etcd to /var/lib/etcd-restore
vim /etc/kubernetes/manifests/etcd.yaml

systemctl start kubelet
```

Upgrade Kubernetes

To upgrade to v1.33, run on all nodes:

```shell
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes.gpg

echo 'deb [signed-by=/etc/apt/keyrings/kubernetes.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' | tee /etc/apt/sources.list.d/kubernetes.list

apt update

# Confirm that v1.33 packages are available
apt-cache madison kubeadm kubelet kubectl
```

Upgrade the First Control Plane Node

```shell
# Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.33.10-1.1
apt-mark hold kubeadm

# Check the upgrade plan; if it looks good, proceed
kubeadm upgrade plan

# Apply the upgrade
kubeadm upgrade apply v1.33.10

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.33.10-1.1 kubectl=1.33.10-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
```

Upgrade the CNI

The Calico version installed earlier is compatible with Kubernetes v1.33, so no upgrade is needed here. If it were incompatible, it would have to be upgraded manually.


Upgrade the Second and Third Control Plane Nodes

```shell
# Upgrade kubeadm
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.33.10-1.1
apt-mark hold kubeadm

# Unlike on the first node, use `kubeadm upgrade node` here; `kubeadm upgrade plan` is not needed
kubeadm upgrade node

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.33.10-1.1 kubectl=1.33.10-1.1
apt-mark hold kubelet kubectl

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
```

Upgrade the Worker Node

Drain the pods off the node:

```shell
# Run on a control-plane node
kubectl drain test-3 --ignore-daemonsets --delete-emptydir-data
```

Upgrade the worker:

```shell
# Upgrade kubeadm and kubelet
apt-mark unhold kubeadm kubelet
apt-get update
apt-get install -y kubeadm=1.33.10-1.1 kubelet=1.33.10-1.1
apt-mark hold kubeadm kubelet

# Apply the upgrade
kubeadm upgrade node

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
```

Restore scheduling:

```shell
# Run on a control-plane node
kubectl uncordon test-3
```
Author: Warner Chen
Posted on: 2026-04-15
Updated on: 2026-04-15