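When etcd on the three Control Plane nodes of a cluster reports "request cluster ID mismatch", you can keep one etcd instance, rebuild the cluster from it with the --force-new-cluster flag, and then rejoin the etcd instances on the other two nodes.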
On the second and third Control Plane nodes, stop etcd and keep the old container by renaming it with docker rename:

    docker stop etcd
    docker rename etcd etcd-old
Back up the etcd start command on the first Control Plane node:

    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock assaflavie/runlike:latest etcd

Sample output:

    docker run --name=etcd --hostname=test001 --env=ETCDCTL_API=3 --env=ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem --env=ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --env=ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --env=ETCDCTL_ENDPOINTS=https://127.0.0.1:2379 --env=ETCD_UNSUPPORTED_ARCH=x86_64 --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z --volume=/etc/kubernetes:/etc/kubernetes:z --network=host --restart=always --label='io.rancher.rke.container.name=etcd' --runtime=runc --detach=true registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-coreos-etcd:v3.4.15-rancher1 /usr/local/bin/etcd --listen-peer-urls=https://0.0.0.0:2380 --trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --peer-trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --peer-key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-client-cert-auth=true --initial-advertise-peer-urls=https://172.16.0.106:2380 --heartbeat-interval=500 --cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --advertise-client-urls=https://172.16.0.106:2379 --cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --initial-cluster=etcd-rke1-server-0=https://172.16.0.106:2380,etcd-rke1-server-1=https://172.16.0.105:2380,etcd-rke1-server-2=https://172.16.0.104:2380 --initial-cluster-state=new --client-cert-auth=true --listen-client-urls=https://0.0.0.0:2379 --initial-cluster-token=etcd-cluster-1 --name=etcd-rke1-server-0 --enable-v2=true --election-timeout=5000 --data-dir=/var/lib/rancher/etcd/
Stop etcd on the first Control Plane node and rename the container:

    docker stop etcd
    docker rename etcd etcd-old
Edit the etcd start command saved earlier: remove the entries for the second and third Control Plane nodes from the --initial-cluster parameter, append --force-new-cluster at the end, and run it. If etcd still reports "request cluster ID mismatch" after starting, repeat this step a few more times.

    docker run --name=etcd --hostname=test001 --env=ETCDCTL_API=3 --env=ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem --env=ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --env=ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --env=ETCDCTL_ENDPOINTS=https://127.0.0.1:2379 --env=ETCD_UNSUPPORTED_ARCH=x86_64 --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z --volume=/etc/kubernetes:/etc/kubernetes:z --network=host --restart=always --label='io.rancher.rke.container.name=etcd' --runtime=runc --detach=true registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-coreos-etcd:v3.4.15-rancher1 /usr/local/bin/etcd --listen-peer-urls=https://0.0.0.0:2380 --trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --peer-trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --peer-key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-client-cert-auth=true --initial-advertise-peer-urls=https://172.16.0.106:2380 --heartbeat-interval=500 --cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --advertise-client-urls=https://172.16.0.106:2379 --cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --initial-cluster=etcd-rke1-server-0=https://172.16.0.106:2380 --initial-cluster-state=new --client-cert-auth=true --listen-client-urls=https://0.0.0.0:2379 --initial-cluster-token=etcd-cluster-1 --name=etcd-rke1-server-0 --enable-v2=true --election-timeout=5000 --data-dir=/var/lib/rancher/etcd/ --force-new-cluster
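The manual edit described above can also be scripted. What follows is a minimal sketch, assuming the saved command sits on a single line and that no flag value contains a space. The helper name keep_only_member is hypothetical and is not part of etcd, RKE, or runlike.

```shell
# Hypothetical helper: rewrite a saved "docker run ... etcd ..." command so that
# --initial-cluster lists only one member, then append --force-new-cluster.
#   $1: the saved command (single line)
#   $2: the etcd member name to keep, e.g. etcd-rke1-server-0
keep_only_member() {
  cmd="$1"
  member="$2"

  # Pull out the comma-separated value of --initial-cluster=...
  # (--initial-cluster-state and --initial-cluster-token do not match this pattern).
  list=$(printf '%s\n' "$cmd" | sed -n 's/.*--initial-cluster=\([^ ]*\).*/\1/p')

  # Keep only the "name=peer-url" entry that belongs to the chosen member.
  entry=$(printf '%s\n' "$list" | tr ',' '\n' | grep "^${member}=")
  [ -n "$entry" ] || return 1

  # Substitute the reduced list and append the flag.
  new=$(printf '%s\n' "$cmd" | sed "s|--initial-cluster=[^ ]*|--initial-cluster=${entry}|")
  printf '%s --force-new-cluster\n' "$new"
}
```

Printing the rewritten command rather than executing it leaves room to review the result before starting the container.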
Once it is up, check the etcd cluster status:

    docker exec -it -e ETCDCTL_API=3 etcd etcdctl member list -w table
    docker exec -it -e ETCDCTL_API=3 etcd etcdctl endpoint status --cluster -w table
On the first Control Plane node, add the etcd member:

    MEMBER_IP=172.16.0.105
    MEMBER_NAME="rke1-server-1"
    docker exec -it etcd etcdctl member add etcd-$MEMBER_NAME --peer-urls=https://$MEMBER_IP:2380

The command prints the settings the new member needs:

    ETCD_NAME="etcd-rke1-server-1"
    ETCD_INITIAL_CLUSTER="etcd-rke1-server-0=https://172.16.0.106:2380,etcd-rke1-server-1=https://172.16.0.105:2380"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.0.105:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"
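The ETCD_* values printed by member add are reused on the joining node, so it can help to capture them instead of copying them by hand. This is a minimal sketch: extract_join_env is a hypothetical helper, and etcd-join.env in the usage comment is a hypothetical file name.

```shell
# Hypothetical helper: keep only the ETCD_* assignments from the output of
# "etcdctl member add" (read on stdin), dropping any other lines.
# Carriage returns are stripped in case the output came from a TTY (docker exec -t).
extract_join_env() {
  tr -d '\r' | grep '^ETCD_[A-Z_]*='
}

# Usage (commented out; requires a running etcd container):
# docker exec etcd etcdctl member add etcd-rke1-server-1 \
#   --peer-urls=https://172.16.0.105:2380 | extract_join_env > etcd-join.env
```

The saved file can then be copied to the joining node and its values pasted into the variable assignments used there.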
Then, on the second Control Plane node, perform the recovery:

    mv /var/lib/etcd /var/lib/etcd_bak

    NODE_IP=172.16.0.105
    ETCD_IMAGE=registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-coreos-etcd:v3.4.15-rancher1
    ETCD_NAME="etcd-rke1-server-1"
    ETCD_INITIAL_CLUSTER="etcd-rke1-server-0=https://172.16.0.106:2380,etcd-rke1-server-1=https://172.16.0.105:2380"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.0.105:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"

    docker run --name=etcd --hostname=`hostname` \
      --env="ETCDCTL_API=3" \
      --env="ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem" \
      --env="ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`.pem" \
      --env="ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`-key.pem" \
      --env="ETCDCTL_ENDPOINTS=https://127.0.0.1:2379" \
      --env="ETCD_UNSUPPORTED_ARCH=x86_64" \
      --env="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
      --volume="/var/lib/etcd:/var/lib/rancher/etcd/:z" \
      --volume="/etc/kubernetes:/etc/kubernetes:z" \
      --network=host \
      --restart=always \
      --label io.rancher.rke.container.name="etcd" \
      --detach=true \
      $ETCD_IMAGE \
      /usr/local/bin/etcd \
      --peer-client-cert-auth \
      --client-cert-auth \
      --peer-cert-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`.pem \
      --peer-key-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`-key.pem \
      --cert-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`.pem \
      --trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem \
      --initial-cluster-token=etcd-cluster-1 \
      --peer-trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem \
      --key-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`-key.pem \
      --data-dir=/var/lib/rancher/etcd/ \
      --advertise-client-urls=https://$NODE_IP:2379 \
      --listen-client-urls=https://0.0.0.0:2379 \
      --listen-peer-urls=https://0.0.0.0:2380 \
      --initial-advertise-peer-urls=https://$NODE_IP:2380 \
      --election-timeout=5000 \
      --heartbeat-interval=500 \
      --name=$ETCD_NAME \
      --initial-cluster=$ETCD_INITIAL_CLUSTER \
      --initial-cluster-state=$ETCD_INITIAL_CLUSTER_STATE
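The recovery command derives the certificate and key file names from the node IP several times by replacing dots with dashes. As a small sketch of that naming rule, the paths can be computed once and reused. The helper etcd_cert_base and the variables CERT_DIR, CERT_FILE, and KEY_FILE are names introduced here for illustration only.

```shell
# Hypothetical helper: map a node IP to the base name of the etcd certificate
# files used in this walkthrough (dots in the IP become dashes).
etcd_cert_base() {
  printf 'kube-etcd-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

NODE_IP=172.16.0.105
CERT_DIR=/etc/kubernetes/ssl

# Compute each path once instead of repeating the sed pipeline per flag.
CERT_FILE="$CERT_DIR/$(etcd_cert_base "$NODE_IP").pem"
KEY_FILE="$CERT_DIR/$(etcd_cert_base "$NODE_IP")-key.pem"

echo "$CERT_FILE"
echo "$KEY_FILE"
```

With these variables set, flags such as --cert-file and --peer-cert-file could take "$CERT_FILE", and --key-file and --peer-key-file could take "$KEY_FILE".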
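After it starts, check the status. If everything looks healthy, repeat the steps above to add the third node, substituting that node's name and IP and using the ETCD_INITIAL_CLUSTER value that member add prints for it.

    docker exec -it -e ETCDCTL_API=3 etcd etcdctl member list -w table
    docker exec -it -e ETCDCTL_API=3 etcd etcdctl endpoint status --cluster -w table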
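Checking the member list by eye works, but the same check can be sketched as a small filter. The sketch below assumes the default comma-separated output of etcdctl member list in etcd v3.4, where the second field is the member status. The helper name count_started_members is hypothetical.

```shell
# Hypothetical helper: count members reported as "started" in the default
# (comma-separated) output of "etcdctl member list", read on stdin.
# Carriage returns are stripped in case the output came from a TTY.
count_started_members() {
  tr -d '\r' | awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Usage (commented out; requires a running etcd container):
# docker exec -e ETCDCTL_API=3 etcd etcdctl member list | count_started_members
```

A member that has been added but has not yet joined is listed as unstarted, so the count should match the number of nodes that have been brought back so far.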
Once the cluster is healthy, restore the original etcd container on the first Control Plane node:

    docker stop etcd
    docker rename etcd etcd-restore
    docker rename etcd-old etcd
    docker start etcd