Deploying Rancher in Docker with a Private Registry

A Rancher container started with docker pulls images from the public internet by default. Even with CATTLE_SYSTEM_DEFAULT_REGISTRY set, images such as rancher/shell used by helm-operation still come from the public registry. To pull every image from a private registry, use the approach below.

Prepare the private registry credentials file and the k3s config file:

mkdir -p /etc/rancher/k3s

cat <<EOF > /etc/rancher/k3s/registries.yaml
configs:
  "harbor.warnerchen.com":
    auth:
      username: xxx
      password: xxx
    tls:
      insecure_skip_verify: true
EOF

cat <<EOF > /etc/rancher/k3s/config.yaml
system-default-registry: harbor.warnerchen.com
EOF

Start Rancher:

docker run -d --restart=unless-stopped --name rancher \
-v /var/lib/rancher:/var/lib/rancher \
-v /etc/rancher/k3s/registries.yaml:/etc/rancher/k3s/registries.yaml:ro \
-v /etc/rancher/k3s/config.yaml:/etc/rancher/k3s/config.yaml:ro \
-e CATTLE_BOOTSTRAP_PASSWORD=RancherForFun \
-e CATTLE_SYSTEM_DEFAULT_REGISTRY=harbor.warnerchen.com \
-p 80:80 -p 443:443 \
--privileged \
harbor.warnerchen.com/prime/rancher:v2.7.15-ent
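
To verify that nothing slipped through to the public registry, one option is to list the images the embedded k3s is actually running (a quick check, assuming the embedded k3s kubeconfig sits at its default path inside the container):

docker exec rancher kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get pods -A -o jsonpath='{..image}' | tr ' ' '\n' | sort -u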

Specifying the Network Interface for RKE2 Calico

calico-node's IP_AUTODETECTION_METHOD defaults to first-found. On nodes with multiple NICs, calico-node may bind to the wrong interface and break cross-node networking.

Change IP_AUTODETECTION_METHOD to interface or cidrs to bind calico-node to the intended NIC.

cat <<EOF | kubectl apply -f -
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-calico
  namespace: kube-system
spec:
  valuesContent: |-
    global:
      cattle:
        clusterId: "xxx"
    installation:
      calicoNetwork:
        nodeAddressAutodetectionV4:
          interface: "ens34"
          # cidrs is also an option; interface and cidrs cannot be used together
          # cidrs:
          #   - "172.16.16.0/24"
EOF
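
To confirm the setting was rendered, one way is to check the Installation resource the operator manages, or the env on calico-node (a sketch; the namespace and resource names are the RKE2 Calico defaults and may differ in your version):

kubectl get installation default -o jsonpath='{.spec.calicoNetwork.nodeAddressAutodetectionV4}'
kubectl -n calico-system get ds calico-node -o yaml | grep -A1 IP_AUTODETECTION_METHOD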

Notes on Rancher Elemental

Overview

Rancher Elemental is used to quickly deploy and manage container-optimized operating systems such as SLE Micro and openSUSE MicroOS. Designed for edge and cloud-native environments, it provides a minimal, easy-to-maintain OS.

Components:

  1. elemental: the CLI responsible for OS installation, upgrade, and maintenance
  2. elemental-operator: runs in Kubernetes and manages device registration and lifecycle
  3. elemental-register: runs on the device and registers it with the Rancher Elemental cluster
  4. elemental-system-agent: applies configuration on the device and manages its lifecycle

CRDs:

  1. MachineRegistration: defines how a device registers with Rancher Elemental and provides its initial configuration
  2. MachineInventory: records details of a registered device, including hardware attributes and status
  3. MachineInventorySelector: selects a group of devices matching specific labels or conditions
  4. MachineInventorySelectorTemplate: generates MachineInventorySelectors, allowing device grouping rules to be created dynamically
  5. ManagedOSImage: describes and manages the OS images available to devices
  6. ManagedOSVersion: defines an OS version along with its supported features and changes
  7. ManagedOSVersionChannel: manages the distribution channel for OS version updates
  8. SeedImage: builds the installation media used to install Elemental onto nodes

Usage Notes

Install Elemental from Rancher Extensions.

Add an OS Channel:

apiVersion: elemental.cattle.io/v1beta1
kind: ManagedOSVersionChannel
metadata:
  name: sl-micro-6.0-base-channel
  namespace: fleet-default
spec:
  deleteNoLongerInSyncVersions: false
  options:
    image: registry.suse.com/rancher/elemental-channel/sl-micro:6.0-base
  syncInterval: 1h
  type: custom

Create a MachineRegistration. The Cloud Configuration can be customized as needed, for example to set the hostname or network configuration:

apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
  name: elemental-cluster-1
  namespace: fleet-default
spec:
  config:
    cloud-config:
      users:
        - name: root
          passwd: password
          ssh-authorized-keys:
            - >-
              ssh-rsa xxx
      write_files:
        - content: |
            BOOTPROTO='static'
            IPADDR='172.16.16.141'
            NETMASK='255.255.255.0'
            GATEWAY='172.16.16.1'
            DNS='172.16.16.12'
            STARTMODE='auto'
          path: /etc/sysconfig/network/ifcfg-eth0
          permissions: '0600'
    elemental:
      install:
        debug: true
        device: /dev/sda
        reboot: true
        snapshotter:
          type: loopdevice
      reset:
        reboot: true
        reset-oem: true
        reset-persistent: true
  machineInventoryLabels:
    author: warner
    machineUUID: ${System Information/UUID}
    manufacturer: ${System Information/Manufacturer}
    productName: ${System Information/Product Name}
    serialNumber: ${System Information/Serial Number}

After creating it, pick the matching OS Version and build an image. Clicking build spawns a pod in the fleet-default namespace that pulls the base image, builds the image, and publishes a download URL.

Download the built ISO from the UI, or fetch the download URL from the SeedImage CRD:

kubectl -n fleet-default get seedimages.elemental.cattle.io media-image-reg-xxx -ojsonpath={.status.downloadURL}
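
For example, to pull the ISO straight down with that URL (a sketch reusing the same SeedImage name):

ISO_URL=$(kubectl -n fleet-default get seedimages.elemental.cattle.io media-image-reg-xxx -ojsonpath={.status.downloadURL})
wget "$ISO_URL" -O elemental.iso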

Once downloaded, create a VM from this ISO. The OS installer requires a TPM, so a native TPM must be enabled in vSphere.

Enabling a native TPM in vSphere has two prerequisites:

  1. vSphere must be configured with a domain name; otherwise TPM backup fails after creation, and without a backup a TPM device cannot be added to the VM
  2. the host that runs the VM must belong to a cluster; otherwise the VM cannot be created once the TPM device is added

With those in place, create the VM; the firmware must be set to EFI.

After power-on the OS installs automatically and the node registers with the Rancher Elemental cluster. Registration status can be followed on the node with:

journalctl -f -u elemental-register-install.service

If registration succeeds, a MachineInventory is generated that records the device details.
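
A quick way to list them from the management cluster (a sketch):

kubectl -n fleet-default get machineinventories.elemental.cattle.io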

The node can then be used to create a cluster.

Deploying NeuVector with Docker

Deploying NeuVector with Docker is suitable for simple testing.

Deploy the allinone container:

docker run -d --name allinone \
--pid=host \
--privileged \
-e CLUSTER_JOIN_ADDR=172.16.0.1 \
-e NV_PLATFORM_INFO=platform=Docker \
-e CTRL_PERSIST_CONFIG=1 \
-p 18300:18300 \
-p 18301:18301 \
-p 18400:18400 \
-p 18401:18401 \
-p 10443:10443 \
-p 18301:18301/udp \
-p 8443:8443 \
-v /lib/modules:/lib/modules:ro \
-v /var/neuvector:/var/neuvector \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /sys/fs/cgroup:/host/cgroup:ro \
-v /proc:/host/proc:ro \
neuvector/allinone:5.4.0

Deploy the scanner container:

docker run -td --name scanner \
-e CLUSTER_JOIN_ADDR=172.16.0.1 \
-e NV_PLATFORM_INFO=platform=Docker \
-p 18402:18402 -v /var/run/docker.sock:/var/run/docker.sock:ro \
registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-neuvector-scanner:latest

RKE2 Cilium without kube-proxy

If the cluster uses Cilium as its CNI, Kubernetes can run without kube-proxy.

Cilium's kube-proxy replacement relies on the socket-LB feature, which requires Linux kernel v4.19.57, v5.1.16, v5.2.0 or newer. Kernels v5.3 and v5.8 add further features that let Cilium optimize the kube-proxy replacement even more.
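
A quick pre-check on each node:

# Kernel must be at least v4.19.57 / v5.1.16 / v5.2.0
uname -r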

An existing RKE2 Cilium cluster can enable this feature with the steps below.

Edit the cluster as YAML in Rancher:

spec:
  kubernetesVersion: v1.27.16+rke2r2
  rkeConfig:
    chartValues:
      rke2-cilium:
        # If an external LB fronts kube-apiserver, this can be set to the VIP
        k8sServiceHost: 127.0.0.1
        k8sServicePort: 6443
        # Key setting: enable Cilium's kube-proxy replacement
        kubeProxyReplacement: true
    machineGlobalConfig:
      cni: cilium
      # Disable kube-proxy
      disable-kube-proxy: true

Once provisioning finishes, restart rke2-server or rke2-agent on every node:

systemctl restart rke2-server
systemctl restart rke2-agent

At this point the agent nodes no longer run a kube-proxy pod, but on server nodes the kube-proxy static pod manifest has to be removed manually:

mv /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml ~/kube-proxy.yaml

The cluster now has no kube-proxy pods. Next, flush the iptables rules kube-proxy left behind:

iptables -F && iptables -X && iptables -Z && iptables -F -t nat && iptables -X -t nat && iptables -Z -t nat

Finally, restart Cilium:

kubectl -n kube-system rollout restart ds cilium

Check that the KubeProxyReplacement configuration took effect:

root@test001:~# kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status | grep KubeProxyReplacement
KubeProxyReplacement: True [eth0 172.16.0.2 fe80::216:3eff:fe08:5140 (Direct Routing)]
root@test001:~#

Get more detail:

root@test001:~# kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status --verbose
...
KubeProxyReplacement Details:
  Status:                True
  Socket LB:             Enabled
  Socket LB Tracing:     Enabled
  Socket LB Coverage:    Full
  Devices:               eth0 172.16.0.2 fe80::216:3eff:fe08:5140 (Direct Routing)
  Mode:                  SNAT
  Backend Selection:     Random
  Session Affinity:      Enabled
  Graceful Termination:  Enabled
  NAT46/64 Support:      Disabled
  XDP Acceleration:      Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767)
  - LoadBalancer:   Enabled
  - externalIPs:    Enabled
  - HostPort:       Enabled
...

Check whether any kube-proxy iptables rules remain:

iptables-save | grep KUBE-SVC

Create a Workload and a Service to test ClusterIP/NodePort connectivity; as long as traffic flows normally, the replacement works:
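
For example, a throwaway nginx Deployment plus NodePort Service (names here are illustrative):

kubectl create deployment nginx --image=nginx:mainline
kubectl expose deployment nginx --port=80 --type=NodePort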

root@test001:~# kubectl get pod -n kube-system | grep kube-proxy

root@test001:~# kubectl get svc nginx
NAME    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.43.38.171   <none>        80:32048/TCP   26h

root@test001:~# curl 10.43.38.171 -I
HTTP/1.1 200 OK
...

root@test001:~# curl 127.0.0.1:32048 -I
HTTP/1.1 200 OK
...

Fixing "request cluster ID mismatch" in RKE1 ETCD

When ETCD on the three Control Plane nodes reports request cluster ID mismatch, keep one ETCD instance, rebuild the cluster with --force-new-cluster, and then re-join the ETCD instances of the other two nodes.

Keep the ETCD containers of the second and third Control Plane nodes via docker rename:

docker stop etcd
docker rename etcd etcd-old

Back up the ETCD start command of the first Control Plane node:

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock assaflavie/runlike:latest etcd

# The ETCD start command printed by runlike is below
docker run --name=etcd --hostname=test001 --env=ETCDCTL_API=3 --env=ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem --env=ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --env=ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --env=ETCDCTL_ENDPOINTS=https://127.0.0.1:2379 --env=ETCD_UNSUPPORTED_ARCH=x86_64 --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z --volume=/etc/kubernetes:/etc/kubernetes:z --network=host --restart=always --label='io.rancher.rke.container.name=etcd' --runtime=runc --detach=true registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-coreos-etcd:v3.4.15-rancher1 /usr/local/bin/etcd --listen-peer-urls=https://0.0.0.0:2380 --trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --peer-trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --peer-key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-client-cert-auth=true --initial-advertise-peer-urls=https://172.16.0.106:2380 --heartbeat-interval=500 --cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --advertise-client-urls=https://172.16.0.106:2379 --cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --initial-cluster=etcd-rke1-server-0=https://172.16.0.106:2380,etcd-rke1-server-1=https://172.16.0.105:2380,etcd-rke1-server-2=https://172.16.0.104:2380 --initial-cluster-state=new --client-cert-auth=true --listen-client-urls=https://0.0.0.0:2379 --initial-cluster-token=etcd-cluster-1 --name=etcd-rke1-server-0 --enable-v2=true --election-timeout=5000 --data-dir=/var/lib/rancher/etcd/

Stop ETCD on the first Control Plane node:

docker stop etcd
docker rename etcd etcd-old

Edit the saved ETCD start command: remove the second and third Control Plane nodes from --initial-cluster, append --force-new-cluster at the end, and run it. If it still reports request cluster ID mismatch after starting, repeat a few times.

docker run --name=etcd --hostname=test001 --env=ETCDCTL_API=3 --env=ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem --env=ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --env=ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --env=ETCDCTL_ENDPOINTS=https://127.0.0.1:2379 --env=ETCD_UNSUPPORTED_ARCH=x86_64 --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z --volume=/etc/kubernetes:/etc/kubernetes:z --network=host --restart=always --label='io.rancher.rke.container.name=etcd' --runtime=runc --detach=true registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-coreos-etcd:v3.4.15-rancher1 /usr/local/bin/etcd --listen-peer-urls=https://0.0.0.0:2380 --trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --peer-trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem --key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --peer-key-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106-key.pem --peer-client-cert-auth=true --initial-advertise-peer-urls=https://172.16.0.106:2380 --heartbeat-interval=500 --cert-file=/etc/kubernetes/ssl/kube-etcd-172-16-0-106.pem --advertise-client-urls=https://172.16.0.106:2379 --cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --initial-cluster=etcd-rke1-server-0=https://172.16.0.106:2380 --initial-cluster-state=new --client-cert-auth=true --listen-client-urls=https://0.0.0.0:2379 --initial-cluster-token=etcd-cluster-1 --name=etcd-rke1-server-0 --enable-v2=true --election-timeout=5000 --data-dir=/var/lib/rancher/etcd/ --force-new-cluster

Once it is up, check the ETCD cluster status:

docker exec -it -e ETCDCTL_API=3 etcd etcdctl member list -w table
docker exec -it -e ETCDCTL_API=3 etcd etcdctl endpoint status --cluster -w table

Add an ETCD member on the first Control Plane node:

MEMBER_IP=172.16.0.105
MEMBER_NAME="rke1-server-1"
docker exec -it etcd etcdctl member add etcd-$MEMBER_NAME --peer-urls=https://$MEMBER_IP:2380

# Keep the output below; it is needed when starting ETCD on the next node
ETCD_NAME="etcd-rke1-server-1"
ETCD_INITIAL_CLUSTER="etcd-rke1-server-0=https://172.16.0.106:2380,etcd-rke1-server-1=https://172.16.0.105:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.0.105:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Then restore ETCD on the second Control Plane node:

# Back up the data
mv /var/lib/etcd /var/lib/etcd_bak

# Set variables
NODE_IP=172.16.0.105
ETCD_IMAGE=registry.cn-hangzhou.aliyuncs.com/rancher/mirrored-coreos-etcd:v3.4.15-rancher1
ETCD_NAME="etcd-rke1-server-1"
ETCD_INITIAL_CLUSTER="etcd-rke1-server-0=https://172.16.0.106:2380,etcd-rke1-server-1=https://172.16.0.105:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.0.105:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

# Start ETCD
docker run --name=etcd --hostname=`hostname` \
--env="ETCDCTL_API=3" \
--env="ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem" \
--env="ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`.pem" \
--env="ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`-key.pem" \
--env="ETCDCTL_ENDPOINTS=https://127.0.0.1:2379" \
--env="ETCD_UNSUPPORTED_ARCH=x86_64" \
--env="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--volume="/var/lib/etcd:/var/lib/rancher/etcd/:z" \
--volume="/etc/kubernetes:/etc/kubernetes:z" \
--network=host \
--restart=always \
--label io.rancher.rke.container.name="etcd" \
--detach=true \
$ETCD_IMAGE \
/usr/local/bin/etcd \
--peer-client-cert-auth \
--client-cert-auth \
--peer-cert-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`.pem \
--peer-key-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`-key.pem \
--cert-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`.pem \
--trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem \
--initial-cluster-token=etcd-cluster-1 \
--peer-trusted-ca-file=/etc/kubernetes/ssl/kube-ca.pem \
--key-file=/etc/kubernetes/ssl/kube-etcd-`echo $NODE_IP|sed 's/\./-/g'`-key.pem \
--data-dir=/var/lib/rancher/etcd/ \
--advertise-client-urls=https://$NODE_IP:2379 \
--listen-client-urls=https://0.0.0.0:2379 \
--listen-peer-urls=https://0.0.0.0:2380 \
--initial-advertise-peer-urls=https://$NODE_IP:2380 \
--election-timeout=5000 \
--heartbeat-interval=500 \
--name=$ETCD_NAME \
--initial-cluster=$ETCD_INITIAL_CLUSTER \
--initial-cluster-state=$ETCD_INITIAL_CLUSTER_STATE

After it starts, check the status; if everything looks good, repeat the steps above to add the third node:

docker exec -it -e ETCDCTL_API=3 etcd etcdctl member list -w table
docker exec -it -e ETCDCTL_API=3 etcd etcdctl endpoint status --cluster -w table

Once the cluster is healthy, restore the original etcd container on the first Control Plane node:

docker stop etcd
docker rename etcd etcd-restore
docker rename etcd-old etcd
docker start etcd

NeuVector Zero-drift vs. Basic Mode

NeuVector has two modes, Zero-drift and Basic, with Zero-drift being the default. Using an Nginx deployment as a test case, observe how Process Profile Rules behave in each mode.

Zero-drift Mode

Discover

In Discover, NV automatically learns the processes running in the container and generates Process Profile Rules.

image-1

If an unauthorized process is found, an alert is raised.

image-2

File Access Rules are not generated automatically, though; only file create/read/update/delete activity inside the directories NV monitors by default triggers alerts.

image-3

Monitor

Monitor is similar to Discover: any activity that violates the Process Profile Rules raises a warning but is not blocked.

Protect

In Protect, unauthorized process and file activity is blocked and an alert is raised:

root@test:~# kubectl exec -it nginx-57b989859-bbh9j -- bash
exec /usr/bin/bash: operation not permitted
command terminated with exit code 1

Basic Mode

Discover

Similar to Zero-drift, it learns the container's processes, but it does not alert on new processes in the container; instead it learns them automatically, which means all process activity is allowed in this mode. File monitoring rules behave the same as in Zero-drift and are not generated automatically.

image-4

Monitor

New processes are no longer learned; any process activity that is not allowed triggers an alert.

image-5

Protect

Behaves the same as Zero-drift Protect: any process activity that is not allowed is blocked and an alert is raised.

Conclusion

The main difference between NeuVector's Zero-drift and Basic modes lies in their Discover behavior. Zero-drift is stricter: the container may only run processes defined in its image and no new processes are allowed. Basic is more flexible: NeuVector learns new processes inside the container and automatically generates rules for them. In Monitor and Protect, both modes monitor and protect the container's process and file activity.

Zero-drift Mode

  1. Discover: Zero-drift automatically analyzes and learns the processes allowed by the container image and generates Process Profile Rules, ensuring the container only runs processes defined in the image. If a process not defined in the image starts, NeuVector raises an alert. File access monitoring only covers NV's default monitored directories, and File Access Rules are not generated automatically. This mode controls container behavior more strictly and suits scenarios that require high security.

  2. Monitor: similar to Discover, NeuVector keeps monitoring process and file activity inside the container but only warns and never blocks. Any process or file operation that does not match the rules triggers an alert.

  3. Protect: Zero-drift blocks any process or file operation that does not match the rules, keeping the container runtime safe. Unauthorized process or file activity is blocked immediately and an alert is raised.

Basic Mode

  1. Discover: Basic also learns the container's running processes and generates Process Profile Rules, but unlike Zero-drift it does not restrict new processes. Even if a process not defined in the image starts, NeuVector does not alert; it learns the new process and creates a rule for it. File monitoring works the same as in Zero-drift, relying on the default monitored directories. This mode is more flexible and suits scenarios where container processes change dynamically.

  2. Monitor: Basic stops learning new process activity; any unauthorized process operation triggers an alert.

  3. Protect: Basic behaves the same as Zero-drift; NeuVector blocks any unauthorized process and file activity and raises an alert.

RKE1 Deployment Notes

Deploying RKE1

Prerequisites

# RKE1 binary
curl -LO "https://github.com/rancher/rke/releases/download/v1.5.12/rke_linux-amd64"

mv rke_linux-amd64 /usr/local/bin/rke && chmod +x /usr/local/bin/rke

# Install Docker on every node
curl https://releases.rancher.com/install-docker/20.10.sh | sh

Generate the configuration

cat <<EOF > cluster.yml
# Older rke1 versions do not support rsa private keys; use ed25519
ssh_key_path: /root/.ssh/id_ed25519
nodes:
  - address: 172.16.0.106
    hostname_override: rke1-server-0
    internal_address: 172.16.0.106
    user: root
    role:
      - controlplane
      - etcd
      - worker
  - address: 172.16.0.105
    hostname_override: rke1-server-1
    internal_address: 172.16.0.105
    user: root
    role:
      - controlplane
      - etcd
      - worker
  - address: 172.16.0.104
    hostname_override: rke1-server-2
    internal_address: 172.16.0.104
    user: root
    role:
      - controlplane
      - etcd
      - worker
private_registries:
  - url: registry.cn-hangzhou.aliyuncs.com
    is_default: true
kubernetes_version: "v1.20.15-rancher2-2"
network:
  plugin: calico
EOF

Install RKE1

rke up --config cluster.yml

Convenience setup for day-to-day operations

mkdir ~/.kube

mv kube_config_cluster.yml ~/.kube/config

find / -name kubectl | grep "/usr/local" | head -1 | awk '{ print "cp "$1" /usr/local/bin" }' | sh

echo "source <(kubectl completion bash)" >> ~/.bashrc

curl https://rancher-mirror.rancher.cn/helm/get-helm-3.sh | INSTALL_HELM_MIRROR=cn bash -s -- --version v3.10.3

echo "source <(helm completion bash)" >> ~/.bashrc

Common Issues

On CentOS and RHEL, installing as the root user is not allowed by default; the error looks like this:

WARN[0000] Failed to set up SSH tunneling for host [x.x.x.x]: Can’t retrieve Docker Info ,Failed to dial to /var/run/docker.sock: ssh: rejected: administratively prohibited (open failed)

Prepare a different user instead:

groupadd rancher && useradd rancher -g rancher && usermod -aG docker rancher

The following error means the ssh_key_path does not match the host or the username is wrong; check the node's ~/.ssh/authorized_keys for the corresponding user:

WARN[0000] Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [x.x.x.x:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

If the following error appears:

WARN[0000] Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed) 

add this to /etc/ssh/sshd_config:

AllowTcpForwarding yes

Clean up iptables rules

iptables -F \
&& iptables -X \
&& iptables -Z \
&& iptables -F -t nat \
&& iptables -X -t nat \
&& iptables -Z -t nat \
&& docker restart kube-proxy

Clean up a node

sudo docker rm -f $(sudo docker ps -qa)
sudo docker rmi -f $(sudo docker images -q)
sudo docker volume rm $(sudo docker volume ls -q)

for mount in $(sudo mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do sudo umount $mount; done

sudo rm -rf /etc/ceph \
/etc/cni \
/etc/kubernetes \
/etc/rancher \
/opt/cni \
/opt/rke \
/run/secrets/kubernetes.io \
/run/calico \
/run/flannel \
/var/lib/calico \
/var/lib/etcd \
/var/lib/cni \
/var/lib/kubelet \
/var/lib/rancher \
/var/log/containers \
/var/log/kube-audit \
/var/log/pods \
/var/run/calico

sudo reboot

Image Scanning via the NeuVector API

Expose the REST API

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: neuvector-service-controller
  namespace: cattle-neuvector-system
spec:
  ports:
    - port: 10443
      name: controller
      protocol: TCP
  type: NodePort
  selector:
    app: neuvector-controller-pod
EOF

Prepare some environment variables for the API calls:

nv_service_ip="neuvector-service-controller"
nv_service_port="10443"
nv_service_login_user="admin"
nv_service_login_password="admin"
image_registry_url="https://xxx"
image_registry_user="xxx"
image_registry_password="xxx"
image_repo="library/nginx"
image_tag="mainline"

Call the API to scan an image:

# NV auth API
api_login_url="https://$nv_service_ip:$nv_service_port/v1/auth"
echo $api_login_url

# NV auth payload
login_json="{\"password\":{\"username\":\"$nv_service_login_user\",\"password\":\"$nv_service_login_password\"}}"
echo $login_json

# Get the NV auth token
nv_token=`(curl -s -f $api_login_url -k -H "Content-Type:application/json" -d $login_json || echo null) | jq -r '.token.token'`
echo $nv_token

# Image scan API
api_scan_repo_url="https://$nv_service_ip:$nv_service_port/v1/scan/repository"
echo $api_scan_repo_url

# Image scan payload
nv_scanned_json="{\"request\": {\"registry\": \"$image_registry_url\", \"username\": \"$image_registry_user\", \"password\": \"$image_registry_password\", \"repository\": \"$image_repo\", \"tag\": \"$image_tag\"}}"
echo $nv_scanned_json

# Call the image scan API
curl -k "$api_scan_repo_url" -H "Content-Type: application/json" -H "X-Auth-Token: $nv_token" -d "$nv_scanned_json"

When registry is empty, NeuVector scans the local image, but this only works with the allinone deployment. Calling the API for a local scan against NV deployed in K8s fails with:

2024-11-06T09:14:15.179|INFO|CTL|rest.(*repoScanTask).Run: Scan repository start - image=library/nginx:mainline registry=
2024-11-06T09:14:15.24 |ERRO|CTL|rest.(*repoScanTask).Run: Failed to scan repository - error=container API call error image=library/nginx:mainline registry=

Besides calling the API, NeuVector can also scan by connecting a registry under Assets -> Registries. If an Admission Control rule matches on Image scanned = false, completing either of the two scan methods is enough for the deployment to go through without being blocked by the rule.

SUSE Week One: Deployment and Usage Notes

Install a single-node RKE2

This RKE2 acts as the local cluster and hosts Rancher.

# Prepare the config directory and file
mkdir -pv /etc/rancher/rke2

cat > /etc/rancher/rke2/config.yaml <<EOF
token: my-shared-secret
tls-san:
- 172.16.170.200
system-default-registry: registry.cn-hangzhou.aliyuncs.com
debug: true
EOF

# Install rke2-server and the other binaries
curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn sh -

# Start the first server node
systemctl enable rke2-server --now

# Convenience setup for later operations
mkdir -pv ~/.kube

ln -s /etc/rancher/rke2/rke2.yaml ~/.kube/config

curl https://rancher-mirror.rancher.cn/helm/get-helm-3.sh | INSTALL_HELM_MIRROR=cn bash -s -- --version v3.10.3

echo "export CONTAINER_RUNTIME_ENDPOINT=\"unix:///run/k3s/containerd/containerd.sock\"" >> ~/.bashrc

echo "export PATH=$PATH:/var/lib/rancher/rke2/bin" >> ~/.bashrc

echo "source <(kubectl completion bash)" >> ~/.bashrc

echo "source <(helm completion bash)" >> ~/.bashrc

export NERDCTL_VERSION=1.7.6

wget "https://files.m.daocloud.io/github.com/containerd/nerdctl/releases/download/v$NERDCTL_VERSION/nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz"

tar Czvxf /usr/local/bin nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz && rm -rf nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz

echo "export CONTAINERD_ADDRESS=\"/run/k3s/containerd/containerd.sock\"" >> ~/.bashrc

echo "export CONTAINERD_NAMESPACE=\"k8s.io\"" >> ~/.bashrc

Install Rancher

# Install via helm chart
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

# Install cert-manager
helm repo add jetstack https://charts.jetstack.io

helm repo update

helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.15.3 \
--set crds.enabled=true

# Install rancher
helm install rancher rancher-stable/rancher \
--namespace cattle-system \
--create-namespace \
--set hostname=rancher.warnerchen.io \
--set replicas=1 \
--set bootstrapPassword=admin \
--set rancherImage=registry.cn-hangzhou.aliyuncs.com/rancher/rancher \
--set systemDefaultRegistry=registry.cn-hangzhou.aliyuncs.com

RKE2 installs the Nginx Ingress Controller by default, listening on ports 80/443 of each node. Installing Rancher with a hostname set creates an Ingress, so Rancher can be reached through that Ingress.
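
Since no DNS record exists for this hostname in the lab, a hosts entry on the client pointing at the node IP is enough to reach the UI (a sketch):

echo "172.16.170.200 rancher.warnerchen.io" >> /etc/hosts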

rancher-login

Create Clusters

After creating a cluster in the UI, Rancher provides a registration command; run it on each node to register it.

Cluster 1

root@rke2-test-controller-0:~# curl --insecure -fL https://rancher.warnerchen.io/system-agent-install.sh | sudo  sh -s - --server https://rancher.warnerchen.io --label 'cattle.io/os=linux' --token xxx --ca-checksum xxx --etcd --controlplane --worker --node-name rke2-test-controller-0
[INFO] Label: cattle.io/os=linux
[INFO] Role requested: etcd
[INFO] Role requested: controlplane
[INFO] Role requested: worker
[INFO] Using default agent configuration directory /etc/rancher/agent
[INFO] Using default agent var directory /var/lib/rancher/agent
[INFO] Determined CA is necessary to connect to Rancher
[INFO] Successfully downloaded CA certificate
[INFO] Value from https://rancher.warnerchen.io/cacerts is an x509 certificate
[INFO] Successfully tested Rancher connection
[INFO] Downloading rancher-system-agent binary from https://rancher.warnerchen.io/assets/rancher-system-agent-amd64
[INFO] Successfully downloaded the rancher-system-agent binary.
[INFO] Downloading rancher-system-agent-uninstall.sh script from https://rancher.warnerchen.io/assets/system-agent-uninstall.sh
[INFO] Successfully downloaded the rancher-system-agent-uninstall.sh script.
[INFO] Generating Cattle ID
[INFO] Successfully downloaded Rancher connection information
[INFO] systemd: Creating service file
[INFO] Creating environment file /etc/systemd/system/rancher-system-agent.env
[INFO] Enabling rancher-system-agent.service
Created symlink /etc/systemd/system/multi-user.target.wants/rancher-system-agent.service → /etc/systemd/system/rancher-system-agent.service.
[INFO] Starting/restarting rancher-system-agent.service
root@rke2-test-controller-0:~#

After registration, cattle-cluster-agent kept crash-looping:

root@rke2-test-controller-0:~# kubectl -n cattle-system get pod
NAME                                    READY   STATUS             RESTARTS      AGE
cattle-cluster-agent-767b67b66f-bcl2s   0/1     CrashLoopBackOff   5 (79s ago)   10m
root@rke2-test-controller-0:~# kubectl -n cattle-system logs cattle-cluster-agent-767b67b66f-bcl2s -p
...
ERROR: https://rancher.warnerchen.io/ping is not accessible (Could not resolve host: rancher.warnerchen.io)

This is because no DNS record exists for the domain. A temporary mapping can be added through CoreDNS, followed by a restart:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus 0.0.0.0:9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
    hosts {
        172.16.170.200 rancher.warnerchen.io
        fallthrough
    }
}
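
After editing the Corefile, roll CoreDNS and the agent so the mapping takes effect (the deployment names below are the RKE2/Rancher defaults and may differ in your cluster):

kubectl -n kube-system rollout restart deployment rke2-coredns-rke2-coredns
kubectl -n cattle-system rollout restart deployment cattle-cluster-agent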

The cluster then becomes Ready.

Cluster 2

Monitoring

Installing the monitoring helm chart from the UI is enough; it ships the usual components (e.g. prometheus/alertmanager...).

grafana

WebHook Configuration

Alerts can be delivered to many kinds of targets; a WebHook is configured through an AlertmanagerConfig CR:

cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: test-webhook
  namespace: default
spec:
  receivers:
    - name: test-webhook
      webhookConfigs:
        - httpConfig:
            tlsConfig: {}
          sendResolved: false
          url: https://webhook.site/xxx
  route:
    groupBy: []
    groupInterval: 5m
    groupWait: 30s
    matchers: []
    repeatInterval: 4h
EOF

Logging

Installing the logging helm chart from the UI is enough; it mainly uses the Logging Operator to build log pipelines.

The Logging Operator deploys a FluentBit DaemonSet to collect logs, ships them to Fluentd, and Fluentd forwards them to the various outputs.

The main CRs are (see the sketch after this list for the cluster-scoped pair):

  1. Flow: a namespaced custom resource that routes log messages to an Output or ClusterOutput using filters and selectors
  2. ClusterFlow: routes cluster-level log messages
  3. Output: routes namespace-level log messages
  4. ClusterOutput: can be referenced by both Flow and ClusterFlow
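
For the cluster-scoped pair, a minimal sketch might look like the following, assuming the logging control namespace is cattle-logging-system and reusing the ES service created below; verify the field names against your Logging Operator version:

cat <<EOF | kubectl apply -f -
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: all-to-es
  namespace: cattle-logging-system
spec:
  elasticsearch:
    host: logging-es-http.cattle-logging-system.svc.cluster.local
    port: 9200
    scheme: https
    ssl_verify: false
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-namespaces
  namespace: cattle-logging-system
spec:
  globalOutputRefs:
    - all-to-es
EOF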

Deploy ES and Kibana

Using the ECK Operator, deploy ES and Kibana; an Output can then ship logs to ES.

# Install ECK Operator
kubectl create -f https://download.elastic.co/downloads/eck/2.14.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.14.0/operator.yaml

Install ES and Kibana; local storage will do for now.

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
  namespace: cattle-logging-system
spec:
  version: 7.15.2
  nodeSets:
    - name: logging
      count: 1
      config:
        node.store.allow_mmap: false
EOF

cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: logging
  namespace: cattle-logging-system
spec:
  version: 7.15.2
  count: 1
  elasticsearchRef:
    name: logging
EOF

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
  name: logging-kb
  namespace: cattle-logging-system
spec:
  ingressClassName: nginx
  rules:
    - host: kibana.warnerchen.io
      http:
        paths:
          - backend:
              service:
                name: logging-kb-http
                port:
                  number: 5601
            path: /
            pathType: Prefix
EOF

Create the Flow and Output

The Output has to be created first:

cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  elastic: xxx
kind: Secret
metadata:
  name: logging-es-elastic-user
  namespace: default
type: Opaque

---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: output-to-es
  namespace: default
spec:
  elasticsearch:
    host: logging-es-http.cattle-logging-system.svc.cluster.local
    index_name: ns-default
    password:
      valueFrom:
        secretKeyRef:
          key: elastic
          name: logging-es-elastic-user
    port: 9200
    scheme: https
    ssl_verify: false
    ssl_version: TLSv1_2
    suppress_type_name: false
    user: elastic
EOF

Create a Flow that collects logs from Pods labeled app=nginx:

cat <<EOF | kubectl apply -f -
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: flow-for-default
  namespace: default
spec:
  localOutputRefs:
    - output-to-es
  match:
    - select:
        labels:
          app: nginx
EOF

Check whether the corresponding index exists:
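
It can also be checked from the CLI, using the elastic password that ECK stores in a Secret (a sketch; the pod and secret names follow ECK's naming for the resources above):

PASSWORD=$(kubectl -n cattle-logging-system get secret logging-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d)
kubectl -n cattle-logging-system exec -it logging-es-logging-0 -- curl -sk -u "elastic:$PASSWORD" https://localhost:9200/_cat/indices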

logging - 1

Create an index pattern to view the logs:

logging - 2

NeuVector

Installing the NeuVector helm chart from the UI is enough.

LongHorn

Before installing LongHorn, install its dependencies on every node:

apt update
apt -y install open-iscsi nfs-common
systemctl enable iscsid --now

Then install LongHorn from the UI.

Volume snapshots and restore

LongHorn supports snapshots; you can create a snapshot and restore from it directly.

Create a snapshot in the UI:

volume snapshot and restore - 1

Then delete the data file:

kubectl exec -it nginx-7f6d5dcf8c-tvxcw -- rm -rf /data/test.txt

Stop the service:

kubectl scale deployment nginx --replicas=0

Re-attach the volume in maintenance mode:

volume snapshot and restore - 2

After attaching, open the Volume and select the snapshot to revert to:

volume snapshot and restore - 3

After reverting, detach the Volume; once the service is started again, the data is back.

volume snapshot and restore - 4

Volume backup and disaster recovery

Test cross-cluster data backup and restore with LongHorn; backups can go to S3 or NFS outside the cluster.

Deploy MinIO using the operator:

cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  accesskey: bWluaW8=
  secretkey: VGpCcFkwVTNZVGcyU3c9PQ==
kind: Secret
metadata:
  name: backup-minio-secret
  namespace: default
type: Opaque

---
apiVersion: v1
data:
  config.env: ZXhwb3J0IE1JTklPX0JST1dTRVI9Im9uIgpleHBvcnQgTUlOSU9fUk9PVF9VU0VSPSJtaW5pbyIKZXhwb3J0IE1JTklPX1JPT1RfUEFTU1dPUkQ9IlRqQnBZMFUzWVRnMlN3PT0iCg==
kind: Secret
metadata:
  name: backup-minio-env-configuration
  namespace: default
type: Opaque

---
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: backup-minio
  namespace: default
spec:
  buckets:
    - name: longhorn
  configuration:
    name: backup-minio-env-configuration
  # credsSecret:
  #   name: backup-minio-secret
  env:
    - name: MINIO_PROMETHEUS_AUTH_TYPE
      value: public
    - name: MINIO_SERVER_URL
      value: http://minio-hl.warnerchen.io
  image: quay.m.daocloud.io/minio/minio:RELEASE.2023-10-07T15-07-38Z
  initContainers:
    - command:
        - sh
        - -c
        - chown -R 1000:1000 /export/* || true
      image: quay.m.daocloud.io/minio/minio:RELEASE.2023-10-07T15-07-38Z
      name: change-permission
      securityContext:
        capabilities:
          add:
            - CHOWN
      volumeMounts:
        - mountPath: /export
          name: "0"
  pools:
    - name: pool-0
      resources:
        limits:
          cpu: 500m
          memory: 500Mi
        requests:
          cpu: 50m
          memory: 100Mi
      servers: 1
      volumeClaimTemplate:
        metadata: {}
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      volumesPerServer: 1
  requestAutoCert: false
  serviceMetadata:
    minioServiceLabels:
      mcamel/exporter-type: minio

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minio
  namespace: default
spec:
  rules:
    - host: minio.warnerchen.io
      http:
        paths:
          - backend:
              service:
                name: minio
                port:
                  number: 443
            path: /
            pathType: Prefix

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minio-hl
  namespace: default
spec:
  rules:
    - host: minio-hl.warnerchen.io
      http:
        paths:
          - backend:
              service:
                name: backup-minio-hl
                port:
                  number: 9000
            path: /
            pathType: Prefix
EOF

Prepare a bucket:

backup and restore - 2

Create a Secret in the longhorn-system namespace of both clusters with the following keys (see the sketch after this list):

  1. AWS_ACCESS_KEY_ID: the access key
  2. AWS_SECRET_ACCESS_KEY: the secret key
  3. AWS_ENDPOINTS: the S3 URL
  4. AWS_CERT: only needed when a self-signed certificate is used
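
A minimal way to create it (the secret name is illustrative and the values are placeholders; run in both clusters):

kubectl -n longhorn-system create secret generic minio-backup-credential \
  --from-literal=AWS_ACCESS_KEY_ID=minio \
  --from-literal=AWS_SECRET_ACCESS_KEY=xxx \
  --from-literal=AWS_ENDPOINTS=http://minio-hl.warnerchen.io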

backup and restore - 3

After creating the Secret, configure the Backup Target in the LongHorn UI:

backup and restore - 4

On either cluster, create a PVC:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn
EOF

Mount it into Nginx and write some data into it:

backup and restore - 1

Create a backup in this cluster's LongHorn:

backup and restore - 5

backup and restore - 6

The backup shows up in MinIO:

backup and restore - 7

At this point both clusters' LongHorn instances can see this backup, because they share the same Backup Target.

backup and restore - 8

In the other cluster, create a Volume from this backup:

backup and restore - 9

backup and restore - 10

Once created, the Volume is visible; if more data is written to Nginx, the Volume keeps syncing automatically.

backup and restore - 11

When one cluster goes down or the service becomes unavailable, this Volume can be used for recovery.

First, activate the Volume:

backup and restore - 12

After activation, create a PV/PVC from the Volume:

backup and restore - 13

They appear in the cluster; recreate Nginx with this PV/PVC and the original data is there.

backup and restore - 14

backup and restore - 15

Istio

Istio can be installed directly from the UI.

Deploy two versions of Nginx:

cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  index.html.v1: |
    <!DOCTYPE html>
    <html>
    <title>Welcome to nginx V1!</title>
    </html>
  index.html.v2: |
    <!DOCTYPE html>
    <html>
    <title>Welcome to nginx V2!</title>
    </html>
kind: ConfigMap
metadata:
  name: nginx-conf
  namespace: default
---

apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  ports:
    - name: port-80
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: nginx
  type: ClusterIP
---

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
    version: v1
  name: nginx-v1
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
      version: v1
  template:
    metadata:
      labels:
        app: nginx
        version: v1
        sidecar.istio.io/inject: 'true'
    spec:
      containers:
        - image: docker.io/library/nginx:mainline
          imagePullPolicy: IfNotPresent
          name: nginx-v1
          volumeMounts:
            - mountPath: /usr/share/nginx/html/index.html
              name: nginx-conf
              subPath: index.html.v1
      volumes:
        - configMap:
            defaultMode: 420
            name: nginx-conf
          name: nginx-conf
---

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
    version: v2
  name: nginx-v2
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
      version: v2
  template:
    metadata:
      labels:
        app: nginx
        version: v2
        sidecar.istio.io/inject: 'true'
    spec:
      containers:
        - image: docker.io/library/nginx:mainline
          imagePullPolicy: IfNotPresent
          name: nginx-v2
          volumeMounts:
            - mountPath: /usr/share/nginx/html/index.html
              name: nginx-conf
              subPath: index.html.v2
      volumes:
        - configMap:
            defaultMode: 420
            name: nginx-conf
          name: nginx-conf
EOF

Create the Istio Gateway:

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: nginx-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
EOF

Create the DestinationRule:

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: nginx
spec:
  host: nginx
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
EOF

Create the VirtualService, first routing all traffic to Nginx V1:

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nginx
spec:
  hosts:
    - "*"
  gateways:
    - nginx-gateway
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: nginx
            port:
              number: 80
            subset: v1
          weight: 100
EOF

Accessing Nginx through the Istio Gateway always returns the V1 version:
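
One way to drive traffic is to curl the ingress gateway's HTTP NodePort in a loop (the service name and namespace are the Istio defaults and may differ; <node-ip> and <nodeport> are placeholders):

kubectl -n istio-system get svc istio-ingressgateway
for i in $(seq 1 10); do curl -s http://<node-ip>:<nodeport> | grep title; done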

istio - 1

Modify the VirtualService to send 20% of the traffic to V2:

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nginx
spec:
  hosts:
    - "*"
  gateways:
    - nginx-gateway
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: nginx
            port:
              number: 80
            subset: v1
          weight: 80
        - destination:
            host: nginx
            port:
              number: 80
            subset: v2
          weight: 20
EOF

Part of the traffic now goes to V2:

istio - 2

Circuit breaking is also implemented through a DestinationRule:

cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: nginx-circuit-breaker
spec:
  host: nginx
  trafficPolicy:
    connectionPool:
      http:
        # Maximum number of pending HTTP/1 requests
        http1MaxPendingRequests: 1
        # Maximum number of HTTP requests per connection
        maxRequestsPerConnection: 1
      tcp:
        # Maximum number of TCP connections
        maxConnections: 1
EOF

K3s

Deploy K3s

mkdir -pv /etc/rancher/k3s

cat > /etc/rancher/k3s/config.yaml <<EOF
token: 12345
system-default-registry: registry.cn-hangzhou.aliyuncs.com
EOF

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn sh -
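
A quick sanity check once the install script finishes (k3s bundles kubectl):

k3s kubectl get node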