RKE Cluster Pods Stuck in the Terminating State

When a Pod is deleted in an RKE cluster, it stays in the Terminating state, and kubelet logs the following error:

2025-10-23T14:34:08.462684510Z E1023 14:34:08.462648    3018 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4dc46d4c-aa98-40b2-b941-9393978e4648-aaa-bbb-ccc podName:4dc46d4c-aa98-40b2-b941-9393978e4648 nodeName:}" failed. No retries permitted until 2025-10-23 14:36:10.462606964 +0000 UTC m=+68505754.237199481 (durationBeforeRetry 2m2s). Error: "error cleaning subPath mounts for volume \"aaa-bbb-ccc\" (UniqueName: \"kubernetes.io/configmap/4dc46d4c-aa98-40b2-b941-9393978e4648-aaa-bbb-ccc\") pod \"4dc46d4c-aa98-40b2-b941-9393978e4648\" (UID: \"4dc46d4c-aa98-40b2-b941-9393978e4648\") : error processing /var/lib/kubelet/pods/4dc46d4c-aa98-40b2-b941-9393978e4648/volume-subpaths/aaa-bbb-ccc/ddd-eee: error cleaning subpath mount /var/lib/kubelet/pods/4dc46d4c-aa98-40b2-b941-9393978e4648/volume-subpaths/aaa-bbb-ccc/ddd-eee/3: remove /var/lib/kubelet/pods/4dc46d4c-aa98-40b2-b941-9393978e4648/volume-subpaths/aaa-bbb-ccc/ddd-eee/3: device or resource busy"

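This issue does not block the creation of new Pods (for example, when a Deployment or similar resource is updated), but a large number of Pods in the Terminating state accumulate in the cluster.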

Root Cause

Based on the following community resources:

  1. https://platform9.com/kb/kubernetes/pod-stuck-in-terminating-state-due-to-inability-to-clean-volume
  2. https://github.com/kubernetes/kubernetes/issues/65879

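The root cause is that /var/lib/kubelet on the host is actually a symbolic link (symlink) pointing to another location. Because of this symlink layout, or because the volume cleanup logic does not correctly handle subPath paths that sit under a symlink, the unmount fails and the Pod stays in the Terminating state indefinitely.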

ls -lh /var/lib/kubelet
lrwxrwxrwx. 1 root root XX … /var/lib/kubelet -> /u/var/lib/kubelet
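
The same check can be scripted so it is easy to run on every node. The following is a minimal sketch, not part of the original procedure: `check_kubelet_dir` is a hypothetical helper name, the path argument defaults to /var/lib/kubelet, and `readlink -f` is assumed to be available (it is in GNU coreutils).

```shell
#!/bin/sh
# check_kubelet_dir is a hypothetical helper, not an RKE or kubelet command.
# It reports whether the given directory (default: /var/lib/kubelet) is a symlink.
check_kubelet_dir() {
    dir="${1:-/var/lib/kubelet}"
    if [ -L "$dir" ]; then
        # readlink -f follows the link chain and prints the real location.
        echo "symlink: $dir -> $(readlink -f "$dir")"
        return 1
    fi
    echo "ok: $dir is not a symlink"
}
```

A non-zero exit status means the directory is a symlink, so the node is likely to be affected by the problem described here.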

Solutions

Short-Term Fix

Manually run umount on the mount path reported in the error:

umount /var/lib/kubelet/pods/4dc46d4c-aa98-40b2-b941-9393978e4648/volume-subpaths/aaa-bbb-ccc/ddd-eee/3
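
When many Pods are affected, the paths to unmount can be pulled out of the kubelet log instead of being copied by hand. The sketch below assumes the log lines have the same shape as the error shown earlier ("remove <path>: device or resource busy"); `extract_busy_paths` is a hypothetical helper name.

```shell
#!/bin/sh
# extract_busy_paths is a hypothetical helper.
# It reads kubelet log lines on stdin and prints each distinct path whose
# removal failed with "device or resource busy", one path per line.
extract_busy_paths() {
    sed -n 's/.*remove \(\/[^:]*\): device or resource busy.*/\1/p' | sort -u
}
```

On RKE1, where kubelet typically runs as a Docker container named kubelet, a possible invocation is `docker logs kubelet 2>&1 | extract_busy_paths`. Review the printed paths before passing any of them to umount.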

Then force-delete the Pod that is stuck in the Terminating state:

kubectl delete pod <pod_name> --force --grace-period=0
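
Before running the forced delete, it may be worth confirming that no subPath mounts remain for the Pod, because a forced delete only removes the API object and does not clean up the node. The sketch below lists the remaining mount points for a Pod UID from the mount table. `subpath_mounts_for_pod` is a hypothetical helper name, and the match is done on the `/pods/<uid>/volume-subpaths/` fragment on the assumption that the mount table may show the resolved path (for example under /u/var/lib/kubelet) rather than the symlinked one.

```shell
#!/bin/sh
# subpath_mounts_for_pod is a hypothetical helper.
# $1: Pod UID; $2: mount table to read (default: /proc/mounts).
# Prints every mount point that belongs to the Pod's volume-subpaths directory.
subpath_mounts_for_pod() {
    uid="$1"
    mounts_file="${2:-/proc/mounts}"
    # Field 2 of each mount table entry is the mount point.
    awk -v uid="$uid" \
        'index($2, "/pods/" uid "/volume-subpaths/") > 0 { print $2 }' \
        "$mounts_file"
}
```

For example, `subpath_mounts_for_pod 4dc46d4c-aa98-40b2-b941-9393978e4648` printing nothing suggests that the subPath mounts for that Pod are gone.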

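To handle all Terminating Pods at once, generate one delete command per Pod. The following only prints the commands; review the output, then append `| sh` to execute them: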

kubectl get pod -A --no-headers | awk '/Terminating/{ print "kubectl -n "$1" delete pod "$2" --force --grace-period=0" }'

Long-Term Fix

Avoid making the /var/lib/kubelet directory a symbolic link.

Author

Warner Chen

Posted on

2025-11-03

Updated on

2025-11-03
