HAMi in Practice on RKE2 with an NVIDIA Tesla P4

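HAMi is an open-source platform for managing heterogeneous AI computing devices in Kubernetes clusters. Formerly known as k8s-vGPU-scheduler, it enables device sharing across multiple containers and workloads.

This article tests HAMi on RKE2 with an NVIDIA Tesla P4, and uses GPU Operator to automatically install the driver and the NVIDIA Container Toolkit on the GPU node.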


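Prerequisites

  1. RKE2 is deployed, version >= 1.18; this article uses v1.35.5+rke2r2
  2. The GPU node has joined the RKE2 cluster
  3. Prometheus is deployed in the RKE2 cluster, version > 2.8.0 (deployed here through Rancher Monitoring)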
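The version constraints above compare dotted version strings, which plain string comparison gets wrong (2.45.0 sorts before 2.8.0 lexically). The following is a minimal sketch of a comparison helper, assuming a `sort` that supports the `-V` (version sort) option; the function name `version_gt` is illustrative and not part of any tool mentioned here.

```shell
# version_gt A B: succeeds when version A is strictly greater than version B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

For example, `version_gt 2.45.0 2.8.0 && echo "Prometheus version OK"` prints the message; 2.45.0 is only a placeholder for whatever version the cluster actually reports.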

Install GPU Operator

Because GPU Operator installs the driver and the NVIDIA Container Toolkit on the GPU node automatically, Secure Boot must be disabled on the GPU node beforehand, followed by a reboot. See: https://warnerchen.github.io/2024/12/17/RKE-RKE2-%E8%8A%82%E7%82%B9%E9%85%8D%E7%BD%AE-Nvidia-Container-Runtime/#%E5%AE%89%E8%A3%85%E5%89%8D%E5%87%86%E5%A4%87

Install GPU Operator:

helm repo add nvidia https://nvidia.github.io/helm-charts/
helm repo update

cat <<EOF > gpu-operator-values.yaml
# HAMi ships its own Device Plugin component, so the GPU Operator Device Plugin must be disabled
# Reference: https://project-hami.io/zh/docs/faq#hami-%E6%8F%92%E4%BB%B6volcano-%E6%8F%92%E4%BB%B6nvidia-%E5%AE%98%E6%96%B9%E6%8F%92%E4%BB%B6%E4%B8%89%E8%80%85%E7%9A%84%E5%85%B3%E7%B3%BB%E4%B8%8E%E5%85%BC%E5%AE%B9%E6%80%A7
devicePlugin:
  enabled: false

toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/rke2/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
EOF

helm -n gpu-operator upgrade --install gpu-operator nvidia/gpu-operator -f gpu-operator-values.yaml --create-namespace

Wait for nvidia-driver-daemonset and nvidia-container-toolkit-daemonset to finish installing the driver and the NVIDIA Container Toolkit.

On the GPU node, check that the installation succeeded:

lsmod | grep nvidia
nvidia_modeset 1622016 0
nvidia_uvm 1781760 6
nvidia 104058880 32 nvidia_uvm,nvidia_modeset
drm 622592 7 vmwgfx,drm_kms_helper,nvidia,drm_ttm_helper,ttm
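The manual check above can be scripted. The sketch below reads `lsmod` output on standard input and reports whether the three NVIDIA modules that appear in the listing (nvidia, nvidia_uvm, nvidia_modeset) are loaded. The function name `check_nvidia_modules` is hypothetical, and treating exactly these three modules as required is an assumption drawn from this article's output; other driver versions may load a different set.

```shell
# check_nvidia_modules: reads `lsmod` output on stdin and succeeds only when
# nvidia, nvidia_uvm and nvidia_modeset all appear in the first column.
# The required-module list is an assumption based on the listing in this article.
check_nvidia_modules() {
  local loaded missing=0 mod
  loaded="$(awk '{print $1}')"
  for mod in nvidia nvidia_uvm nvidia_modeset; do
    if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
      echo "missing kernel module: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On a GPU node this would be used as `lsmod | check_nvidia_modules && echo "driver modules loaded"`.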

Set the Default Runtime

HAMi requires the NVIDIA runtime to be the default container runtime, so this must be configured on the GPU node. See: https://warnerchen.github.io/2024/12/17/RKE-RKE2-%E8%8A%82%E7%82%B9%E9%85%8D%E7%BD%AE-Nvidia-Container-Runtime/#RKE2-GPU-%E8%8A%82%E7%82%B9%E9%85%8D%E7%BD%AE-Nvidia-Container-Runtime

vim /var/lib/rancher/rke2/agent/etc/containerd/config-v3.toml.tmpl

Add the following content:

{{ template "base" . }}

[plugins."io.containerd.cri.v1.runtime".containerd]
default_runtime_name = "nvidia"

Restart the RKE2 service on the node:

# rke2-server or rke2-agent
systemctl restart rke2-agent
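After the restart, it can be worth confirming that the setting reached the rendered containerd configuration. The following sketch checks a file for a `default_runtime_name = "nvidia"` line. The function name `has_nvidia_default_runtime` is hypothetical, and it is an assumption that RKE2 renders the template into the `config.toml` in the same directory (the path already used for `CONTAINERD_CONFIG` earlier in this article).

```shell
# has_nvidia_default_runtime FILE: succeeds when FILE contains a line that
# sets default_runtime_name to "nvidia" (whitespace around "=" is tolerated).
has_nvidia_default_runtime() {
  grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia"' "$1"
}
```

A possible invocation on the GPU node: `has_nvidia_default_runtime /var/lib/rancher/rke2/agent/etc/containerd/config.toml && echo "nvidia is the default runtime"`.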

Install HAMi

Install HAMi from the official Helm chart:

helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update

cat <<EOF > hami-values.yaml
devicePlugin:
  # Inject NVIDIA GPUs into containers via CDI
  deviceListStrategy: cdi-annotations
  # Default paths of the driver and toolkit installed by GPU Operator
  nvidiaDriverRoot: /run/nvidia/driver
  nvidiaHookPath: /usr/local/nvidia/toolkit/nvidia-ctk

scheduler:
  kubeScheduler:
    # Set according to your Kubernetes version
    imageTag: v1.35.5
EOF

helm -n hami upgrade --install hami hami-charts/hami -f hami-values.yaml --create-namespace

After installation, check that hami-device-plugin and hami-scheduler are running normally.
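One way to script this check is to filter the output of `kubectl get pods --no-headers`. The sketch below prints the name of every Pod that is not fully ready. The function name `not_ready_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...).

```shell
# not_ready_pods: reads `kubectl get pods --no-headers` output on stdin and
# prints the name of each pod that is not Running with all containers ready.
# Pods in the Completed state are skipped.
not_ready_pods() {
  awk '$3 != "Completed" {
    split($2, r, "/")
    if ($3 != "Running" || r[1] != r[2]) print $1
  }'
}
```

For example, `kubectl -n hami get pods --no-headers | not_ready_pods` prints nothing once every HAMi Pod is ready.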


Install the HAMi Web UI

Install the HAMi Web UI from the official Helm chart:

helm repo add hami-webui https://project-hami.github.io/HAMi-WebUI
helm repo update

cat <<EOF > hami-webui-values.yaml
# GPU Operator already installs dcgm-exporter, so disable it here
dcgm-exporter:
  enabled: false

externalPrometheus:
  address: http://rancher-monitoring-prometheus.cattle-monitoring-system.svc.cluster.local:9090
  enabled: true

service:
  type: NodePort
EOF

helm -n hami upgrade --install hami-webui hami-webui/hami-webui -f hami-webui-values.yaml --create-namespace

After installation, check that the hami-webui Pod is running normally.


Create a ServiceMonitor to Collect GPU Metrics

cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nvidia-dcgm-exporter
  namespace: gpu-operator
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: nvidia-dcgm-exporter
  namespaceSelector:
    matchNames:
      - gpu-operator
  endpoints:
    - port: gpu-metrics
      path: /metrics
      interval: 15s
EOF

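Usage Scenarios

The following examples use a single NVIDIA Tesla P4 8GB card to show typical ways of using HAMi and their effects.


Exclusive Use of the Whole Card

To use the P4 exclusively, apply the following configuration. This behaves much like the NVIDIA Device Plugin: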

apiVersion: v1
kind: Pod
metadata:
  name: p4-full-card-smoke
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["bash", "-lc"]
      args: ["nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1

Running nvidia-smi inside the Pod gives the following output:

root@p4-full-card-smoke:/# nvidia-smi
Tue Jun 30 04:02:50 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 7W / 75W | 0MiB / 7680MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

A Single Small Slice

The following example creates a small shared-GPU task that takes about 2 GB of GPU memory and 25% of the compute resources:

apiVersion: v1
kind: Pod
metadata:
  name: p4-slice-2g-25core
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["bash", "-lc"]
      args: ["nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 2000
          nvidia.com/gpucores: 25

Running nvidia-smi inside the Pod shows the memory limit applied (2000MiB total):

root@p4-slice-2g-25core:/# nvidia-smi
Tue Jun 30 04:01:30 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 7W / 75W | 0MiB / 2000MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Four Small Tasks Sharing the P4

The following example lets four small replicas share one P4 card, with GPU memory configured as 4 x 1800 MB = 7200 MB:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p4-four-way-shared
spec:
  replicas: 4
  selector:
    matchLabels:
      app: p4-four-way-shared
  template:
    metadata:
      labels:
        app: p4-four-way-shared
    spec:
      containers:
        - name: cuda
          image: nvidia/cuda:11.8.0-base-ubuntu22.04
          command: ["bash", "-lc"]
          args:
            - while true; do nvidia-smi --query-gpu=name,memory.total,memory.used,utilization.gpu --format=csv,noheader; sleep 60; done
          resources:
            limits:
              nvidia.com/gpu: 1
              nvidia.com/gpumem: 1800
              nvidia.com/gpucores: 25

Running nvidia-smi inside one of the replicas shows the memory limit applied (1800MiB total):

root@p4-four-way-shared-5f4b84787c-bww94:/# nvidia-smi
Tue Jun 30 04:21:25 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 7W / 75W | 0MiB / 1800MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
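The sizing arithmetic behind this example (4 x 1800 MB of memory and 4 x 25% of compute on one card) can be checked before deploying. The sketch below is a hypothetical helper, `fits_on_card`, that tests whether a number of identical slices stays within a card's memory and within 100% of compute. It assumes both values act as hard per-card budgets, and it uses the 7680MiB total that nvidia-smi reports for the P4 in the full-card example.

```shell
# fits_on_card TOTAL_MIB REPLICAS MEM_MIB CORES_PCT
# Succeeds when REPLICAS identical slices of MEM_MIB memory and CORES_PCT
# compute each stay within TOTAL_MIB of memory and 100% of compute.
# Treating both as hard per-card budgets is an assumption of this sketch.
fits_on_card() {
  local total=$1 replicas=$2 mem=$3 cores=$4
  [ $((replicas * mem)) -le "$total" ] && [ $((replicas * cores)) -le 100 ]
}
```

With these assumptions, `fits_on_card 7680 4 1800 25` succeeds (7200 MB and 100%), while `fits_on_card 7680 4 2000 25` fails because 8000 MB exceeds the card.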

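Two Larger Tasks Sharing the P4

The following example lets two larger replicas share one P4, with GPU memory configured as 2 x 3500 MB = 7000 MB and compute configured as 2 x 50%: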

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p4-two-way-shared
spec:
  replicas: 2
  selector:
    matchLabels:
      app: p4-two-way-shared
  template:
    metadata:
      labels:
        app: p4-two-way-shared
    spec:
      containers:
        - name: cuda
          image: nvidia/cuda:11.8.0-base-ubuntu22.04
          command: ["bash", "-lc"]
          args: ["nvidia-smi && sleep 3600"]
          resources:
            limits:
              nvidia.com/gpu: 1
              nvidia.com/gpumem: 3500
              nvidia.com/gpucores: 50

Running nvidia-smi inside one of the replicas shows the memory limit applied (3500MiB total):

root@p4-two-way-shared-5fb9fb7fc7-75tgw:/# nvidia-smi
Tue Jun 30 04:35:57 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 6W / 75W | 0MiB / 3500MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

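Allocating GPU Memory by Percentage

The following example uses a GPU memory percentage instead of an absolute memory value. Note that nvidia.com/gpumem and nvidia.com/gpumem-percentage must not be used together in the same container: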

apiVersion: v1
kind: Pod
metadata:
  name: p4-25-percent-memory
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["bash", "-lc"]
      args: ["nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem-percentage: 25
          nvidia.com/gpucores: 25

Running nvidia-smi inside the Pod shows the memory limit applied (1920MiB total):

root@p4-25-percent-memory:/# nvidia-smi
Tue Jun 30 04:37:03 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 6W / 75W | 0MiB / 1920MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
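The 1920MiB figure in this output is 25% of the 7680MiB that nvidia-smi reports for the whole card. A minimal sketch of that arithmetic follows; the helper name `gpumem_from_percentage` is hypothetical, and the use of integer division is an assumption that happens to be exact for these numbers.

```shell
# gpumem_from_percentage TOTAL_MIB PERCENT: prints PERCENT percent of
# TOTAL_MIB as an integer number of MiB (integer division, an assumption).
gpumem_from_percentage() {
  echo $(( $1 * $2 / 100 ))
}
```

For the P4 in this article, `gpumem_from_percentage 7680 25` prints 1920.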

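Scheduling Only onto the P4

In a mixed-GPU cluster, if a workload must use a Tesla P4, the GPU type can be specified as follows. The type value is matched against the GPU type reported by the node, and can be obtained with the nvidia-smi -L command: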

root@p4-four-way-shared-5f4b84787c-wxbt7:/# nvidia-smi -L
GPU 0: Tesla P4 (UUID: GPU-b4112389-72c9-3355-0125-46309c2dfc76)

apiVersion: v1
kind: Pod
metadata:
  name: p4-type-required
  annotations:
    nvidia.com/use-gputype: "Tesla P4"
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["bash", "-lc"]
      args: ["nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 2000
          nvidia.com/gpucores: 25

Running nvidia-smi inside the Pod gives the following output:

root@p4-type-required:/# nvidia-smi
Tue Jun 30 04:38:45 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 7W / 75W | 0MiB / 2000MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

Specifying a GPU UUID

To bind a workload to a specific physical GPU, obtain its UUID with the nvidia-smi -L command:

root@p4-four-way-shared-5f4b84787c-wxbt7:/# nvidia-smi -L
GPU 0: Tesla P4 (UUID: GPU-b4112389-72c9-3355-0125-46309c2dfc76)

apiVersion: v1
kind: Pod
metadata:
  name: p4-specific-uuid
  annotations:
    nvidia.com/use-gpuuuid: "GPU-b4112389-72c9-3355-0125-46309c2dfc76"
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["bash", "-lc"]
      args: ["nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 2000
          nvidia.com/gpucores: 25

Running nvidia-smi inside the Pod gives the following output:

root@p4-specific-uuid:/# nvidia-smi
Tue Jun 30 04:42:59 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | 0 |
| N/A 35C P8 7W / 75W | 0MiB / 2000MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
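Both the GPU type and the UUID used in these annotations come from the same `nvidia-smi -L` line, so extracting them can be scripted. The sketch below is a hypothetical helper, `gpu_type_and_uuid`, that assumes the `GPU <index>: <name> (UUID: GPU-...)` line format shown in this article and prints the name and UUID separated by a `|` character.

```shell
# gpu_type_and_uuid: reads `nvidia-smi -L` output on stdin and prints
# "<type>|<uuid>" for every line of the form
# "GPU <index>: <type> (UUID: GPU-...)". Other lines are ignored.
gpu_type_and_uuid() {
  sed -n 's/^GPU [0-9][0-9]*: \(.*\) (UUID: \(GPU-[0-9a-fA-F-]*\))$/\1|\2/p'
}
```

Piping the line from this article through it, `nvidia-smi -L | gpu_type_and_uuid`, would yield `Tesla P4|GPU-b4112389-72c9-3355-0125-46309c2dfc76`, whose two fields match the `nvidia.com/use-gputype` and `nvidia.com/use-gpuuuid` annotation values used above.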

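Namespace Quota

The following example limits the quota of a namespace that shares the P4. It restricts the namespace to at most 4 vGPU tasks and caps the total HAMi GPU memory at about 7.2 GB: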

apiVersion: v1
kind: ResourceQuota
metadata:
  name: p4-shared-quota
  namespace: default
spec:
  hard:
    limits.nvidia.com/gpu: 4
    limits.nvidia.com/gpumem: 7200
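To reason about which combinations of Pods this quota admits, the limits can be modelled as a simple sum. The sketch below is a hypothetical helper, `quota_allows`, in which each memory argument stands for one Pod requesting a single vGPU. It only illustrates the arithmetic and assumes the quota is enforced as a plain sum over the namespace; it does not call the Kubernetes API.

```shell
# quota_allows MAX_GPU MAX_MEM_MIB MEM_MIB...
# Each MEM_MIB argument represents one Pod requesting a single vGPU.
# Succeeds when the Pod count is at most MAX_GPU and the summed memory
# is at most MAX_MEM_MIB.
quota_allows() {
  local max_gpu=$1 max_mem=$2 sum=0 m
  shift 2
  [ "$#" -le "$max_gpu" ] || return 1
  for m in "$@"; do
    sum=$((sum + m))
  done
  [ "$sum" -le "$max_mem" ]
}
```

Under this model, `quota_allows 4 7200 1800 1800 1800 1800` succeeds, while four Pods of 2000 MB each (8000 MB) or a fifth Pod of any size would be rejected.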

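Binpack Scheduling Policy

The following example sets the node scheduling policy per Pod. In a multi-node cluster, the binpack policy prefers to fill up a node before using the next one: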

apiVersion: v1
kind: Pod
metadata:
  name: p4-binpack-policy
  annotations:
    hami.io/node-scheduler-policy: "binpack"
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["bash", "-lc"]
      args: ["nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 1800
          nvidia.com/gpucores: 25

GPU Resource Monitoring

GPU resource monitoring information can be viewed in the HAMi Web UI.

Author

Warner Chen

Posted on

2026-06-30

Updated on

2026-06-30
