Fix Pod label, Nodegroup and nvidia chart name issue #713

Micky-Yang · 2025-10-10T09:30:49Z

While following the reference documents in practice, I found the following problems:

Helm chart repository name conflict related to Nvidia

helm repo add nvidia https://nvidia.github.io/k8s-device-plugin
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo add nvidia https://nvidia.github.io/gpu-operator

All three Helm repos are added under the same name, nvidia, which causes repo name conflicts. This PR uses a different repo name for each of them.
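As a rough sketch of the idea, the three repos could be registered under distinct names as shown here. The names nvidia-device-plugin, nvidia-ngc and nvidia-gpu-operator, the HELM variable and the add_nvidia_repos function are hypothetical and only for illustration; the names actually used in this PR may differ. The URLs are the ones listed above.

```shell
# Hypothetical repo names; only the URLs come from the original docs.
HELM="${HELM:-helm}"   # set HELM=echo to print the commands instead of running them

add_nvidia_repos() {
  "$HELM" repo add nvidia-device-plugin https://nvidia.github.io/k8s-device-plugin
  "$HELM" repo add nvidia-ngc https://helm.ngc.nvidia.com/nvidia
  "$HELM" repo add nvidia-gpu-operator https://nvidia.github.io/gpu-operator
}
```

Calling add_nvidia_repos and then helm repo list should show three separate entries. Charts are then referenced with the matching repo prefix.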

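Automatic installation instructions for the Nvidia device plugin service

Added automatic installation instructions for the Nvidia device plugin service.

Pending issues of coredns and metrics-server services

The coredns and metrics-server Pods cannot be scheduled because the default gpu-dra-nodes NodeGroup adds taints: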

kube-system   coredns-7bf648ff5d-4bs45               0/1     Pending   0          12m
kube-system   coredns-7bf648ff5d-n4bwq               0/1     Pending   0          12m
kube-system   metrics-server-7fb96f5556-4cpdl        0/1     Pending   0          12m
kube-system   metrics-server-7fb96f5556-6mbvh        0/1     Pending   0          12m

A base-nodes NodeGroup was added so that these system Pods can be scheduled.
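One way to confirm the cause before and after the change is to list the Pending Pods and the taint keys on each node. This is a minimal sketch; the KUBECTL variable and the show_pending_and_taints function are hypothetical helpers, not part of this PR.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the commands instead of running them

show_pending_and_taints() {
  # Pods in kube-system that are still waiting for a node.
  "$KUBECTL" get pods -n kube-system --field-selector=status.phase=Pending
  # Taint keys per node; a tainted node rejects Pods that lack a matching toleration.
  "$KUBECTL" get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

If every node lists a taint, coredns and metrics-server have nowhere to run, which matches the Pending output above.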

Pod Pending issue caused by mismatch between Pod nodeSelector label and NodeGroup label key

The Pod nodeSelector uses the label NodeGroupType: gpu-dra, but the NodeGroup is labeled node-type: "gpu-dra", so the selector never matches any node and the Pod stays Pending. The Pod nodeSelector has been changed to node-type: "gpu-dra".
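The mismatch can be checked from the command line by showing both label keys as columns and then listing the nodes that the corrected selector matches. This is a sketch only; the KUBECTL variable and the check_gpu_dra_labels function are hypothetical helpers, not part of this PR.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the commands instead of running them

check_gpu_dra_labels() {
  # Show both label keys as columns to see which one the nodes actually carry.
  "$KUBECTL" get nodes -L node-type,NodeGroupType
  # Nodes that the corrected nodeSelector (node-type: "gpu-dra") would match.
  "$KUBECTL" get nodes -l node-type=gpu-dra
}
```

An empty NodeGroupType column together with a populated node-type column would confirm the mismatch described above.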

Yiyang Jiang added 7 commits October 10, 2025 16:46

Fixed the issue of Helm chart repository name conflict related to Nvidia.

776bd26

Added automatic installation instructions for the Nvidia device plugin service.

52e517d

Added base NodeGroup, to fix the pending issues of coredns and metrics-server services.

bb3a2fe

Fix the Pod Pending issue caused by mismatch between Pod nodeSelector label and NodeGroup label key.

4da2df7

Format eks ClusterConfig.

8c954a5

fix dra-nvidia-plugin format issue.

fad7a6f

fix dra-nvidia-plugin command format issue.

fa90e23

Micky-Yang requested a review from a team as a code owner October 10, 2025 09:30
