# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains the differences between deploying the NVIDIA DRA driver on Red Hat OpenShift and on upstream Kubernetes or its flavors.

## Prerequisites

Install a recent build of OpenShift 4.16 (e.g. 4.16.0-ec.3). You can obtain an IPI installer binary (`openshift-install`) from the [Release Status](https://amd64.ocp.releases.ci.openshift.org/) page, or use the Assisted Installer to install on bare metal. Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/installing/index.html) for other installation methods.
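
For reference, an IPI installation is driven entirely by `openshift-install`. A minimal sketch, assuming the asset directory `ocp416` (an arbitrary name) contains your `install-config.yaml`:

```console
$ openshift-install create cluster --dir ocp416 --log-level=info
```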

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html), either during the installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate. Note that enabling the `TechPreviewNoUpgrade` feature set cannot be undone and blocks minor version updates.
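
Post-install, one way to enable the feature set is to patch the cluster `FeatureGate` resource (a sketch; the documentation linked above describes this and other supported flows):

```console
$ oc patch featuregate cluster --type merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
```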

Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```
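
Assuming the patch succeeded, you can confirm that the customization was applied; the output should report `dynamicResourceAllocation` as `Enabled`:

```console
$ oc get scheduler cluster -o jsonpath='{.spec.profileCustomizations}{"\n"}'
```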

## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator.

**Be careful to disable the device plugin so that it does not conflict with the DRA plugin.** It is recommended to leave only the NVIDIA GPU driver and driver toolkit configs enabled in the operator's `ClusterPolicy`, and to disable everything else:

```yaml
  <...>
  devicePlugin:
    enabled: false
  <...>
  driver:
    enabled: true
  <...>
  toolkit:
    enabled: true
  <...>
```
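
If the `ClusterPolicy` already exists, one way to flip the device plugin off is a merge patch; the instance name `gpu-cluster-policy` below is an assumption (it is the default used in the operator's OpenShift documentation):

```console
$ oc patch clusterpolicy/gpu-cluster-policy --type merge -p '{"spec":{"devicePlugin":{"enabled":false}}}'
```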

The NVIDIA GPU Operator might not be available through the OperatorHub in a pre-production version of OpenShift. In this case, deploy the operator from a bundle, or add a certified catalog index from an earlier version of OpenShift, e.g.:

```yaml
kind: CatalogSource
apiVersion: operators.coreos.com/v1alpha1
metadata:
  name: certified-operators-v415
  namespace: openshift-marketplace
spec:
  displayName: Certified Operators v4.15
  image: registry.redhat.io/redhat/certified-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
```
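
Apply the manifest and check that the catalog becomes ready (the file name is illustrative; the connection state should eventually report `READY`):

```console
$ oc apply -f catalogsource-certified-v415.yaml
$ oc get catalogsource certified-operators-v415 -n openshift-marketplace \
    -o jsonpath='{.status.connectionState.lastObservedState}{"\n"}'
```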

Then follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html).

## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:

```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```
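
For example, when installing from a local checkout of the driver repository (the release name, namespace, and chart path below are assumptions; adjust them to your setup):

```console
$ helm upgrade -i --create-namespace --namespace nvidia-dra-driver \
    nvidia-dra-driver ./deployments/helm/k8s-dra-driver \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/var/usrlocal/nvidia/toolkit/nvidia-ctk
```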

## OpenShift Security

OpenShift generally requires more stringent security settings than upstream Kubernetes. If you see a warning about security context constraints when deploying the DRA plugin, pass the following to the Helm chart, either via an in-line variable or a values file:

```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```
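
The in-line variant can be expressed with `--set-json` (available in Helm 3.10 and later; release name and chart path are the same assumptions as above):

```console
$ helm upgrade -i nvidia-dra-driver ./deployments/helm/k8s-dra-driver \
    --set-json 'kubeletPlugin.containers.plugin.securityContext={"privileged":true,"seccompProfile":{"type":"Unconfined"}}'
```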

If you see security context constraints errors or warnings when deploying a sample workload, update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.15/operators/operator_sdk/osdk-complying-with-psa.html). Applying the following `securityContext` definition at the pod or container level usually works for non-privileged workloads:

```yaml
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
```
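
As a sketch, a non-privileged DRA workload might embed that snippet at the container level like this. The claim wiring assumes a pre-created `ResourceClaimTemplate` named `gpu-template`; the pod name, container name, and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  restartPolicy: Never
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c", "nvidia-smi -L"]
    resources:
      claims:
      # References the pod-level resourceClaims entry below.
      - name: gpu
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
  resourceClaims:
  - name: gpu
    source:
      # Assumed to exist; created from the DRA driver's examples.
      resourceClaimTemplateName: gpu-template
```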

If you see the following error when trying to deploy a workload:

```console
Warning FailedScheduling 21m default-scheduler running Reserve plugin "DynamicResources": podschedulingcontexts.resource.k8s.io "gpu-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>
```

apply the following RBAC configuration (this should be fixed in newer OpenShift builds):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler:podfinalizers
rules:
- apiGroups:
  - ""
  resources:
  - pods/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-scheduler:podfinalizers:crbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:podfinalizers
subjects:
- kind: User
  name: system:kube-scheduler
```

## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-Instance GPU (MIG) feature require MIG to be [enabled](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#enable-mig-mode) on worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. the A100.

You can enable MIG via the driver daemon set pod running on a GPU node as follows (here, the GPU ID is 0, i.e. `-i 0`):

```console
$ oc exec -ti nvidia-driver-daemonset-416.94.202402160025-0-g45bd -n nvidia-gpu-operator -- nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:0A:00.0
All done.
```
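
To confirm the new mode took effect, you can query it from the same pod; the expected output is `Enabled`. Note that depending on the platform and whether the GPU is in use, a GPU reset or node reboot may be required before the mode change is reported:

```console
$ oc exec -ti nvidia-driver-daemonset-416.94.202402160025-0-g45bd -n nvidia-gpu-operator -- \
    nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv,noheader
```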