# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains the differences between deploying the NVIDIA DRA driver on OpenShift and on upstream Kubernetes or its flavors.

## Prerequisites

Install OpenShift 4.16 or later. You can use the Assisted Installer to install on bare metal, or obtain an IPI installer binary (`openshift-install`) from the [OpenShift clients page](https://mirror.openshift.com/pub/openshift-v4/clients/ocp/). Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.16/installing/index.html) for other installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.16/nodes/clusters/nodes-cluster-enabling-features.html), either during installation or post-install. The feature set includes the `DynamicResourceAllocation` feature gate.
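
A minimal sketch of enabling the feature set post-install, assuming the cluster-scoped `FeatureGate` resource is named `cluster` (the OpenShift default). Keep in mind that enabling `TechPreviewNoUpgrade` cannot be undone and prevents minor version updates:

```console
$ oc patch featuregate cluster --type merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
```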

Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```
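
To confirm the customization was applied, you can inspect the scheduler resource (a quick check; output formatting may vary):

```console
$ oc get scheduler cluster -o jsonpath='{.spec.profileCustomizations}'
```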

## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator with the device plugin disabled. Follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html), and **_be careful to disable the device plugin so it does not conflict with the DRA plugin_**:

```yaml
  devicePlugin:
    enabled: false
```
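
If the GPU Operator is already installed, the same setting can be applied to its `ClusterPolicy` resource. A sketch, assuming the default resource name `gpu-cluster-policy` created by the OpenShift console installation flow:

```console
$ oc patch clusterpolicy gpu-cluster-policy --type merge -p '{"spec":{"devicePlugin":{"enabled":false}}}'
```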

## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:

```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```
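
For example, the values can be passed in-line when installing the chart. The release name, chart reference, and namespace below are illustrative assumptions; substitute the ones you use:

```console
$ helm install nvidia-dra-driver nvidia/nvidia-dra-driver \
    --namespace nvidia-dra-driver --create-namespace \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/var/usrlocal/nvidia/toolkit/nvidia-ctk
```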

## OpenShift Security

OpenShift generally requires more stringent security settings than Kubernetes. If you see a warning about security context constraints when deploying the DRA plugin, pass the following to the Helm chart, either via an in-line variable or a values file:

```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```
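
For instance, you can save the snippet above to a values file and pass it with `-f`. The file and release names here are arbitrary assumptions:

```console
$ helm upgrade --install nvidia-dra-driver nvidia/nvidia-dra-driver -f openshift-values.yaml
```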

If you see security context constraints errors or warnings when deploying a sample workload, make sure to update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.16/operators/operator_sdk/osdk-complying-with-psa.html). Usually, applying the following `securityContext` definition at the pod or container level works for non-privileged workloads:

```yaml
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
```
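
As an illustration of the container-level placement (the pod name and image below are hypothetical, not part of this driver's documentation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sample                                  # hypothetical workload name
spec:
  restartPolicy: Never
  containers:
  - name: gpu-sample
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9     # illustrative image only
    command: ["sleep", "3600"]
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
```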

## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-Instance GPU (MIG) feature require MIG to be enabled on worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. the A100.

First, stop any custom pods that might be using the GPU, to avoid disruption when the new MIG configuration is applied.

Enable MIG via the MIG manager of the NVIDIA GPU Operator. **Do not configure MIG devices yourself; the DRA driver creates them automatically on the fly**:

```console
$ oc label node <node> nvidia.com/mig.config=all-enabled --overwrite
```
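
Optionally, you can watch the MIG manager's progress through the node label it maintains. The label name below follows the GPU Operator's convention; verify it against your operator version:

```console
$ oc get node <node> -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
```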

MIG will be automatically enabled on the labeled nodes. For additional information, see [MIG Support in OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/mig-ocp.html).

**Note:** The `all-enabled` MIG configuration profile is available out of the box in the NVIDIA GPU Operator starting with v24.3. With an earlier version, you may need to [create a custom profile](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/mig-ocp.html#creating-and-applying-a-custom-mig-configuration).

You can verify the MIG status using the `nvidia-smi` command from a GPU driver pod:

```console
$ oc exec -ti nvidia-driver-daemonset-<suffix> -n nvidia-gpu-operator -- nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                   On |
| N/A   35C    P0             45W /  300W |       0MiB /  81920MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
```

If the MIG status is marked with an asterisk (i.e. `Enabled*`), the setting could not be fully applied and you may need to reboot the node. This can happen with some cloud service providers (CSPs) that block GPU reset for GPUs passed into a VM.

See the [NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html) for more information about MIG.