
Load NVIDIA Kernel Modules for JIT-CDI mode #975


Open. elezar wants to merge 1 commit into main.

Conversation

@elezar (Member) commented on Mar 9, 2025

This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset kernel modules (NVIDIA kernel-mode GPU drivers) before generating the automatic (jit) CDI specification. This aligns the behaviour of the JIT-CDI mode with that of the nvidia-container-cli.

The set of kernel modules to load is controlled by the

nvidia-container-runtime.modes.jit-cdi.load-kernel-modules

config option. If this is set to an empty list, no kernel modules are loaded.

Errors in loading the kernel modules are logged, but ignored.
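For reference, the resulting default in config.toml would look like the following. This is a sketch based on the diff reviewed later in this PR; note that the module names shown here use the dashed spelling from that diff, which elezar later identifies as a typo:

```toml
[nvidia-container-runtime.modes.jit-cdi]
# Default: attempt to load all three modules before generating the CDI spec.
load-kernel-modules = ["nvidia", "nvidia-uvm", "nvidia-modeset"]

# Setting the option to an empty list disables module loading entirely:
# load-kernel-modules = []
```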


Signed-off-by: Evan Lezar <[email protected]>
@jgehrcke commented:

This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset kernel modules before generating the automatic (jit) CDI specification.

What is this good for?

@@ -74,6 +74,9 @@ spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime.modes.jit-cdi]
load-kernel-modules = ["nvidia", "nvidia-uvm", "nvidia-modeset"]

@jgehrcke commented on the diff:

How did we end up making this fine selection? 🍷 🍇 🧀


@elezar (Member Author) replied on Apr 2, 2025:

This is actually a typo. The module names should be nvidia, nvidia_uvm and nvidia_modeset.

@jgehrcke left a review:

Thank you, Evan.

@elezar (Member Author) commented on Apr 2, 2025:

This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset kernel modules before generating the automatic (jit) CDI specification.

What is this good for?

I have updated the description with more motivation.

@@ -192,6 +193,11 @@ func generateAutomaticCDISpec(logger logger.Interface, cfg *config.Config, devic
return nil, fmt.Errorf("failed to construct CDI library: %w", err)
}

// TODO: Consider moving this into the nvcdi API.
if err := driver.LoadKernelModules(cfg.NVIDIAContainerRuntimeConfig.Modes.JitCDI.LoadKernelModules...); err != nil {
@elezar (Member Author) commented on the diff:

@klueska are there any cases where we DON'T want to load / try to load the kernel modules? Note that we also skip this when running in a user namespace in libnvidia-container.
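The added call follows the "logged, but ignored" error handling described in the PR description. A minimal Go sketch of that pattern follows; the helper names and the injected loader are hypothetical illustrations, not the toolkit's actual implementation. It also normalizes dashes to underscores, which is relevant to the module-name typo discussed above ('-' and '_' are interchangeable for modprobe, and /proc/modules always reports '_'):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeModuleName converts '-' to '_', mirroring how the kernel treats
// module names. Hypothetical helper for illustration only.
func normalizeModuleName(name string) string {
	return strings.ReplaceAll(name, "-", "_")
}

// loadKernelModules tries to load each module in turn, logging any failure
// but never aborting: errors are logged, but ignored. The loader is injected
// so the pattern can be exercised without root or NVIDIA hardware.
func loadKernelModules(load func(string) error, logf func(string, ...any), modules ...string) {
	for _, m := range modules {
		if err := load(normalizeModuleName(m)); err != nil {
			// A failure here does not stop the remaining modules from loading.
			logf("failed to load kernel module %s: %v", m, err)
		}
	}
}

func main() {
	var loaded []string
	// Fake loader that rejects one module, to show the error-tolerant flow.
	fakeLoad := func(m string) error {
		if m == "nvidia_uvm" {
			return fmt.Errorf("operation not permitted")
		}
		loaded = append(loaded, m)
		return nil
	}
	logf := func(format string, args ...any) { fmt.Printf(format+"\n", args...) }
	loadKernelModules(fakeLoad, logf, "nvidia", "nvidia-uvm", "nvidia-modeset")
	fmt.Println(loaded) // the modules after the failing one are still loaded
}
```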
