Skip to content

Refactor driver discovery #825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

elezar
Copy link
Member

@elezar elezar commented Dec 6, 2024

This change refactors driver discovery to prepare for the discovery of additional entities using libnvsandboxutils.

@elezar elezar force-pushed the refactor-driver-discovery branch 2 times, most recently from 8ad7493 to a561a05 Compare December 6, 2024 12:55
@elezar elezar requested a review from cdesiniotis December 6, 2024 13:08
@elezar elezar force-pushed the refactor-driver-discovery branch from a561a05 to 841c7d1 Compare December 6, 2024 13:09
@@ -49,6 +49,7 @@ TRANSLATOR:
const:
- {action: accept, from: "^NVSANDBOXUTILS_"}
- {action: accept, from: "^nvSandboxUtils"}
- {action: replace, from: "^NVSANDBOXUTILS_255_MASK_", to: "MASK255_" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this was required.

This change aligns the driver file discovery with device discovery
and allows other sources such as nvsandboxutils to be added.

Signed-off-by: Evan Lezar <[email protected]>
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the driver discovery workflow to centralize version detection and entity lookup in a new dgpu package and integrate libnvsandboxutils for sandbox utils discovery. Key changes include:

  • Replaced bespoke CUDA and NVML version lookups with a unified getDriverVersion fallback chain.
  • Introduced internal/platform-support/dgpu with option-based NewDriverDiscoverer.
  • Updated all nvcdi consumers to use the new dgpu.NewDriverDiscoverer and removed old discovery helpers.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/nvcdi/wrapper.go No functional change; remains the CDI spec wrapper.
pkg/nvcdi/management.go Swapped CUDA version lookup for getDriverVersion and new dgpu discoverer.
pkg/nvcdi/lib.go Refactored driver-version helpers into nvcdilib.
pkg/nvcdi/lib-nvml.go Updated common edits to call new getDriverVersion and pass version to NVML discoverer.
pkg/nvcdi/common-nvml.go Signature change to include version in the common NVML discoverer.
internal/platform-support/dgpu/options.go Added new option setters (WithDriver, WithLdconfigPath, WithVersion).
internal/platform-support/dgpu/driver.go Implemented NewDriverDiscoverer using option-driven construction.
internal/platform-support/dgpu/driver-nvsandboxutils.go Stub for nvsandboxutils discovery.
internal/platform-support/dgpu/driver-nvml.go Moved NVML-based driver discovery into options-based methods.
internal/nvsandboxutils/gen/.../nvsandboxutils.yml Allowed MASK255_ constant renaming for sandbox utils.
Comments suppressed due to low confidence (4)

internal/platform-support/dgpu/options.go:89

  • There is a typo in WithNvsandboxuitilsLib (should be WithNvsandboxutilsLib). Correcting this will avoid confusion and maintain consistency.
func WithNvsandboxuitilsLib(nvsandboxutilslib nvsandboxutils.Interface) Option {

internal/platform-support/dgpu/driver.go:26

  • This new NewDriverDiscoverer logic is complex and currently untested—adding unit tests to cover option parsing and fallback behavior will help catch future regressions.
// NewDriverDiscoverer creates a discoverer for the libraries and binaries associated with a driver installation.

internal/platform-support/dgpu/driver.go:28

  • The call to new(opts...) is invalid Go syntax and won’t initialize your options struct. Consider creating a pointer to options (e.g. o := &options{}) and then applying each opt function.
	o := new(opts...)

internal/platform-support/dgpu/driver-nvml.go:19

  • The code references lookup.NewFileLocator, lookup.WithLogger, etc., but the lookup package is not imported. Add "github.com/NVIDIA/nvidia-container-toolkit/internal/lookup" (and lookup/root if needed).
import (


nvsandboxutilsDiscoverer, err := o.newNvsandboxutilsDriverDiscoverer()
if err != nil {
// TODO: Log a warning
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Leftover TODO comments for logging failure of sub-discoverers. Consider implementing real logging or removing the TODO if not needed.

Suggested change
// TODO: Log a warning
o.logger.Warn(fmt.Sprintf("Failed to create nvsandboxutils driver discoverer: %v", err))

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant