Create .so symlinks for driver libraries in container #326

Open
elezar wants to merge 5 commits into main from CNT-4766/create-so-symlinks

Member

@elezar elezar commented Feb 1, 2024

This change explicitly creates .so and SONAME symlinks for injected driver libraries. This means that the create-soname-symlinks hook that was added in #947 is no longer required and is therefore removed.

$ docker run --rm -ti --runtime=nvidia --gpus all busybox  ls -al /usr/lib/x86_64-linux-gnu/
total 263564
drwxr-xr-x    2 root     root          4096 Jul 21 10:00 .
drwxr-xr-x    3 root     root          4096 Jul 21 10:00 ..
lrwxrwxrwx    1 root     root            12 Jul 21 10:00 libcuda.so -> libcuda.so.1
lrwxrwxrwx    1 root     root            21 Jul 21 10:00 libcuda.so.1 -> libcuda.so.570.133.20
-rw-r--r--    1 root     root      71365624 Apr 13 04:46 libcuda.so.570.133.20
lrwxrwxrwx    1 root     root            20 Jul 21 10:00 libcudadebugger.so -> libcudadebugger.so.1
lrwxrwxrwx    1 root     root            29 Jul 21 10:00 libcudadebugger.so.1 -> libcudadebugger.so.570.133.20
-rw-r--r--    1 root     root      10244640 Apr 13 04:17 libcudadebugger.so.570.133.20
lrwxrwxrwx    1 root     root            17 Jul 21 10:00 libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx    1 root     root            26 Jul 21 10:00 libnvidia-ml.so.1 -> libnvidia-ml.so.570.133.20
-rw-r--r--    1 root     root       2217912 Apr 13 04:23 libnvidia-ml.so.570.133.20
lrwxrwxrwx    1 root     root            19 Jul 21 10:00 libnvidia-nvvm.so -> libnvidia-nvvm.so.4
lrwxrwxrwx    1 root     root            28 Jul 21 10:00 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.570.133.20
-rw-r--r--    1 root     root      81978912 Apr 13 05:03 libnvidia-nvvm.so.570.133.20
lrwxrwxrwx    1 root     root            21 Jul 21 10:00 libnvidia-opencl.so -> libnvidia-opencl.so.1
lrwxrwxrwx    1 root     root            30 Jul 21 10:00 libnvidia-opencl.so.1 -> libnvidia-opencl.so.570.133.20
-rw-r--r--    1 root     root      65758768 Apr 13 04:46 libnvidia-opencl.so.570.133.20
-rw-r--r--    1 root     root         10176 Apr 13 04:21 libnvidia-pkcs11-openssl3.so.570.133.20
-rw-r--r--    1 root     root         10168 Apr 13 04:21 libnvidia-pkcs11.so.570.133.20
lrwxrwxrwx    1 root     root            29 Jul 21 10:00 libnvidia-ptxjitcompiler.so -> libnvidia-ptxjitcompiler.so.1
lrwxrwxrwx    1 root     root            38 Jul 21 10:00 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.570.133.20
-rw-r--r--    1 root     root      38251952 Apr 13 04:33 libnvidia-ptxjitcompiler.so.570.133.20

And:

$ docker run --rm -ti --runtime=nvidia --gpus all ubuntu ldconfig -p | grep libcuda
        libcudadebugger.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1
        libcudadebugger.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudadebugger.so
        libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
        libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so

showing that the .so and .so.1 entries are present in the ldcache.
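
For reference, the .so -> SONAME -> versioned-file chain in the listing above can also be checked programmatically. The following is a minimal sketch and not code from this PR: `resolveChain` and `demoChain` are hypothetical helpers, and the temporary directory only mimics the `libcuda` entries shown above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveChain follows symlinks starting at path and returns the base name of
// every entry visited, ending with the first entry that is not a symlink.
// It gives up if the chain is not resolved within maxSteps lookups.
// Hypothetical helper for illustration only.
func resolveChain(path string, maxSteps int) ([]string, error) {
	chain := []string{filepath.Base(path)}
	current := path
	for i := 0; i < maxSteps; i++ {
		info, err := os.Lstat(current)
		if err != nil {
			return nil, err
		}
		if info.Mode()&os.ModeSymlink == 0 {
			return chain, nil
		}
		target, err := os.Readlink(current)
		if err != nil {
			return nil, err
		}
		// Relative targets are resolved against the directory of the link.
		if !filepath.IsAbs(target) {
			target = filepath.Join(filepath.Dir(current), target)
		}
		chain = append(chain, filepath.Base(target))
		current = target
	}
	return nil, fmt.Errorf("symlink chain for %s not resolved within %d steps", path, maxSteps)
}

// demoChain recreates the libcuda layout from the listing in a temporary
// directory and resolves the chain starting at libcuda.so.
func demoChain() ([]string, error) {
	dir, err := os.MkdirTemp("", "so-chain")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	versioned := "libcuda.so.570.133.20"
	if err := os.WriteFile(filepath.Join(dir, versioned), nil, 0o644); err != nil {
		return nil, err
	}
	if err := os.Symlink(versioned, filepath.Join(dir, "libcuda.so.1")); err != nil {
		return nil, err
	}
	if err := os.Symlink("libcuda.so.1", filepath.Join(dir, "libcuda.so")); err != nil {
		return nil, err
	}
	return resolveChain(filepath.Join(dir, "libcuda.so"), 8)
}

func main() {
	chain, err := demoChain()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(strings.Join(chain, " -> "))
}
```

Using `os.Lstat` rather than `os.Stat` matters here, since `os.Stat` would follow the links and hide the intermediate SONAME entry.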

@elezar elezar self-assigned this Feb 13, 2024
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 4 times, most recently from e20826f to 7b5f6b3 Compare April 3, 2024 15:10
@elezar elezar marked this pull request as ready for review April 3, 2024 15:10
@elezar elezar requested review from klueska and cdesiniotis April 3, 2024 15:11
Contributor

@klueska klueska left a comment

Initial pass. I lost some steam near the end, but wanted to drop the comments I had for now.

Member Author

@elezar elezar left a comment

I think some of the issues were because this was out of date.

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 3 times, most recently from 93753a9 to 020f598 Compare April 15, 2024 14:31
@elezar elezar requested a review from klueska May 15, 2024 12:32
@ArangoGutierrez
Collaborator

@elezar does #906 supersede this one?

@elezar
Member Author

elezar commented Feb 25, 2025

@elezar does #906 supersede this one?

No. This is not related to the CUDA Forward Compat libraries at all. This is about creating .so -> SONAME symlinks for driver libraries in the container. #906 is about ensuring that the CUDA Forward compat libraries are properly detectable in a container if required.

@elezar elezar marked this pull request as draft June 18, 2025 21:42
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 020f598 to 81a6b00 Compare July 17, 2025 09:51
@coveralls

coveralls commented Jul 17, 2025

Pull Request Test Coverage Report for Build 16448697637

Details

  • 48 of 70 (68.57%) changed or added relevant lines in 2 files are covered.
  • 16 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.3%) to 35.304%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| internal/discover/symlinks.go | 47 | 69 | 68.12% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| internal/ldconfig/ldconfig.go | 2 | 0.0% |
| internal/discover/list.go | 14 | 0.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 16421142187 | 0.3% |
| Covered Lines | 4490 |
| Relevant Lines | 12718 |

💛 - Coveralls

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 81a6b00 to 3a1f96c Compare July 21, 2025 10:01
@elezar elezar requested a review from klueska July 21, 2025 10:12
@elezar elezar marked this pull request as ready for review July 21, 2025 10:12
@elezar elezar added this to the v1.18.0 milestone Jul 21, 2025
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 20abfcc to 3fea29c Compare July 21, 2025 10:22
Copilot

This comment was marked as outdated.

Comment on lines 153 to 173
for _, soname := range sonames {
	if soname == libraryName {
		continue
	}
	linkPath := filepath.Join(filepath.Dir(libraryPath), soname)
	if sonameLinkPath == "" {
		sonameLinkPath = linkPath
	}
	s := Symlink{
		target: libraryName,
		link:   linkPath,
	}
	soSymlinks = append(soSymlinks, s.String())
}

if sonameLinkPath != "" {
	sonameLinkPathExt := filepath.Ext(sonameLinkPath)
	soLinkPath := strings.TrimSuffix(sonameLinkPath, sonameLinkPathExt)
	s := Symlink{
		target: filepath.Base(sonameLinkPath),
		link:   soLinkPath,
	}
	soSymlinks = append(soSymlinks, s.String())
}
Contributor

So you walk through all sonames, and add a symlink for them.

And at the same time you see if there was at least one soname that matched, and if there was, you store it in sonameLinkPath so that you can strip off the final version number from it and just additionally include the base .so symlink.

Is that what is going on here?

Contributor

@klueska klueska Jul 21, 2025

Actually, I'm confused as to why there is a loop over multiple sonames to begin with -- I thought you could only have one per library.

Member Author

Yes, for a given library, we extract the sonames and create symlinks for them (if they don't match the original library name). For the first soname we additionally strip off the numbered suffix and create a .so symlink too.

One thing that we may want to do is explicitly match on *.so instead of making assumptions about the soname that we're linking to.
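
To illustrate the point about matching on `.so` explicitly: stripping only the last extension with `filepath.Ext` turns `libfoo.so.1.2` into `libfoo.so.1`, not `libfoo.so`. The sketch below cuts the name at the `.so` component instead. It is an assumption about how such matching could look, not the PR's implementation, and `dotSoName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// dotSoName returns the unversioned "<name>.so" form of a SONAME such as
// "libcuda.so.1". The boolean is false when the name has no ".so" component
// that is either the suffix or followed by a version.
// Hypothetical helper for illustration; not taken from the PR.
func dotSoName(soname string) (string, bool) {
	const marker = ".so"
	if strings.HasSuffix(soname, marker) {
		// Already unversioned.
		return soname, true
	}
	idx := strings.LastIndex(soname, marker+".")
	if idx < 0 {
		return "", false
	}
	return soname[:idx+len(marker)], true
}

func main() {
	for _, s := range []string{"libcuda.so.1", "libnvidia-nvvm.so.4", "libfoo.so.1.2", "libfoo"} {
		name, ok := dotSoName(s)
		fmt.Printf("%q -> %q (matched: %v)\n", s, name, ok)
	}
}
```

A name without a `.so` component is reported as unmatched, so a caller can skip creating the `.so` link rather than guess.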

Contributor

@klueska klueska Jul 21, 2025

Meaning, would it make more sense to implement this as:

	sonames, err := lib.DynString(elf.DT_SONAME)
	if err != nil {
		return nil, err
	}

	if len(sonames) != 1 {
		return nil, fmt.Errorf(...)
	}

	// Create the <library>.so.<version> symlink from SONAME
	linkPath := filepath.Join(filepath.Dir(libraryPath), sonames[0])
	if sonames[0] != libraryName {
		s := Symlink{
			target: libraryName,
			link:   linkPath,
		}
		soSymlinks = append(soSymlinks, s.String())
	}
	
	// Create the <library>.so symlink from the <library>.so.<version> symlink
	soLinkPath := strings.TrimSuffix(linkPath, filepath.Ext(linkPath))
	s := Symlink{
		target: filepath.Base(linkPath),
		link:   soLinkPath,
	}
	soSymlinks = append(soSymlinks, s.String())

Contributor

I don't understand this part:

For the first soname we additionally strip off the numbered suffix and create a .so symlink too

Is it actually possible to have multiple SONAMEs set?

Member Author

The elf.h manpage states:

          DT_SONAME
                 String table offset to name of shared object

This refers to a single name. The issue is that the DynString function returns a slice to handle other cases. I would therefore assume that it's unexpected if we have multiple SONAMEs. Let me update the implementation to check the length specifically.
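
A minimal sketch of the length check described above, using Go's standard `debug/elf` package. `readSoname` and `singleSoname` are hypothetical names; treating a missing SONAME as an empty result follows the `return "", nil` fragment quoted later in this review, but the rest is an assumption rather than the PR's code.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// singleSoname enforces the expectation that a library has at most one
// DT_SONAME entry. No entry yields an empty string, more than one is an error.
// Hypothetical helper for illustration only.
func singleSoname(libraryPath string, sonames []string) (string, error) {
	switch len(sonames) {
	case 0:
		return "", nil
	case 1:
		return sonames[0], nil
	default:
		return "", fmt.Errorf("multiple SONAMEs detected for %s: %v", libraryPath, sonames)
	}
}

// readSoname opens libraryPath as an ELF file and returns its SONAME, if any.
func readSoname(libraryPath string) (string, error) {
	lib, err := elf.Open(libraryPath)
	if err != nil {
		return "", fmt.Errorf("failed to open %s as ELF: %w", libraryPath, err)
	}
	defer lib.Close()

	// DynString returns a slice because it serves every string-valued
	// dynamic tag, including ones that may legitimately repeat.
	sonames, err := lib.DynString(elf.DT_SONAME)
	if err != nil {
		return "", fmt.Errorf("failed to read DT_SONAME from %s: %w", libraryPath, err)
	}
	return singleSoname(libraryPath, sonames)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: soname <path-to-shared-library>")
		return
	}
	soname, err := readSoname(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("SONAME: %q\n", soname)
}
```

Keeping the length check in its own function makes the multiple-SONAME case testable without having to construct a malformed ELF file.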

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 5 times, most recently from 4fa386e to b26d422 Compare July 21, 2025 14:13
@ArangoGutierrez ArangoGutierrez requested a review from Copilot July 21, 2025 14:44
Copilot

This comment was marked as outdated.

Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

Could you add an E2E test case inspired by the PR description? That would add a lot of value to this PR.

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from b26d422 to 309ec50 Compare July 22, 2025 09:41
@elezar
Member Author

elezar commented Jul 22, 2025

docker run --rm -ti --runtime=nvidia --gpus all busybox ls -al /usr/lib/x86_64-linux-gnu/

Added in latest.

Collaborator

LGTM

@ArangoGutierrez ArangoGutierrez requested a review from Copilot July 22, 2025 10:04
@Copilot Copilot AI left a comment

Pull Request Overview

This PR replaces the runtime SONAME symlink creation hook with compile-time .so and SONAME symlink creation directly in the container toolkit. The change eliminates the need for the create-soname-symlinks hook by explicitly creating proper library symlinks during driver library injection.

Key Changes:

  • Added direct .so and SONAME symlink creation during driver library discovery
  • Removed the create-soname-symlinks hook and associated infrastructure
  • Enhanced test coverage to verify proper symlink chain creation

Reviewed Changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| internal/discover/symlinks.go | Added getDotSoSymlinks function with ELF parsing to create proper .so/.soname symlinks |
| internal/discover/symlinks_test.go | Added comprehensive test coverage for new symlink creation logic |
| tests/e2e/nvidia-container-toolkit_test.go | Enhanced e2e tests to validate symlink chains and moved driver version parsing to BeforeAll |
| internal/discover/ldconfig.go | Removed create-soname-symlinks hook from ldconfig discovery |
| cmd/nvidia-cdi-hook/create-soname-symlinks/soname-symlinks.go | Removed entire soname symlinks hook implementation |
| Multiple test files | Updated expected test outputs to remove soname symlinks hook |

for _, link := range d.getLinksForMount(mount.Path) {
linksForMount := d.getLinksForMount(mount.Path)
soSymlinks, err := d.getDotSoSymlinks(mount.HostPath)
if err != nil {
Copilot AI Jul 22, 2025

The error from getDotSoSymlinks is silently ignored by setting soSymlinks to nil. Consider logging this error or propagating it, as ELF parsing failures could indicate corrupted libraries or other issues that should be visible to users.

Suggested change
if err != nil {
if err != nil {
fmt.Printf("Warning: failed to get .so symlinks for path %s: %v\n", mount.HostPath, err)

Copilot uses AI. Check for mistakes.

parts := strings.SplitN(line, " ", 2)
chain = append(chain, parts...)
if len(parts) == 1 {
Expect(line).To(HaveSuffix(hostDriverMajor))
Copilot AI Jul 22, 2025

The test logic assumes that all library files end with the driver major version, but some libraries like libnvidia-pkcs11.so.570.133.20 may not follow this pattern consistently. Consider making this assertion more specific to libraries that are expected to follow this naming convention.

Suggested change
Expect(line).To(HaveSuffix(hostDriverMajor))
// Define a list of library prefixes expected to follow the naming convention
expectedLibraries := []string{"libnvidia-ptxjitcompiler.so", "libnvidia-ml.so", "libcuda.so"}
isExpectedLibrary := false
for _, prefix := range expectedLibraries {
	if strings.HasPrefix(line, prefix) {
		isExpectedLibrary = true
		break
	}
}
// Apply the assertion only if the library matches an expected pattern
if isExpectedLibrary {
	Expect(line).To(HaveSuffix(hostDriverMajor))
}

chain = append(chain, parts...)
if len(parts) == 1 {
Expect(line).To(HaveSuffix(hostDriverMajor))
Expect(chain).To(Or(HaveLen(5), HaveLen(1)))
Copilot AI Jul 22, 2025

The magic numbers 5 and 1 for chain length validation are not clearly explained. Consider adding comments to clarify why these specific lengths are expected or define them as named constants.

Suggested change
Expect(chain).To(Or(HaveLen(5), HaveLen(1)))
// Validate the symlink chain length. A valid chain can either be a single element
// (indicating no symlink) or a full chain of 5 elements as described below.
Expect(chain).To(Or(HaveLen(FullSymlinkChainLength), HaveLen(SingleElementChainLength)))

return "", nil
}
if len(sonames) != 1 {
return "", fmt.Errorf("multiple SONAMEs detected for %v: %v", libraryPath, sonames)
Copilot AI Jul 22, 2025

The error message uses %v for both libraryPath (string) and sonames (slice). Consider using %s for libraryPath and keeping %v for sonames for better clarity.

Suggested change
return "", fmt.Errorf("multiple SONAMEs detected for %v: %v", libraryPath, sonames)
return "", fmt.Errorf("multiple SONAMEs detected for %s: %v", libraryPath, sonames)

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 4 times, most recently from 4fe001d to 2c671f4 Compare July 22, 2025 14:42
elezar added 4 commits July 22, 2025 17:26
This change ensures that .so and SONAME symlinks are created for
driver libraries in the container.

Signed-off-by: Evan Lezar <[email protected]>
This change removes the create-soname-symlinks hook introduced
in v1.18.0-rc.1. Instead we rely on explicitly creating the
.so -> SONAME -> .so.RM_VERSION symlink chain through the
create-symlink hook.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 2c671f4 to 0e95497 Compare July 22, 2025 15:26