-
Notifications
You must be signed in to change notification settings - Fork 379
Create .so symlinks for driver libraries in container #326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e20826f
to
7b5f6b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial pass. I lost some steam near the end, but wanted to drop the comments I had for now.
cmd/nvidia-ctk/hook/create-dot-so-symlinks/create-dot-so-symlinks.go
Outdated
Show resolved
Hide resolved
cmd/nvidia-ctk/hook/create-dot-so-symlinks/create-dot-so-symlinks.go
Outdated
Show resolved
Hide resolved
cmd/nvidia-ctk/hook/create-dot-so-symlinks/create-dot-so-symlinks.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of the issues were because this was out of date.
cmd/nvidia-ctk/hook/create-dot-so-symlinks/create-dot-so-symlinks.go
Outdated
Show resolved
Hide resolved
cmd/nvidia-ctk/hook/create-dot-so-symlinks/create-dot-so-symlinks.go
Outdated
Show resolved
Hide resolved
cmd/nvidia-ctk/hook/create-dot-so-symlinks/create-dot-so-symlinks.go
Outdated
Show resolved
Hide resolved
93753a9
to
020f598
Compare
No. This is not related to the CUDA Forward Compat libraries at all. This is about creating |
020f598
to
81a6b00
Compare
Pull Request Test Coverage Report for Build 16448697637Details
💛 - Coveralls |
81a6b00
to
3a1f96c
Compare
20abfcc
to
3fea29c
Compare
internal/discover/symlinks.go
Outdated
for _, soname := range sonames { | ||
if soname == libraryName { | ||
continue | ||
} | ||
linkPath := filepath.Join(filepath.Dir(libraryPath), soname) | ||
if sonameLinkPath == "" { | ||
sonameLinkPath = linkPath | ||
} | ||
s := Symlink{ | ||
target: libraryName, | ||
link: linkPath, | ||
} | ||
soSymlinks = append(soSymlinks, s.String()) | ||
} | ||
|
||
if sonameLinkPath != "" { | ||
sonameLinkPathExt := filepath.Ext(sonameLinkPath) | ||
soLinkPath := strings.TrimSuffix(sonameLinkPath, sonameLinkPathExt) | ||
s := Symlink{ | ||
target: filepath.Base(sonameLinkPath), | ||
link: soLinkPath, | ||
} | ||
soSymlinks = append(soSymlinks, s.String()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you walk through all sonames, and add a symlik for them.
And at the same time you see if there was at least one soname that matched, and if there was, you store it in sonameLinkPath
so that you can strip off the final version number from it and just additionally include the base .so
symlink.
Is that what is going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm confused as to why there is a loop over multiple sonames to begin with -- I thought you could only have one per library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for a given library, we extract the sonames and create symlinks for them (if they don't match the original library name). For the first soname we additionally strip off the numbered suffix and create a .so
symlink too.
One thing that we may want to do is explicitly match on *.so
instead of making assumptions about the soname that we're linking to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning, would it make more sense to implement this as:
sonames, err := lib.DynString(elf.DT_SONAME)
if err != nil {
return nil, err
}
if len(sonames) != 1 {
return fmt.Errorf(...)
}
// Create the <library>.so.<version> symlink from SONAME
linkPath := filepath.Join(filepath.Dir(libraryPath), soname[0])
if soname[0] != libraryName {
s := Symlink{
target: libraryName,
link: linkPath,
}
soSymlinks = append(soSymlinks, s.String())
}
// Create the <library>.so symlink from the <library>.so.<version> symlink
soLinkPath := strings.TrimSuffix(linkPath, filepath.Ext(linkPath))
s := Symlink{
target: filepath.Base(linkPath),
link: soLinkPath,
}
soSymlinks = append(soSymlinks, s.String())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this part:
For the first soname we additionally strip off the numbered suffix and create a .so symlink too
Is it actually possible to have multiple SONAMEs set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elf.h
manpage states:
DT_SONAME
String table offset to name of shared object
Which refers to a singular. The issue is that the DynString
function returns a slice to handle other case. I would therefore assume that it's unexpected if we have MULTIPLE sonames. Let me update the implementation to check the length specifically.
4fa386e
to
b26d422
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an E2E test case inspired by the PR description? that would add a lot of value to this PR
b26d422
to
309ec50
Compare
Added in latest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces the runtime SONAME symlink creation hook with compile-time .so and SONAME symlink creation directly in the container toolkit. The change eliminates the need for the create-soname-symlinks
hook by explicitly creating proper library symlinks during driver library injection.
Key Changes:
- Added direct .so and SONAME symlink creation during driver library discovery
- Removed the
create-soname-symlinks
hook and associated infrastructure - Enhanced test coverage to verify proper symlink chain creation
Reviewed Changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
internal/discover/symlinks.go |
Added getDotSoSymlinks function with ELF parsing to create proper .so/.soname symlinks |
internal/discover/symlinks_test.go |
Added comprehensive test coverage for new symlink creation logic |
tests/e2e/nvidia-container-toolkit_test.go |
Enhanced e2e tests to validate symlink chains and moved driver version parsing to BeforeAll |
internal/discover/ldconfig.go |
Removed create-soname-symlinks hook from ldconfig discovery |
cmd/nvidia-cdi-hook/create-soname-symlinks/soname-symlinks.go |
Removed entire soname symlinks hook implementation |
Multiple test files | Updated expected test outputs to remove soname symlinks hook |
for _, link := range d.getLinksForMount(mount.Path) { | ||
linksForMount := d.getLinksForMount(mount.Path) | ||
soSymlinks, err := d.getDotSoSymlinks(mount.HostPath) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error from getDotSoSymlinks is silently ignored by setting soSymlinks to nil. Consider logging this error or propagating it, as ELF parsing failures could indicate corrupted libraries or other issues that should be visible to users.
if err != nil { | |
if err != nil { | |
fmt.Printf("Warning: failed to get .so symlinks for path %s: %v\n", mount.HostPath, err) |
Copilot uses AI. Check for mistakes.
parts := strings.SplitN(line, " ", 2) | ||
chain = append(chain, parts...) | ||
if len(parts) == 1 { | ||
Expect(line).To(HaveSuffix(hostDriverMajor)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test logic assumes that all library files end with the driver major version, but some libraries like libnvidia-pkcs11.so.570.133.20 may not follow this pattern consistently. Consider making this assertion more specific to libraries that are expected to follow this naming convention.
Expect(line).To(HaveSuffix(hostDriverMajor)) | |
// Define a list of library prefixes expected to follow the naming convention | |
expectedLibraries := []string{"libnvidia-ptxjitcompiler.so", "libnvidia-ml.so", "libcuda.so"} | |
isExpectedLibrary := false | |
for _, prefix := range expectedLibraries { | |
if strings.HasPrefix(line, prefix) { | |
isExpectedLibrary = true | |
break | |
} | |
} | |
// Apply the assertion only if the library matches an expected pattern | |
if isExpectedLibrary { | |
Expect(line).To(HaveSuffix(hostDriverMajor)) | |
} |
Copilot uses AI. Check for mistakes.
chain = append(chain, parts...) | ||
if len(parts) == 1 { | ||
Expect(line).To(HaveSuffix(hostDriverMajor)) | ||
Expect(chain).To(Or(HaveLen(5), HaveLen(1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic numbers 5 and 1 for chain length validation are not clearly explained. Consider adding comments to clarify why these specific lengths are expected or define them as named constants.
Expect(chain).To(Or(HaveLen(5), HaveLen(1))) | |
// Validate the symlink chain length. A valid chain can either be a single element | |
// (indicating no symlink) or a full chain of 5 elements as described below. | |
Expect(chain).To(Or(HaveLen(FullSymlinkChainLength), HaveLen(SingleElementChainLength))) |
Copilot uses AI. Check for mistakes.
return "", nil | ||
} | ||
if len(sonames) != 1 { | ||
return "", fmt.Errorf("multiple SONAMEs detected for %v: %v", libraryPath, sonames) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message uses %v for both libraryPath (string) and sonames (slice). Consider using %s for libraryPath and keeping %v for sonames for better clarity.
return "", fmt.Errorf("multiple SONAMEs detected for %v: %v", libraryPath, sonames) | |
return "", fmt.Errorf("multiple SONAMEs detected for %s: %v", libraryPath, sonames) |
Copilot uses AI. Check for mistakes.
Signed-off-by: Evan Lezar <[email protected]>
4fe001d
to
2c671f4
Compare
This change ensures that .so and SONAME symlinks are created for driver libraries in the container. Signed-off-by: Evan Lezar <[email protected]>
This change removes the create-soname-symlinks hook introduced in v1.18.0-rc.1. Instead we rely on explicitly creating the .so -> SONAME -> .so.RM_VERSION symlink chain through the create-symlink hook. Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
2c671f4
to
0e95497
Compare
This change explicitly creates .so and SONAME symlinks for injected driver libraries. This means that the
create-soname-symlinks
hook that was added in #947 is not longer required and is therefore removed.And:
showing that the .so and .so.1 entries are present in the ldcache.