Ensure that libcuda.so is in the ldcache #947

Open
wants to merge 6 commits into
base: main
5 changes: 3 additions & 2 deletions cmd/nvidia-cdi-hook/chmod/chmod.go
@@ -113,15 +113,15 @@ func (m command) run(c *cli.Context, cfg *config) error {
return fmt.Errorf("failed to load container state: %v", err)
}

-containerRoot, err := s.GetContainerRoot()
+containerRoot, err := s.GetContainerRootDirPath()

❤️ Thanks for adding clarity here.

And now that I have read your explanation: this is a path that is valid in the host filesystem and points to the container filesystem's root? :)

Member Author

Yes. It is the absolute path on the host where the root of the container's filesystem is located.
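
To illustrate the reply above, here is a minimal sketch of how a host-side path to the container root can be used to address a file inside the container. The `containerRoot` type, the `hostPath` helper, and the example rootfs path are hypothetical names chosen for illustration; they are not the PR's `oci.ContainerRoot` API.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// containerRoot is a hypothetical type: an absolute path on the host that
// points at the root of the container's filesystem.
type containerRoot string

// hostPath maps a path as seen inside the container to the corresponding
// path on the host by joining it onto the container root.
// NOTE: this sketch does not resolve symlinks inside the container.
func (r containerRoot) hostPath(inContainer string) string {
	return filepath.Join(string(r), inContainer)
}

func main() {
	// Assumed example location of a container rootfs on the host.
	root := containerRoot("/var/lib/containers/example/rootfs")
	fmt.Println(root.hostPath("/usr/lib/libcuda.so.1"))
}
```

Using a distinct type rather than a bare `string` makes it harder to confuse a host path with a container path by accident, which appears to be the motivation for the rename in this hunk.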

if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
if containerRoot == "" {
return fmt.Errorf("empty container root detected")
}

-paths := m.getPaths(containerRoot, cfg.paths.Value(), cfg.mode)
+paths := m.getPaths(string(containerRoot), cfg.paths.Value(), cfg.mode)
if len(paths) == 0 {
m.logger.Debugf("No paths specified; exiting")
return nil
@@ -140,6 +140,7 @@ func (m command) run(c *cli.Context, cfg *config) error {
}

// getPaths updates the specified paths relative to the root.
+// TODO(elezar): This function should be updated to make use of the oci.ContainerRoot type.

:)

func (m command) getPaths(root string, paths []string, desiredMode fs.FileMode) []string {
var pathsInRoot []string
for _, f := range paths {
2 changes: 2 additions & 0 deletions cmd/nvidia-cdi-hook/commands/commands.go
@@ -20,6 +20,7 @@ import (
"github.com/urfave/cli/v2"

"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/chmod"
+soname "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/create-soname-symlinks"
symlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/create-symlinks"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/cudacompat"
ldcache "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/update-ldcache"
@@ -34,5 +35,6 @@ func New(logger logger.Interface) []*cli.Command {
symlinks.NewCommand(logger),
chmod.NewCommand(logger),
cudacompat.NewCommand(logger),
+soname.NewCommand(logger),
}
}
132 changes: 132 additions & 0 deletions cmd/nvidia-cdi-hook/create-soname-symlinks/soname-symlinks.go
@@ -0,0 +1,132 @@
/**
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/

package soname

import (
"errors"
"fmt"
"path/filepath"

"github.com/urfave/cli/v2"

"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
safeexec "github.com/NVIDIA/nvidia-container-toolkit/internal/safe-exec"
)

type command struct {
logger logger.Interface
safeexec.Execer
}

type options struct {
folders cli.StringSlice
ldconfigPath string
containerSpec string
}

// NewCommand constructs a create-soname-symlinks command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
Execer: safeexec.New(logger),
}
return c.build()
}

// build the create-soname-symlinks command
func (m command) build() *cli.Command {
cfg := options{}

// Create the 'create-soname-symlinks' command
c := cli.Command{
Name: "create-soname-symlinks",
Member Author

@klueska I elected to add a new hook entirely instead of modifying the existing update-ldcache. This is in keeping with "purpose-built hooks" and also means that the hook name can be used to indicate the intent.

Usage: "Create soname symlinks for the specified folders using ldconfig -n -N",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}

c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "folder",
Usage: "Specify a folder to search for shared libraries for which soname symlinks need to be created",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to the ldconfig program",
Destination: &cfg.ldconfigPath,
Value: "/sbin/ldconfig",
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}

return &c
}

func (m command) validateFlags(c *cli.Context, cfg *options) error {
if cfg.ldconfigPath == "" {
return errors.New("ldconfig-path must be specified")
}
return nil
}

func (m command) run(c *cli.Context, cfg *options) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}

containerRoot, err := s.GetContainerRootDirPath()

Below I see we sometimes have a nice OnHost suffix. Could this be applicable here, too?

Member Author

Do you mean for the function or the variable name (or both)?

if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
if containerRoot == "" {
m.logger.Warningf("No container root detected")
@jgehrcke jgehrcke Mar 21, 2025

when can/does this happen in practice?
(for my own curiosity/understanding)

Member Author

@elezar elezar Apr 2, 2025

It may be that someone has launched the container runtime with a malformed container specification or that there is a mismatch between what we expect and (future) OCI runtime specification versions. This is more of a failsafe for an unlikely event than something we generally expect.

Adding this check here also means that the rest of the code can continue with the assumption that containerRoot is not empty.
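
The failsafe described above can be sketched as an early return: once an empty root is rejected, the code after the guard can rely on a non-empty value. `resolveRoot` and `run` below are hypothetical stand-ins for illustration and are not functions from the PR.

```go
package main

import (
	"fmt"
	"log"
)

// resolveRoot is a hypothetical stand-in for loading the container root from
// the OCI state; it returns "" when the spec carries no root.
func resolveRoot(specRoot string) (string, error) {
	return specRoot, nil
}

// run reports whether it went on to do any work. It skips all work when no
// container root is detected, so everything after the guard can assume that
// root is not empty.
func run(specRoot string) (bool, error) {
	root, err := resolveRoot(specRoot)
	if err != nil {
		return false, fmt.Errorf("failed to determine container root: %w", err)
	}
	if root == "" {
		log.Printf("No container root detected")
		return false, nil
	}
	// From here on, root is guaranteed to be non-empty.
	_ = root
	return true, nil
}

func main() {
	acted, err := run("")
	fmt.Println(acted, err)
	acted, err = run("/path/to/rootfs")
	fmt.Println(acted, err)
}
```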

return nil
}

dirs := cfg.folders.Value()
if len(dirs) == 0 {
return nil
}

ldconfigPath := config.ResolveLDConfigPathOnHost(cfg.ldconfigPath)
args := []string{filepath.Base(ldconfigPath)}

args = append(args,
// Specify the containerRoot to use.
"-r", string(containerRoot),
// Specify -n to only process the specified folders.
"-n",
// Explicitly disable updating the LDCache.
"-N",
)
// Explicitly specify the directories to add.
args = append(args, dirs...)
Contributor

Question -- does this create .so symlinks for all libraries present in the specified directories? Does this differ from the behavior of the legacy libnvidia-container implementation, which IIRC would only create the .so symlinks for a small list of libraries (like libcuda.so)?

Member Author

This is not the .so symlinks, but the SONAME symlinks i.e. libcuda.so.1 -> libcuda.so.RM_VERSION in the case of libcuda. The .so symlinks are created using the "standard" create-symlinks hook.


return m.Exec(ldconfigPath, args, nil)
}
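
The argument list assembled in `run` can be isolated as a pure function, which makes it easy to inspect. `buildArgs` is a hypothetical helper and not part of the PR; the flags `-r`, `-n`, and `-N` are the ones the hook passes to ldconfig, and the paths in `main` are placeholders.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// buildArgs returns the argv for ldconfig: argv[0] is the base name of the
// program, followed by the flags and the directories to process.
func buildArgs(ldconfigPath, containerRoot string, dirs []string) []string {
	args := []string{
		filepath.Base(ldconfigPath),
		"-r", containerRoot, // use containerRoot as the root directory
		"-n", // only process the directories given on the command line
		"-N", // do not rebuild the ld.so cache
	}
	return append(args, dirs...)
}

func main() {
	args := buildArgs("/sbin/ldconfig", "/path/to/rootfs", []string{"/usr/lib64"})
	fmt.Println(args)
}
```

Keeping the construction separate from `m.Exec` would allow the flag ordering to be checked in a unit test without invoking ldconfig.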
4 changes: 2 additions & 2 deletions cmd/nvidia-cdi-hook/create-symlinks/create-symlinks.go
@@ -84,7 +84,7 @@ func (m command) run(c *cli.Context, cfg *config) error {
return fmt.Errorf("failed to load container state: %v", err)
}

-containerRoot, err := s.GetContainerRoot()
+containerRoot, err := s.GetContainerRootDirPath()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
@@ -100,7 +100,7 @@ func (m command) run(c *cli.Context, cfg *config) error {
return fmt.Errorf("invalid symlink specification %v", l)
}

-err := m.createLink(containerRoot, parts[0], parts[1])
+err := m.createLink(string(containerRoot), parts[0], parts[1])
if err != nil {
return fmt.Errorf("failed to create link %v: %w", parts, err)
}
76 changes: 0 additions & 76 deletions cmd/nvidia-cdi-hook/cudacompat/container-root.go

This file was deleted.
