Describe the bug
I'm running this in a Docker image with GPU support:
https://gitlab.com/scripta/escriptorium/-/wikis/docker-install
GPU training fails out of the box, suggesting that libcuda.so cannot be loaded:
celery-gpu-1 | GPU available: True (cuda), used: True
celery-gpu-1 | TPU available: False, using: 0 TPU cores
celery-gpu-1 | IPU available: False, using: 0 IPUs
celery-gpu-1 | HPU available: False, using: 0 HPUs
celery-gpu-1 | `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
celery-gpu-1 | You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
celery-gpu-1 | [2024-12-18 09:09:03,469: INFO/ForkPoolWorker-1] Creating new model [1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do] with 77 outputs
celery-gpu-1 | [2024-12-18 09:09:03,680: INFO/ForkPoolWorker-1] Adding 1 dummy labels to validation set codec.
celery-gpu-1 | [2024-12-18 09:09:03,686: INFO/ForkPoolWorker-1] Setting seg_type to baselines.
celery-gpu-1 | LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
celery-gpu-1 | Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
celery-gpu-1 | [2024-12-18 09:09:17,657: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:221 exited with 'signal 6 (SIGABRT)'
celery-gpu-1 | [2024-12-18 09:09:17,669: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 6 (SIGABRT) Job: 0.')
celery-gpu-1 | Traceback (most recent call last):
celery-gpu-1 | File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
celery-gpu-1 | raise WorkerLostError(
celery-gpu-1 | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6 (SIGABRT) Job: 0.
Adding LD_LIBRARY_PATH=/usr/local/nvidia/lib64 to the environment fixes this issue.
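For anyone hitting the same problem, the workaround amounts to something like the following inside the affected container. How the variable actually reaches the celery-gpu service (compose file, env file, Dockerfile) depends on your setup, so treat this as a sketch:
# make the dynamic loader search the directory that contains the unversioned libcuda.so
export LD_LIBRARY_PATH=/usr/local/nvidia/lib64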
I believe this happens because the /nix/store path that is on the ld.so search path only contains libcuda.so.1, while /usr/local/nvidia also contains the unversioned libcuda.so:
# ls -l /usr/local/nvidia/lib64/libcuda.so*
lrwxrwxrwx 1 root root 12 Jan 1 1970 /usr/local/nvidia/lib64/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 17 Jan 1 1970 /usr/local/nvidia/lib64/libcuda.so.1 -> libcuda.so.565.77
-r-xr-xr-x 1 root root 49572768 Jan 1 1970 /usr/local/nvidia/lib64/libcuda.so.565.77
# cat /etc/ld.so.conf.d/nvcr-3734471176.conf
/nix/store/mvl6kwi86n35pqf601raka1ncp3zkdgy-nvidia-x11-565.77-6.6.64/lib
# ls -l /nix/store/mvl6kwi86n35pqf601raka1ncp3zkdgy-nvidia-x11-565.77-6.6.64/lib/libcuda.so*
lrwxrwxrwx 1 root root 17 Dec 18 09:34 /nix/store/mvl6kwi86n35pqf601raka1ncp3zkdgy-nvidia-x11-565.77-6.6.64/lib/libcuda.so.1 -> libcuda.so.565.77
-r-xr-xr-x 1 root root 49572768 Jan 1 1970 /nix/store/mvl6kwi86n35pqf601raka1ncp3zkdgy-nvidia-x11-565.77-6.6.64/lib/libcuda.so.565.77
... while cuDNN wants the unversioned libcuda.so.
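This is easy to verify from a shell in the container without starting a training run (assuming python3 is available there; ctypes.CDLL performs the same dlopen() lookup that cuDNN does):
# the loader cache lists only the versioned libcuda.so.1
ldconfig -p | grep libcuda
# reproduce the failing dlopen("libcuda.so") directly
python3 -c 'import ctypes; ctypes.CDLL("libcuda.so")'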
Metadata
- system: "x86_64-linux"
- host os: Linux 6.6.64, NixOS, 25.05 (Warbler), 25.05.20241213.3566ab7
- multi-user?: yes
- sandbox: yes
- version: nix-env (Nix) 2.24.10
- channels(root): "nixos"
- nixpkgs: /nix/store/22r7q7s9552gn1vpjigkbhfgcvhsrz68-source
Notify maintainers
Relevant tracking bug: #290609
Note for maintainers: Please tag this issue in your PR.
Add a 👍 reaction to issues you find important.