other: handle systems with only libnvidia-ml.so.1 #1655
Merged
Description
Recently, NVIDIA CUDA repository packages started shipping only the `libnvidia-ml.so.1` file, without `libnvidia-ml.so`. The upstream `nvml-wrapper` package has a fix proposed (rust-nvml/nvml-wrapper#63), yet the package is in search of a maintainer at the moment and the PR is not getting merged.

To allow `bottom` to correctly detect NVIDIA GPUs on Ubuntu with official NVIDIA packages, add a wrapper around `Nvml::init` to be more persistent in its search for the NVML library.

Note: I don't see a reason why we can't try looking for `libnvidia-ml.so.1` on non-Linux systems, thus simplifying the code. But I don't have a non-Linux machine with an NVIDIA GPU around to make sure this is indeed the case and nothing weird happens. The extra platform-specificity is, admittedly, not great for long-term maintenance; but I hope that `nvml-wrapper` will be updated eventually and this code will be removed.

Issue
N/A
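
For readers skimming the description, here is a minimal sketch of the fallback idea (try the unversioned library name first, then the versioned one). It is not the code from this PR: `init_with_fallback`, `fake_load`, and `detect` are hypothetical names, the loader is a stand-in closure rather than a real `nvml-wrapper` call, and the error strings are made up for illustration.

```rust
/// Hypothetical helper (not part of `nvml-wrapper` or `bottom`): call `load`
/// with the primary name, then with each fallback name in order, stopping at
/// the first success. If every attempt fails, the last error is returned.
fn init_with_fallback<T, E>(
    primary: &str,
    fallbacks: &[&str],
    mut load: impl FnMut(&str) -> Result<T, E>,
) -> Result<T, E> {
    let mut result = load(primary);
    for &name in fallbacks {
        if result.is_ok() {
            break;
        }
        result = load(name);
    }
    result
}

/// Stand-in for a real library loader: succeeds only if `name` is among the
/// files "installed" on the simulated system. The messages are illustrative.
fn fake_load(installed: &[&str], name: &str) -> Result<String, String> {
    if installed.iter().any(|&file| file == name) {
        Ok(format!("loaded {name}"))
    } else {
        Err(format!("{name}: cannot open shared object file"))
    }
}

/// Simulate NVML detection on a system where only `installed` files exist.
fn detect(installed: &[&str]) -> Result<String, String> {
    init_with_fallback("libnvidia-ml.so", &["libnvidia-ml.so.1"], |name| {
        fake_load(installed, name)
    })
}
```

In real code the closure would wrap the actual NVML initialisation instead of `fake_load`; how a library path is handed to `nvml-wrapper` is deliberately left out of this sketch. Calling `detect(&["libnvidia-ml.so.1"])` models the CUDA-repo layout described above, where only the versioned file is present.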
Testing
Code was tested on two machines, Ubuntu 22.04 with NVIDIA driver 565 and Ubuntu 24.04 with NVIDIA driver 560; NVIDIA software installed from official CUDA repos, per https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#ubuntu. In both cases, GPU temperature and RAM usage were not displayed without this change. With this change, the information is displayed and is consistent with `nvidia-smi`. If `libnvidia-ml.so.1` is removed and `libnvidia-ml.so` is added, things work too (with or without this PR).

If this is a code change, please also indicate which platforms were tested:
Checklist
If relevant, ensure the following have been met:
- Formatting (`cargo fmt`)
- Documentation (`README.md`, help menu, doc pages, etc.)