other: handle systems with only libnvidia-ml.so.1 #1655

Merged (1 commit), Jan 7, 2025

@al42and (Contributor) commented Jan 4, 2025

Description

Recently, NVIDIA CUDA repository packages started shipping only the libnvidia-ml.so.1 file, without the unversioned libnvidia-ml.so. A fix has been proposed for the upstream nvml-wrapper package (rust-nvml/nvml-wrapper#63), but the package is currently looking for a maintainer and the PR is not getting merged.

To allow bottom to correctly detect NVIDIA GPUs on Ubuntu with official NVIDIA packages, add a wrapper around Nvml::init to be more persistent in its search for the NVML library.
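
For readers who want a picture of the approach, here is a minimal sketch of a "try the default name, then fall back to the versioned name" wrapper. It is not the code from this PR: `init_with_fallback` and `simulated_load` are hypothetical names, and the loader is passed in as a closure so the example runs without an NVIDIA driver. In real code the closure would call into nvml-wrapper (presumably `Nvml::init` for the default name and its builder's library-path option for the fallback; that wiring is an assumption and is not shown here).

```rust
/// Try each candidate library name in order and return the first handle
/// that loads. If every candidate fails, return all (name, error) pairs
/// so the caller can report what was attempted.
///
/// `load` is a hypothetical stand-in for the real NVML loader.
fn init_with_fallback<T, E>(
    candidates: &[&str],
    mut load: impl FnMut(&str) -> Result<T, E>,
) -> Result<T, Vec<(String, E)>> {
    let mut failures = Vec::new();
    for &name in candidates {
        match load(name) {
            Ok(handle) => return Ok(handle),
            Err(err) => failures.push((name.to_string(), err)),
        }
    }
    Err(failures)
}

/// Simulates a system that ships only the versioned library
/// (assumption: stands in for the dynamic loader, for illustration only).
fn simulated_load(name: &str) -> Result<String, String> {
    if name == "libnvidia-ml.so.1" {
        Ok(format!("loaded {name}"))
    } else {
        Err(format!("{name}: cannot open shared object file"))
    }
}
```

Calling `init_with_fallback(&["libnvidia-ml.so", "libnvidia-ml.so.1"], simulated_load)` tries the unversioned name first, so systems that already worked keep their behavior, and only reaches for `libnvidia-ml.so.1` when the first attempt fails.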

Note: I don't see a reason why we can't try looking for libnvidia-ml.so.1 on non-Linux systems, thus simplifying the code. But I don't have a non-Linux machine with an NVIDIA GPU around to make sure this is indeed the case and nothing weird happens. The extra platform-specificity is, admittedly, not great for long-term maintenance; but I hope that nvml-wrapper will be updated eventually and this code will be removed.

Issue

N/A

Testing

Code was tested on two machines, Ubuntu 22.04 with NVIDIA driver 565 and Ubuntu 24.04 with NVIDIA driver 560; NVIDIA software installed from official CUDA repos, per https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#ubuntu. In both cases, GPU temperature and RAM usage were not displayed without this change. With this change, the information is displayed and is consistent with nvidia-smi. If libnvidia-ml.so.1 is removed and libnvidia-ml.so is added, things work too (with or without this PR).

If this is a code change, please also indicate which platforms were tested:

  • Windows
  • macOS
  • Linux

Checklist

If relevant, ensure the following have been met:

  • Areas your change affects have been linted using rustfmt (cargo fmt)
  • The change has been tested and doesn't appear to cause any unintended breakage
  • Documentation has been added/updated if needed (README.md, help menu, doc pages, etc.)
  • The pull request passes the provided CI pipeline
  • There are no merge conflicts
  • If relevant, new tests were added (don't worry too much about coverage)


codecov bot commented Jan 4, 2025

Codecov Report

Attention: Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 41.30%. Comparing base (dbda1ee) to head (6dd708b).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/data_collection/nvidia.rs | 81.25% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1655      +/-   ##
==========================================
+ Coverage   41.29%   41.30%   +0.01%     
==========================================
  Files         109      109              
  Lines       17885    17900      +15     
==========================================
+ Hits         7386     7394       +8     
- Misses      10499    10506       +7     
| Flag | Coverage Δ |
|---|---|
| macos-14 | 37.26% <0.00%> (-0.04%) ⬇️ |
| ubuntu-latest | 43.03% <92.85%> (+0.01%) ⬆️ |
| windows-2019 | 37.18% <0.00%> (-0.04%) ⬇️ |


@ClementTsang ClementTsang self-assigned this Jan 5, 2025
@ClementTsang (Owner)

Looks good, thanks!

@all-contributors please add @al42and for code.

@ClementTsang ClementTsang merged commit 915c25a into ClementTsang:main Jan 7, 2025
37 checks passed
Contributor

@ClementTsang

I've put up a pull request to add @al42and! 🎉

@al42and al42and deleted the workaround_libnvidia_ml branch January 7, 2025 22:46