Fix GPU discovery script to make it run with mdev for SR-IOV enabled devices #11340
Pull Request Overview
This PR fixes GPU discovery functionality to properly handle MDEV devices with SR-IOV enabled, particularly for vGPU configurations with A10 GPUs. The changes improve device discovery, XML generation, and UI handling for GPU devices.
- Refactored GPU discovery script to better handle MDEV instances on both Physical Functions and Virtual Functions
- Updated GPU device type assignment logic to properly categorize VGPUOnly and passthrough devices
- Simplified GPU summary UI by removing unnecessary filtering and improving device type handling
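The type-assignment and summary bullets can be pictured with a small Java sketch. This is not the code in GpuServiceImpl.java or GPUSummaryTab.vue. The `DeviceType` enum (including the `PCI` and `MDEV` names), the `Device` record, and the rule "an mdev instance is MDEV, a device that hosts vGPUs is VGPUOnly, anything else is passthrough" are hypothetical assumptions. They are chosen only to illustrate how devices might be categorized and how VGPUOnly devices can be left out of a summary count.

```java
import java.util.List;

public class GpuTypeSketch {
    // Hypothetical categories: PCI = passthrough, MDEV = mediated vGPU instance,
    // VGPUOnly = a device used only to host vGPUs.
    enum DeviceType { PCI, MDEV, VGPUOnly }

    // Hypothetical, minimal view of a discovered device: an id (PCI address or mdev UUID),
    // whether it is itself an mdev instance, and whether it hosts vGPUs (on itself or its VFs).
    record Device(String id, boolean isMdevInstance, boolean hostsVgpus) {}

    // Assumed rule for illustration only, not the actual GpuServiceImpl logic.
    static DeviceType classify(Device d) {
        if (d.isMdevInstance()) {
            return DeviceType.MDEV;
        }
        return d.hostsVgpus() ? DeviceType.VGPUOnly : DeviceType.PCI;
    }

    // Count devices for a summary card, leaving VGPUOnly devices out of the calculation.
    static long countForSummary(List<Device> devices) {
        return devices.stream()
                .filter(d -> classify(d) != DeviceType.VGPUOnly)
                .count();
    }

    public static void main(String[] args) {
        List<Device> devices = List.of(
                new Device("0000:3b:00.0", false, true),   // PF hosting vGPUs on its VFs
                new Device("4b20d080-1b54-4048-85b3-a6a62d165c01", true, false), // mdev instance
                new Device("0000:5e:00.0", false, false)); // GPU used for passthrough
        for (Device d : devices) {
            System.out.println(d.id() + " -> " + classify(d));
        }
        System.out.println("devices counted in summary: " + countForSummary(devices));
    }
}
```

With these three sample devices, only the mdev instance and the passthrough GPU contribute to the summary count.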
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| gpudiscovery.sh | Refactored MDEV discovery with a reusable function and added SR-IOV VF support |
| GpuServiceImpl.java | Improved GPU device type assignment and parent device handling |
| GPUSummaryTab.vue | Simplified card summary logic and excluded VGPUOnly devices from calculations |
| LibvirtGpuDef.java | Added model='vfio-pci' attribute for MDEV device XML generation |
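For reference, libvirt describes a mediated device in domain XML as a `<hostdev mode='subsystem' type='mdev'>` element that carries a `model` attribute (`vfio-pci` for PCI-based mediated devices such as vGPUs), with the mdev UUID in `<source><address uuid='...'/></source>`. The Java sketch below builds such an element as a string. The class and method names are hypothetical, and the exact attributes LibvirtGpuDef.java emits (for example whether it also sets `managed` or `display`) may differ.

```java
import java.util.UUID;

public class MdevHostdevXml {
    // Builds a libvirt <hostdev> element for a mediated device (mdev).
    // type='mdev' devices are addressed by UUID; model='vfio-pci' declares
    // that the mdev exposes a PCI device API to the guest.
    static String mdevHostdev(String uuid) {
        // UUID.fromString throws IllegalArgumentException for strings that do not
        // parse as a UUID; toString() renders the value in lowercase hex.
        String normalized = UUID.fromString(uuid).toString();
        return "<hostdev mode='subsystem' type='mdev' model='vfio-pci'>\n"
                + "  <source>\n"
                + "    <address uuid='" + normalized + "'/>\n"
                + "  </source>\n"
                + "</hostdev>\n";
    }

    public static void main(String[] args) {
        // Example UUID only; a real one comes from the mdev instance created on the host.
        System.out.print(mdevHostdev("c2177883-f1bb-47f0-914d-32a22e3a8804"));
    }
}
```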
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtGpuDef.java
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #11340 +/- ##
=========================================
Coverage 17.17% 17.17%
- Complexity 14993 14994 +1
=========================================
Files 5869 5869
Lines 521728 521732 +4
Branches 63506 63506
=========================================
+ Hits 89604 89606 +2
- Misses 422053 422055 +2
Partials 10071 10071
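The unchanged 17.17% follows from the counts in the table above. Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count against coverage, and this branch adds four lines split evenly between hits and misses, which barely moves the ratio. A small Java check of that arithmetic, using only the numbers from the table (the class name is illustrative):

```java
import java.util.Locale;

public class CoverageMath {
    // Codecov-style ratio: partially covered lines count against coverage.
    static double coveragePercent(long hits, long misses, long partials) {
        return 100.0 * hits / (hits + misses + partials);
    }

    public static void main(String[] args) {
        // {hits, misses, partials} from the Coverage Diff table.
        long[] base = {89604, 422053, 10071};  // main
        long[] pr = {89606, 422055, 10071};    // #11340
        // Locale.ROOT keeps "." as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "main:   lines=%d coverage=%.2f%%%n",
                base[0] + base[1] + base[2], coveragePercent(base[0], base[1], base[2]));
        System.out.printf(Locale.ROOT, "#11340: lines=%d coverage=%.2f%%%n",
                pr[0] + pr[1] + pr[2], coveragePercent(pr[0], pr[1], pr[2]));
    }
}
```

The line totals come out as 521728 and 521732, and both ratios round to 17.17%, matching the report.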
clgtm
Branch updated from ebdf22a to f9deffd.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14466
@blueorangutan test
@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
@vishesh92 Do we have the facilities to test this? cc @sureshanaparti
I am sharing an environment with 2 NVIDIA A10 GPUs with @rosi-shapeblue for testing.
LGTM. Tested all workflows, found no underlying issues.
[SF] Trillian test result (tid-13977)
Description
Continuation of #11143
Thanks to Swen from ProIO, I was able to access a server with NVIDIA A10 GPUs.
This PR contains fixes, especially in the discovery of mdev devices with SR-IOV enabled. vGPU on the A10 has been tested successfully with these changes.
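The overview notes that discovery now handles MDEV instances on both Physical Functions and Virtual Functions. gpudiscovery.sh itself is a shell script; the Java sketch below only mirrors the idea of a reusable per-device helper (in the spirit of `process_mdev_instances`, not a translation of it). It assumes the usual Linux sysfs layout: mdev instances appear as UUID-named directories under their parent PCI device, and an SR-IOV PF links its VFs as `virtfn0`, `virtfn1`, and so on. The class and method names are hypothetical, and the default PCI address in `main` is only an example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class MdevDiscoverySketch {
    // mdev instances are assumed to be UUID-named directories under the parent device.
    private static final Pattern UUID_RE = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // Reusable helper: list mdev instance UUIDs directly under one device directory.
    static List<String> mdevInstances(Path deviceDir) throws IOException {
        List<String> uuids = new ArrayList<>();
        try (Stream<Path> entries = Files.list(deviceDir)) {
            entries.filter(Files::isDirectory)
                   .map(p -> p.getFileName().toString())
                   .filter(name -> UUID_RE.matcher(name).matches())
                   .sorted()
                   .forEach(uuids::add);
        }
        return uuids;
    }

    // Apply the helper to the PF and then to every SR-IOV VF linked as virtfnN.
    static List<String> allMdevInstances(Path pfDir) throws IOException {
        List<String> result = new ArrayList<>(mdevInstances(pfDir));
        List<Path> vfs;
        try (Stream<Path> entries = Files.list(pfDir)) {
            vfs = entries.filter(p -> p.getFileName().toString().startsWith("virtfn"))
                         .sorted()
                         .toList();
        }
        for (Path vf : vfs) {
            // virtfnN entries are symlinks in sysfs; resolve them to the VF directory.
            result.addAll(mdevInstances(vf.toRealPath()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Example PF address only; pass a real PF sysfs path as the first argument.
        Path pf = Path.of(args.length > 0 ? args[0] : "/sys/bus/pci/devices/0000:3b:00.0");
        if (!Files.isDirectory(pf)) {
            System.err.println("No such device directory: " + pf);
            return;
        }
        allMdevInstances(pf).forEach(System.out::println);
    }
}
```

Keeping the per-device listing in one helper means the PF and each VF go through identical logic, which is the maintainability point the summary makes about the refactor.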
Generated Summary
This pull request introduces several updates to improve GPU device handling, enhance vGPU discovery, and simplify the UI logic for GPU summaries. The changes include updates to the KVM hypervisor plugin, GPU discovery scripts, backend GPU service logic, and the GPU summary UI component.
Updates to GPU Device Handling and vGPU Discovery:
KVM Hypervisor Plugin Enhancements:
- LibvirtGpuDef.java: includes the model='vfio-pci' attribute in the XML for MDEV devices, ensuring compatibility with vfio-pci-based configurations.
GPU Discovery Script Improvements:
- gpudiscovery.sh: introduces a reusable process_mdev_instances function for handling MDEV instances, consolidating logic and improving maintainability.
- … source/address element.
Backend Logic Enhancements:
- GpuServiceImpl.java: ensures proper assignment of VGPUOnly and passthrough types based on device capabilities.
UI Simplifications:
- GPUSummaryTab.vue: removes the filters that excluded passthrough profiles from card summaries, so all devices are considered.
- GPUSummaryTab.vue: excludes VGPUOnly devices from the summary calculations to focus on relevant device types.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?