Allow nvproxy to be enabled on platform/kvm #11436

Open
nixprime opened this issue Feb 4, 2025 · 0 comments
Labels
type: enhancement New feature or request

Comments


nixprime commented Feb 4, 2025

Description

Per https://github.com/google/gvisor/blob/master/g3doc/proposals/nvidia_driver_proxy.md, nvproxy is currently incompatible with platform/kvm for two reasons:

  1. platform/kvm configures all guest page table entries to enable write-back caching, which is not generally correct for device mappings. On Intel CPUs, KVM mostly mitigates this by mapping uncacheable and write-combining pages as UC in EPT [1]; the hardware takes roughly the intersection of the EPT and guest page table memory types [2]. On AMD CPUs, the hardware has the same capability [3], but KVM does not configure it [4], so guest page table entries must set the memory type correctly to obtain correct behavior. I don't think plumbing memory type through the sentry should be too difficult (a sketch of what the guest PTE flags might look like follows this list), but matching the kernel driver's memory types will be tricky; we will probably have to approximate conservatively in some cases, which may degrade performance.

  2. Mappings of any given file offset in /dev/nvidia-uvm must be placed at the matching virtual address. In KVM, providing the guest with mappings of /dev/nvidia-uvm requires mapping it into the KVM-using process' (the sentry's) address space and then forwarding it into the guest via a KVM memslot. Thus any UVM mapping at an offset that conflicts with an existing sentry mapping is unimplementable. AFAIK this only affects CUDA (Vulkan and NVENC/NVDEC do not use UVM), so it should not block every use of nvproxy with platform/kvm, but it remains a problem for CUDA.
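
For item 1, here is a minimal sketch of what plumbing a memory type into the sentry's guest page tables might look like on x86-64. It assumes the guest IA32_PAT is programmed with the Linux-default low entries (index 0 = WB, 1 = WC, 3 = UC); none of these names exist in gVisor today:

```go
// Hypothetical sketch; not gVisor code.
package kvm

// MemoryType selects the caching behavior of a guest mapping.
type MemoryType int

const (
	MemoryTypeWriteBack MemoryType = iota
	MemoryTypeWriteCombine
	MemoryTypeUncacheable
)

// ptePATFlags returns the PWT/PCD/PAT bits for a 4 KiB x86-64 PTE that
// select the given memory type. The PAT index is (PAT << 2) | (PCD << 1) |
// PWT, and we assume the guest IA32_PAT holds the Linux-default low entries:
// index 0 = WB, index 1 = WC, index 3 = UC.
func ptePATFlags(mt MemoryType) uint64 {
	const (
		ptePWT = 1 << 3 // page-level write-through
		ptePCD = 1 << 4 // page-level cache disable
	)
	switch mt {
	case MemoryTypeWriteCombine:
		return ptePWT // PAT index 1 => WC
	case MemoryTypeUncacheable:
		return ptePCD | ptePWT // PAT index 3 => UC
	default:
		return 0 // PAT index 0 => WB
	}
}
```

Per item 1, on AMD this only helps if the guest PTE memory type is actually honored, which is why KVM's lack of NPT memory type configuration matters there.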

Experimentally:

  • CUDA-using binaries unconditionally map 2 MB of /dev/nvidia-uvm at a fixed address (in cuda_malloc_test this happens to be 0x205000000, but I'm unsure whether this is consistent between binaries). This happens to work (at least in my minimal testing) if nvproxy.uvmFDMemmapFile.MapInternal() is changed to attempt the mapping with MAP_FIXED_NOREPLACE; see the sketch after this list.
  • cudaMallocManaged() reserves application address space using mmap(MAP_PRIVATE|MAP_ANONYMOUS) and then overwrites the reservation mapping with a MAP_FIXED mapping of /dev/nvidia-uvm, which sometimes collides with an existing sentry mapping, depending on ASLR for both the sentry and the application.
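
For illustration, a sketch (not gVisor's actual uvmFDMemmapFile.MapInternal() implementation) of attempting the sentry-side mapping with MAP_FIXED_NOREPLACE, which fails with EEXIST instead of clobbering an existing sentry mapping the way MAP_FIXED would:

```go
// Hypothetical sketch; not gVisor code.
package uvm

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// mapUVMAt maps length bytes of the given /dev/nvidia-uvm fd at a virtual
// address equal to the file offset, as UVM requires. MAP_FIXED_NOREPLACE
// (Linux 4.17+) makes the kernel return EEXIST if the range overlaps an
// existing mapping, rather than silently replacing it like MAP_FIXED.
func mapUVMAt(uvmFD uintptr, offset int64, length uintptr) (uintptr, error) {
	addr, _, errno := unix.Syscall6(unix.SYS_MMAP,
		uintptr(offset), // UVM requires virtual address == file offset.
		length,
		uintptr(unix.PROT_READ|unix.PROT_WRITE),
		uintptr(unix.MAP_SHARED|unix.MAP_FIXED_NOREPLACE),
		uvmFD,
		uintptr(offset))
	if errno != 0 {
		// EEXIST here means the fixed range collides with an existing
		// sentry mapping, which is the failure mode discussed above.
		return 0, fmt.Errorf("mmap(%#x, %#x): %w", offset, length, errno)
	}
	return addr, nil
}
```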

Options:

  • Keep the existing implementation of uvmFDMemmapFile.MapInternal(), which will unconditionally cause CUDA binaries to fail on platform/kvm.
  • Try to use MAP_FIXED_NOREPLACE for the sentry mapping. This should cause CUDA binaries that don't use cudaMallocManaged() to succeed (good), but will cause CUDA binaries that do to flake (bad). This is probably unacceptable for use cases that run arbitrary GPU workloads, but might be fine for others; AFAIK use of cudaMallocManaged() is uncommon for performance reasons.
  • Use MAP_FIXED_NOREPLACE, and use a custom ELF loader to load the sentry in an address range that is less likely to collide with application addresses. I'm not sure this would actually work since IIUC the interfering sentry mappings are from runtime mmaps.
  • Use MAP_FIXED_NOREPLACE, and also try to avoid returning application mappings that would collide with existing sentry mappings. This is racy and leaks information about the sentry to applications, so it's probably undesirable.
  • Fail at startup if platform/kvm is in use, nvproxy is enabled, and NVIDIA_DRIVER_CAPABILITIES contains compute (sketched below). I mention this option mostly to rule it out: in practice some containers specify NVIDIA_DRIVER_CAPABILITIES=all even if only e.g. graphics support is required, and in fact NVIDIA's Vulkan support requires libnvidia-gpucomp.so, which libnvidia-container only mounts when --compute is specified [5], so requesting the compute capability is effectively necessary even for graphics-only workloads.
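
For completeness, a sketch of the startup check described in the last option, mostly to make concrete why it is too blunt; the function and parameter names here are hypothetical, not gVisor's actual config plumbing:

```go
// Hypothetical sketch; not gVisor's actual configuration handling.
package config

import (
	"fmt"
	"os"
	"slices"
	"strings"
)

// checkNvproxyKVM rejects the platform/kvm + nvproxy + compute combination
// at startup. As noted above, this would also break graphics-only
// containers, since they must request the compute capability to get
// libnvidia-gpucomp.so mounted.
func checkNvproxyKVM(platform string, nvproxy bool) error {
	caps := os.Getenv("NVIDIA_DRIVER_CAPABILITIES")
	hasCompute := caps == "all" ||
		slices.Contains(strings.Split(caps, ","), "compute")
	if platform == "kvm" && nvproxy && hasCompute {
		return fmt.Errorf("nvproxy compute support is not implemented on platform/kvm")
	}
	return nil
}
```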

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

Footnotes

  1. Linux: arch/x86/kvm/mmu/spte.c:make_spte() => kvm_is_mmio_pfn(), arch/x86/kvm/vmx/vmx.c:vmx_get_mt_mask()

  2. Intel 64 and IA-32 Software Developer Manual, Vol. 3, Sec. 30.3.7.2 "Memory Type Used for Translated Guest-Physical Addresses"

  3. AMD64 Architecture Programmer's Manual, Vol. 2, Sec. 15.25.8 "Combining Memory Types, MTRRs"

  4. SVM does not set shadow_memtype_mask or implement the get_mt_mask static call; see also fc07e76ac7ff ('Revert "KVM: SVM: use NPT page attributes"')

  5. https://github.com/NVIDIA/libnvidia-container/blob/95d3e86522976061e856724867ebcaf75c4e9b60/src/nvc_info.c#L85

@nixprime nixprime added the type: enhancement New feature or request label Feb 4, 2025
copybara-service bot pushed a commit that referenced this issue Feb 6, 2025
Updates #11436

PiperOrigin-RevId: 723723715