[d3d12/vk] Implement out of memory detection #7472

teoxoy · 2025-04-03T16:52:14Z

Connections
Resolves D3D12 and Vulkan parts of #7460.

Description
Implements OOM detection. See #7460 for details.

Testing
I added a new crate oom-test for this.

Squash or Rebase?
Rebase

teoxoy · 2025-04-03T16:58:00Z

wgpu-hal/src/vulkan/device.rs

@@ -2435,6 +2519,10 @@ impl crate::Device for super::Device {
        &self,
        desc: &wgt::QuerySetDescriptor<crate::Label>,
    ) -> Result<super::QuerySet, crate::DeviceError> {
+        // Assume each query is 256 bytes.
+        // On an AMD W6800 with driver version 32.0.12030.9, occlusion queries are 256.
+        self.check_for_oom(true, desc.count as u64 * 256)?;


I'm not 100% sure about this; nothing in the Vulkan API indicates that query sets reside in a host-visible heap (even though that is the case on the AMD card I tested).

I was actually surprised that they are in the host-visible heap rather than the device-local one (as they are on D3D12).
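To make the size estimate in the hunk above concrete, here is a minimal sketch of a pre-creation budget check. It is not the implementation of `check_for_oom`; `OutOfMemory`, `estimated_query_set_size`, and `check_budget` are hypothetical names, and the 256 bytes per query is the assumption stated in the diff comment, not a value guaranteed by Vulkan.

```rust
/// Hypothetical stand-in for the HAL's out-of-memory error.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfMemory;

/// Assumed per-query footprint in bytes (the value measured in the diff comment).
pub const ASSUMED_BYTES_PER_QUERY: u64 = 256;

/// Estimated footprint of a query set holding `count` queries.
/// Widening to u64 first means the multiplication cannot overflow.
pub fn estimated_query_set_size(count: u32) -> u64 {
    u64::from(count) * ASSUMED_BYTES_PER_QUERY
}

/// Fails if `current_usage + requested` would exceed `budget`.
/// An overflowing sum is treated as exceeding the budget.
pub fn check_budget(budget: u64, current_usage: u64, requested: u64) -> Result<(), OutOfMemory> {
    match current_usage.checked_add(requested) {
        Some(total) if total <= budget => Ok(()),
        _ => Err(OutOfMemory),
    }
}
```

With these definitions, a 4-query set is estimated at 1024 bytes, so it fits in a 2048-byte budget when 1024 bytes are already in use and is rejected once usage reaches 1025 bytes.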

…and acceleration structures

The D3D12 API doesn't guarantee that it returns `E_OUTOFMEMORY` under high memory pressure; the driver and kernel will happily start swapping objects from VRAM to RAM and then from RAM to disk, slowing the system to a crawl if allocations are made in a loop.


This removes the possibility of deadlocks, since `release_gpu_resources` tries to lock resources (trackers, snatchable_lock, pending_writes, life_tracker) that might already be locked; `handle_hal_error` is called in lots of places. Removing the call only delays destruction, since `release_gpu_resources` is still called in `maintain`.
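The deferral described above can be sketched as follows. This is an illustration of the general pattern only, not wgpu-core's code: `SketchDevice`, `handle_error`, and `maintain` are hypothetical names, and a single `Mutex` stands in for the several locks listed in the commit message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical device with one lock standing in for trackers, pending writes, etc.
pub struct SketchDevice {
    valid: AtomicBool,
    resources: Mutex<Vec<String>>,
}

impl SketchDevice {
    pub fn new(resources: Vec<String>) -> Self {
        Self {
            valid: AtomicBool::new(true),
            resources: Mutex::new(resources),
        }
    }

    /// May be called while `resources` is already locked by the caller,
    /// so it only flips a flag and takes no locks itself.
    pub fn handle_error(&self) {
        self.valid.store(false, Ordering::Release);
    }

    /// Called from a point where no locks are held; performs the deferred
    /// release and returns how many resources were dropped.
    pub fn maintain(&self) -> usize {
        if self.valid.load(Ordering::Acquire) {
            return 0;
        }
        let mut resources = self.resources.lock().unwrap();
        let released = resources.len();
        resources.clear();
        released
    }
}
```

The trade-off is the one the commit message names: resources live slightly longer after an error, in exchange for the error path never taking a lock.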


teoxoy requested a review from a team as a code owner April 3, 2025 16:52

teoxoy force-pushed the oom-detection branch from 5e68b4b to 02ec94b Compare April 3, 2025 16:55


ErichDonGubler mentioned this pull request Apr 4, 2025

[d3d12] driver crash while allocating GPU memory #5288

Open

teoxoy force-pushed the oom-detection branch 2 times, most recently from 4b29a2d to 7dd2aa9 Compare April 7, 2025 18:00

teoxoy added 7 commits April 8, 2025 15:43

add device validity checks to Queue methods

248dcaf

invalidate Device on OOM errors with the exception of buffer, textu…

ef20fbc

…re, query set and acceleration structure creation

[D3D12/VK] add OOM check on submit and poll that will lose the device…

342ab62

… if we are over 95% of our budget

allow sampler creation to return OOMs

7fa1a46

This is to preserve the current behavior as tested by the `SAMPLER_CREATION_FAILURE` test. This is not spec compliant but it's unclear what we should do instead. I opened gpuweb/gpuweb#5142 to figure out what we should do.

[vk] add OOM checks before creating buffers, textures, query sets and…

c1a442d

… acceleration structures
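As an illustration of the 95% budget check mentioned in commit 342ab62, here is a minimal sketch. The names `is_over_threshold`, `check_on_submit_or_poll`, and `PollOutcome` are hypothetical, and how usage and budget are obtained is not shown; the sketch assumes both are already available as byte counts for a single memory heap.

```rust
/// Hypothetical result of the check performed on submit and poll.
#[derive(Debug, PartialEq, Eq)]
pub enum PollOutcome {
    Ok,
    DeviceLost,
}

/// Threshold from the commit title: lose the device above 95% of the budget.
pub const BUDGET_THRESHOLD_PERCENT: u128 = 95;

/// True if `usage` is strictly more than 95% of `budget`.
/// Integer math in u128 avoids both float rounding and u64 overflow.
pub fn is_over_threshold(usage: u64, budget: u64) -> bool {
    u128::from(usage) * 100 > u128::from(budget) * BUDGET_THRESHOLD_PERCENT
}

/// Maps the threshold check to an outcome (assumed usage/budget inputs).
pub fn check_on_submit_or_poll(usage: u64, budget: u64) -> PollOutcome {
    if is_over_threshold(usage, budget) {
        PollOutcome::DeviceLost
    } else {
        PollOutcome::Ok
    }
}
```

In this sketch, usage of exactly 95% of the budget still passes; whether the boundary is inclusive in the actual change is not stated in the commit title.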

teoxoy force-pushed the oom-detection branch from 7dd2aa9 to c1a442d Compare April 8, 2025 13:43

cwfitzgerald self-assigned this Apr 8, 2025
