Summary
A userspace GL/WebGL client (a Chromium-based browser rendering a WebGL-heavy page) can exhaust the GPU's 256 MiB BAR1 aperture. The driver then emits a continuous flood of dmaAllocMapping_GM107: can't alloc VA space for mapping / NV_ERR_NO_MEMORY, followed by krcWatchdog: GPU is probably locked!, and the entire machine hard-locks: no clean shutdown, no SysRq, and a power cycle is required.
Expected behavior: BAR1/VA-space exhaustion should surface to the client as an allocation failure (the renderer/tab dies, as it does on the first occurrence) without locking the GPU engine or hanging the host kernel. Observed behavior: when the workload is repeated, the driver fails to contain the exhaustion and the GPU and host deadlock.
This is reproducible and not load-spike related — it builds over a few minutes while the page is open.
Environment
- GPU: NVIDIA GeForce RTX 2070 SUPER (Turing, TU104)
- VBIOS: 90.04.95.00.58
- BAR1: 256 MiB (Resizable BAR not supported on Turing, so this is fixed)
- Driver: 595.71.05 (open kernel modules). Also reproduced on 580.159.03.
- Kernel: 6.17.0-35-generic, Ubuntu 24.04 (x86_64)
- CPU/board: AMD Ryzen 7 3700X, Gigabyte (AMD platform)
Reproduction
- Open a WebGL/canvas-heavy site (in our case ui.com / UniFi UI) in a GPU-accelerated Chromium-based browser.
- Leave it rendering for ~1–7 minutes.
- BAR1 VA-space exhausts; kernel log fills with can't alloc VA space; renderer crashes once ("Aw, Snap").
- Reload the page → GPU RC watchdog reports the GPU locked → full system hang (hard reset required).
Reproduced 3/3 times. Disabling browser GPU acceleration avoids it (confirms the BAR1 mapping path as the trigger).
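To watch the aperture fill during the 1–7 minute window, BAR1 usage can be sampled while the page is open. The following is a minimal sketch, not part of the original report. It assumes that nvidia-smi -q -d MEMORY prints a "BAR1 Memory Usage" block with Total/Used/Free values in MiB. The names parse_bar1, sample_bar1, and SAMPLE are hypothetical, and the numbers in SAMPLE are illustrative, not measurements from the affected machine.

```python
import re
import subprocess

# Illustrative text in the shape `nvidia-smi -q -d MEMORY` is assumed to print.
# The values are made up for this example, not measured on the affected system.
SAMPLE = """\
    FB Memory Usage
        Total                             : 8192 MiB
        Used                              : 1200 MiB
        Free                              : 6992 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 250 MiB
        Free                              : 6 MiB
"""


def parse_bar1(text):
    """Return BAR1 usage as {'total', 'used', 'free'} in MiB, or None if not found."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "BAR1 Memory Usage":
            fields = {}
            # The three lines after the header are assumed to be Total/Used/Free.
            for entry in lines[i + 1:i + 4]:
                m = re.match(r"\s*(Total|Used|Free)\s*:\s*(\d+)\s*MiB", entry)
                if m:
                    fields[m.group(1).lower()] = int(m.group(2))
            return fields if len(fields) == 3 else None
    return None


def sample_bar1():
    """Query the live GPU once; requires nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", "-q", "-d", "MEMORY"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_bar1(out)


if __name__ == "__main__":
    print(parse_bar1(SAMPLE))
```

Calling sample_bar1() every few seconds and logging the "used" value to disk (so the data survives the hard reset) would show whether usage climbs steadily toward the 256 MiB limit before the first renderer crash.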
Key kernel log sequence (excerpt)
NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping. (×hundreds, in bursts)
NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051)
... @ mapping_reuse.c:273
NVRM: ... @ kern_bus_gm107.c:3141
[drm] [nvidia-drm] [GPU ID 0x00000700] Failed to ioremap_wc NvKmsKapiMemory ...
[drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* Failed to map NvKmsKapiMemory ...
NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
NVRM: nvAssertFailedNoLog: Assertion failed: GPPut < WATCHDOG_GPFIFO_ENTRIES @ kernel_rc_watchdog.c:1549
(Full curated kernel sequence and nvidia-bug-report.log.gz available on request / attached.)
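The ordering in the excerpt (VA-space exhaustion first, watchdog lock later) can be checked mechanically against a saved kernel log. This is a minimal sketch, assuming the log has been saved as plain text. The marker substrings are copied from the excerpt above; summarize_log, MARKERS, and EXCERPT are hypothetical names introduced for illustration.

```python
# Substrings taken from the kernel log excerpt in this report.
MARKERS = {
    "va_exhausted": "dmaAllocMapping_GM107: can't alloc VA space for mapping",
    "no_memory": "NV_ERR_NO_MEMORY",
    "ioremap_failed": "Failed to ioremap_wc NvKmsKapiMemory",
    "watchdog_locked": "RC watchdog: GPU is probably locked!",
}

# A shortened copy of the excerpt, one message of each kind.
EXCERPT = """\
NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051)
[drm] [nvidia-drm] [GPU ID 0x00000700] Failed to ioremap_wc NvKmsKapiMemory ...
NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
"""


def summarize_log(text):
    """Count each marker and report whether exhaustion preceded the watchdog lock."""
    counts = {key: 0 for key in MARKERS}
    first_seen = {}  # marker key -> 1-based line number of first occurrence
    for lineno, line in enumerate(text.splitlines(), start=1):
        for key, needle in MARKERS.items():
            if needle in line:
                counts[key] += 1
                first_seen.setdefault(key, lineno)
    escalated = (
        "va_exhausted" in first_seen
        and "watchdog_locked" in first_seen
        and first_seen["va_exhausted"] < first_seen["watchdog_locked"]
    )
    return counts, escalated


if __name__ == "__main__":
    counts, escalated = summarize_log(EXCERPT)
    print(counts)
    print("exhaustion escalated to watchdog lock:", escalated)
```

A log in which escalated is False but va_exhausted is non-zero would correspond to the contained case described in this report (renderer crash only).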
Notes / impact
- The single-renderer-crash path works (allocation failure is returned). The escalation to a GPU engine lock + unrecoverable host hang on repeat is the bug.
- On a small-BAR1 (256 MiB) Turing part with no Resizable BAR, this aperture is easy for a modern WebGL workload to exhaust, so robust handling of BAR1 exhaustion matters here.