RTX 5070 (GB205): Xid 32 → Xid 56 → "Wait for channel idle timed out" → full hard-freeze under sustained display+compute load — 595.71.05 open module, X11, Resizable BAR enabled

## Summary

On an RTX 5070 (GB205) running the **595.71.05 open kernel modules** on **X11**, the GPU's **display path** intermittently faults with `NVRM Xid 32` and/or `Xid 56` originating from desktop graphics clients (browser, gnome-shell). On 2026-06-04 the fault escalated into a **complete, unrecoverable system freeze** requiring a hard power-cycle (no clean shutdown, no kernel oops). The compute path (CUDA, via Ollama) was active throughout and never faulted.

This appears to be the same **display-engine instability class** as forum thread [365287](https://forums.developer.nvidia.com/t/bug-report/365287) and issues #1132 / #1134, but with a **different configuration and trigger** that none of those cover:

- vs. **#1132 / #1134**: those are **Resizable BAR _disabled_ / BAR1 undersized** (`__nv_drm_gem_nvkms_map` range crossing the BAR1↔BAR3 boundary). Here **Resizable BAR is enabled** (BAR1 = 16 GiB), so the BAR-overflow mechanism does not apply.
- vs. **forum 365287**: identical symptom cascade and an **identical `Xid 56, CMDre 00000007` fingerprint**, but that report's root cause was *GPU state lost across an S3 resume with `nvidia-suspend`/`nvidia-resume` disabled*. Here those services are **enabled** and **no suspend/resume occurred** before the crash — the machine was awake under continuous load.
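
For anyone comparing against the BAR-undersize reports, the BAR layout can be cross-checked from sysfs. The following is a minimal Python sketch, assuming the standard Linux layout of `/sys/bus/pci/devices/<BDF>/resource` (one `start end flags` hex triple per line, all zeros for an unpopulated region). The helper names are hypothetical, and the file's line index is a kernel resource index, which lines up with the BAR number only for the first six entries.

```python
from pathlib import Path


def parse_bar_sizes(resource_text: str) -> dict[int, int]:
    """Map resource index -> size in bytes for every populated region."""
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue  # unpopulated region
        sizes[index] = end - start + 1  # the end address is inclusive
    return sizes


def human(size: int) -> str:
    """Render a size as whole G/M/K units (binary), else raw bytes."""
    for unit, shift in (("G", 30), ("M", 20), ("K", 10)):
        if size >= 1 << shift and size % (1 << shift) == 0:
            return f"{size >> shift}{unit}"
    return f"{size}B"


def device_bar_sizes(bdf: str = "0000:01:00.0") -> dict[int, int]:
    """Read the live sysfs file for one PCI device."""
    text = Path(f"/sys/bus/pci/devices/{bdf}/resource").read_text()
    return parse_bar_sizes(text)
```

Printing `human(size)` for each entry of `device_bar_sizes()` should list regions 0, 1 and 3 as 64M, 16G and 32M on a system configured as in the Environment table; a much smaller BAR1 would indicate the undersized case from #1132 / #1134.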

## Environment

| Component | Details |
|---|---|
| GPU | NVIDIA GeForce RTX 5070 (GB205), VBIOS `98.05.36.00.83`, PCI `0000:01:00.0` |
| Driver | **595.71.05** open kernel modules (Ubuntu `nvidia-driver-595-open` 595.71.05-0ubuntu0.24.04.1, DKMS) |
| OS | Ubuntu 24.04.4 LTS |
| Kernel | 6.17.0-35-generic |
| CPU / Board | AMD Ryzen 9 7950X / Gigabyte B650M AORUS ELITE AX, BIOS F21 |
| Display server | **X11** (Xorg + GNOME), `nvidia_drm.modeset=1` |
| Resizable BAR | **Enabled** — BAR0 64M, **BAR1 16G**, BAR3 32M |
| GSP firmware | Enabled (`NVreg_EnableGpuFirmware`) |
| Power mgmt | `PreserveVideoMemoryAllocations=1`; `nvidia-persistenced` active; `nvidia-suspend/resume/hibernate.service` all **enabled** |
| Concurrent compute | Ollama (CUDA) serving embeddings continuously on the same GPU |

## Symptom / cascade

The GPU drives both the desktop **and** a CUDA compute workload (Ollama). All faults originate from **display/graphics clients**; the compute workload never appears in any Xid.

### Event A — 2026-06-04 — fatal (hard freeze)

```
09:33:49  kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=13307, name=brave, channel 0x00000004 intr0 00040000
09:33:49  kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=13307, name=brave, channel 0x00000004 intr0 00040000
09:33:50  brave: ERROR ...shared_context_state.cc:1317] SharedContextState context lost via EXT_robustness.
                 Reset status = GL_GUILTY_CONTEXT_RESET_KHR
09:33:50  brave: GPU process exited unexpectedly: exit_code=8704
          ... ~6 min of normal operation ...
09:39:59  kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=9397, name=gnome-shell, channel 0x00000007 intr 02000000
09:40:04  gdm-x-session: (WW) NVIDIA: Wait for channel idle timed out.
09:40:11  gdm-x-session: (WW) NVIDIA: Wait for channel idle timed out.
09:40:52  <last log line; machine frozen — required hard power-cycle>
```

The first Xid 32 (browser channel) was survivable — Chromium reset its GPU context and continued. The second Xid 32, on **gnome-shell's** channel, was not: the X driver then blocked on "Wait for channel idle timed out" and the entire machine became unresponsive within ~50 s. The system journal ends mid-stream with **no shutdown sequence**, and `/proc/sys/kernel/tainted` shows only out-of-tree+unsigned module bits (no `TAINT_DIE`) — i.e. a hard hang, not a logged panic.
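
The taint reading above can be reproduced by decoding the bitmask in `/proc/sys/kernel/tainted`. This is a minimal Python sketch covering only a subset of the flags listed in the kernel's "Tainted kernels" documentation; the function names are hypothetical, and the value `12288` used in the note below is illustrative (bits 12 and 13 set), not a number taken from this report.

```python
# Subset of kernel taint flags: bit position -> meaning (flag letter).
TAINT_BITS = {
    0: "proprietary module loaded (P)",
    4: "machine check exception (M)",
    7: "kernel died: oops or BUG (D)",
    9: "kernel warning issued (W)",
    12: "out-of-tree module loaded (O)",
    13: "unsigned module loaded (E)",
    14: "soft lockup occurred (L)",
}

TAINT_DIE_BIT = 7


def decode_taint(value: int) -> list[str]:
    """Return descriptions of the known taint bits set in value."""
    return [desc for bit, desc in sorted(TAINT_BITS.items()) if value & (1 << bit)]


def kernel_died(value: int) -> bool:
    """True if the oops/BUG bit (TAINT_DIE) is set."""
    return bool(value & (1 << TAINT_DIE_BIT))


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the taint mask of the currently running kernel."""
    with open(path) as fh:
        return int(fh.read().strip())
```

With the illustrative value, `decode_taint(12288)` reports only the out-of-tree and unsigned-module flags and `kernel_died(12288)` is false. The mask describes the currently running boot only, so a value read after the power-cycle says nothing direct about the crashed boot; for that boot, the journal ending without an oops is the stronger evidence.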

### Event B — 2026-06-01 — non-fatal (recovered), same week

```
09:54:19  kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000007 00000404 ffffffff 00000004 00800000
09:54:19  kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000007 00000000 00000000 00000001 00800000
09:54:24  gdm-x-session: (WW) NVIDIA: Wait for channel idle timed out.
```

Here the display engine threw `Xid 56` (the `CMDre 00000007` fingerprint matches forum 365287's fatal event exactly) plus the same channel-idle timeout, but the X driver recovered and the session stayed up. This suggests the same underlying display-engine fault can either recover (Xid 56, Event B) or wedge the whole machine (Xid 32 on the compositor channel, Event A).
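
To compare events like A and B across boots, the Xid lines can be pulled out of the journal mechanically. This is a minimal Python sketch whose regular expressions are inferred only from the log lines quoted in this report, so other Xid formats may not match; `XidEvent` and `parse_xid` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "NVRM: Xid (PCI:0000:01:00): 32, pid=..., name=..., ..."
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<detail>.*)"
)
# Present on the Xid 32 lines above, absent on the Xid 56 lines.
PROC_RE = re.compile(r"pid=(?P<pid>\d+), name=(?P<name>[^,]+)")


class XidEvent(NamedTuple):
    pci: str
    xid: int
    pid: Optional[int]
    name: Optional[str]
    detail: str


def parse_xid(line: str) -> Optional[XidEvent]:
    """Parse one journal line; return None if it is not an Xid report."""
    m = XID_RE.search(line)
    if m is None:
        return None
    proc = PROC_RE.search(m["detail"])
    return XidEvent(
        pci=m["pci"],
        xid=int(m["xid"]),
        pid=int(proc["pid"]) if proc else None,
        name=proc["name"] if proc else None,
        detail=m["detail"].strip(),
    )
```

Feeding it the output of `journalctl -k` line by line gives one record per Xid. Grouping by `xid` and `name` separates faults attributed to a client process (the Xid 32 lines) from the Xid 56 lines, which carry no pid.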

## Ruled out

- **S3 suspend/resume corruption** (the 365287 root cause): `nvidia-suspend/resume/hibernate` are enabled and **no suspend occurred** before the crash — uptime was continuous under load.
- **Thermal / power**: no HW/SW thermal or power-brake slowdown; idle/light temps (~58 °C), 39 W of 250 W.
- **CPU MCE**: none.
- **OOM**: no oom-killer events.
- **PCIe**: no AER / corrected errors / link drops for `0000:01:00`.
- **VRAM/ECC**: consumer card (ECC N/A); no remapped rows / channel-repair pending.
- **Kernel panic**: none recorded (taint = out-of-tree+unsigned module only).

## What seems to trigger it

Sustained **mixed load**: the 5070 simultaneously drives an X11 desktop with many GPU-accelerated clients (Chromium-based browser, gnome-shell, Electron apps) **and** a continuous CUDA workload (Ollama embeddings). Faults appear under this concurrency without any suspend, gaming/Proton, or BAR-undersize condition.

## Request

- Confirm whether this display-engine failure on GB205 / 595.71.05 (`Xid 32` or `Xid 56 (CMDre 00000007)`, followed by the channel-idle timeout) is tracked, and whether it is distinct from the BAR-undersize (#1132/#1134) and S3-resume (forum 365287) cases, given that Resizable BAR is **enabled** and no suspend is involved here.
- Guidance on which diagnostics to capture the next time it degrades. The freeze is total, so `nvidia-bug-report.sh` cannot run after the hang; I can capture it live at the first Xid if there is a recommended trigger.
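
In the meantime, one possible way to capture at the first Xid is to follow the kernel log and launch `nvidia-bug-report.sh` as soon as a watched Xid appears. This is a minimal, untested Python sketch, not an NVIDIA-recommended procedure: the output directory is a hypothetical path, the watched set (32 and 56) is taken from the events above, it must run as root, and whether the report finishes before the machine wedges is unknown.

```python
import os
import re
import subprocess
from typing import Iterable, Optional

XID_RE = re.compile(r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (\d+),")


def first_xid(lines: Iterable[str], watched=frozenset({32, 56})) -> Optional[str]:
    """Return the first line reporting one of the watched Xid codes."""
    for line in lines:
        m = XID_RE.search(line)
        if m and int(m.group(1)) in watched:
            return line.rstrip("\n")
    return None


def capture(out_dir: str) -> None:
    """Run nvidia-bug-report.sh in out_dir, then flush buffers to disk."""
    os.makedirs(out_dir, exist_ok=True)
    # With no arguments the script writes nvidia-bug-report.log.gz to the cwd.
    subprocess.run(["nvidia-bug-report.sh"], cwd=out_dir, check=False)
    os.sync()  # a hard freeze would otherwise lose unwritten data


def main(out_dir: str = "/var/tmp/nvidia-xid-capture") -> None:
    """Follow new kernel messages and capture once on the first watched Xid."""
    follow = subprocess.Popen(
        ["journalctl", "-k", "-f", "-n", "0", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        hit = first_xid(follow.stdout)
        if hit is not None:
            print(f"Xid seen, capturing: {hit}")
            capture(out_dir)
    finally:
        follow.terminate()
```

Calling `main()` from a root shell or a small systemd unit would arm it. In Event A roughly six minutes passed between the first Xid 32 and the fatal one, which suggests there may be enough time for a capture.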

I can attach a full `nvidia-bug-report.sh` log and complete `journalctl -b -1` from the crashed boot on request.

