NVIDIA Open GPU Kernel Modules Version
595.71.05
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Arch Linux
Kernel Release
7.0.3
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA GeForce RTX 4090 Laptop GPU
Describe the bug
Every so often, usually a couple of minutes after resuming from suspend and unlocking the screen on a GNOME 50.1 system, the display attached to the NVIDIA dGPU will fully freeze.
To Reproduce
- Enter suspend.
- Resume from suspend.
- Browse the web for a few minutes.
Bug Incidence
Sometimes
nvidia-bug-report.log.gz
nvidia-bug-report.log.gz
More Info
I'm running GNOME 50.1 and have a hybrid GPU system where one monitor is attached to the NVIDIA dGPU directly.
Only a forced reboot works when this happens. If music is playing, it keeps playing, which suggests the freeze is somewhere in the display stack rather than system-wide. None of these work:
- Dropping to a TTY.
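- Replugging the monitor, or unplugging it to fall back to the internal display.
If SysRq is enabled, neither a soft nor a hard kill fixes the problem, but they do allow a stack trace to be logged, which seems to point to the NVIDIA driver. Apart from CPU 0, which is handling the SysRq key press, the only CPU that is not idle is CPU 3, running the nvidia-modeset kthread in nvWriteGpEntry (reached through nvDIFRPrefetchSurfaces):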
19:57:19 computer kernel: sysrq: Show backtrace of all active CPUs
19:57:19 computer kernel: NMI backtrace for cpu 0
19:57:19 computer kernel: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G OE 7.0.10-arch1-1 #1 PREEMPT(full) b38726df0ec1c5aec6f05d4eab858505a5944d02
19:57:19 computer kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
19:57:19 computer kernel: Hardware name: LENOVO 82WS/LNVNB161216, BIOS LPCN61WW 03/21/2025
19:57:19 computer kernel: Call Trace:
19:57:19 computer kernel: <IRQ>
19:57:19 computer kernel: dump_stack_lvl+0x5d/0x80
19:57:19 computer kernel: nmi_cpu_backtrace.cold+0x18/0x73
19:57:19 computer kernel: ? __pfx_nmi_raise_cpu_backtrace+0x10/0x10
19:57:19 computer kernel: nmi_trigger_cpumask_backtrace+0x114/0x140
19:57:19 computer kernel: __handle_sysrq.cold+0x74/0xfe
19:57:19 computer kernel: sysrq_filter+0xcf/0x630
19:57:19 computer kernel: input_handle_events_filter+0x60/0xc0
19:57:19 computer kernel: input_pass_values+0x152/0x180
19:57:19 computer kernel: input_event_dispose+0x187/0x190
19:57:19 computer kernel: ? srso_alias_return_thunk+0x5/0xfbef5
19:57:19 computer kernel: input_handle_event+0x41/0x70
19:57:19 computer kernel: input_event+0x58/0x90
19:57:19 computer kernel: hidinput_report_event+0x37/0x50
19:57:19 computer kernel: hid_report_raw_event+0xe7/0x530
19:57:19 computer kernel: __hid_input_report+0x178/0x240
19:57:19 computer kernel: hid_safe_input_report+0x14/0x20
19:57:19 computer kernel: hid_irq_in+0x1aa/0x1e0
19:57:19 computer kernel: __usb_hcd_giveback_urb+0xa0/0x120
19:57:19 computer kernel: usb_giveback_urb_bh+0xc0/0x140
19:57:19 computer kernel: process_one_work+0x19c/0x3a0
19:57:19 computer kernel: bh_worker+0x1d2/0x1f0
19:57:19 computer kernel: tasklet_hi_action+0x13/0x30
19:57:19 computer kernel: handle_softirqs+0xe8/0x2c0
19:57:19 computer kernel: __irq_exit_rcu+0xc9/0xf0
19:57:19 computer kernel: common_interrupt+0x85/0xa0
19:57:19 computer kernel: </IRQ>
19:57:19 computer kernel: <TASK>
19:57:19 computer kernel: asm_common_interrupt+0x26/0x40
19:57:19 computer kernel: RIP: 0010:cpuidle_enter_state+0xbb/0x440
19:57:19 computer kernel: Code: 00 00 e8 a8 d2 ec fe e8 33 ee ff ff 48 89 c5 0f 1f 44 00 00 31 ff e8 24 23 eb fe 45 84 ff 0f 85 74 01 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 cb 01 00 00 44 89 f1 48 2b 2c 24 48 6b d1 68 48 89
19:57:19 computer kernel: RSP: 0018:ffffffff96a03e10 EFLAGS: 00000246
19:57:19 computer kernel: RAX: ffff8b8715f67000 RBX: 0000000000000003 RCX: 0000000000000046
19:57:19 computer kernel: RDX: 000032fc614d86a1 RSI: 000032fc614d865b RDI: 0000000000000000
19:57:19 computer kernel: RBP: 000032fc614d86a1 R08: 000032fc614d86a1 R09: ffff8b86ad6340c0
19:57:19 computer kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b7f8248c800
19:57:19 computer kernel: R13: ffffffff96c19940 R14: 0000000000000003 R15: 0000000000000000
19:57:19 computer kernel: cpuidle_enter+0x31/0x50
19:57:19 computer kernel: do_idle+0x14b/0x2a0
19:57:19 computer kernel: cpu_startup_entry+0x29/0x30
19:57:19 computer kernel: rest_init+0xcc/0xd0
19:57:19 computer kernel: start_kernel+0xa5b/0xa70
19:57:19 computer kernel: x86_64_start_reservations+0x24/0x30
19:57:19 computer kernel: x86_64_start_kernel+0xda/0xe0
19:57:19 computer kernel: common_startup_64+0x13e/0x141
19:57:19 computer kernel: </TASK>
19:57:19 computer kernel: Sending NMI from CPU 0 to CPUs 1-31:
19:57:19 computer kernel: NMI backtrace for cpu 1 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 3
19:57:19 computer kernel: CPU: 3 UID: 0 PID: 490 Comm: nvidia-modeset/ Tainted: G OE 7.0.10-arch1-1 #1 PREEMPT(full) b38726df0ec1c5aec6f05d4eab858505a5944d02
19:57:19 computer kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
19:57:19 computer kernel: Hardware name: LENOVO 82WS/LNVNB161216, BIOS LPCN61WW 03/21/2025
19:57:19 computer kernel: RIP: 0010:nvWriteGpEntry+0xc6/0x3e0 [nvidia_modeset]
19:57:19 computer kernel: Code: 48 f2 44 01 f2 39 f2 0f 4d f2 0f 4d f8 83 c1 01 41 39 c9 75 cf 39 fd 75 7b f6 03 08 75 90 48 8b 83 80 01 00 00 66 83 78 0e ff <0f> 85 74 ff ff ff 8b 70 08 49 8b 83 48 01 00 00 48 89 df 48 8b 40
19:57:19 computer kernel: RSP: 0018:ffffce520195fd28 EFLAGS: 00000213
19:57:19 computer kernel: RAX: ffff8b7fb861c000 RBX: ffff8b7fad347830 RCX: 00000000000003ac
19:57:19 computer kernel: RDX: ffff8b7fb8607000 RSI: 00000000000003ac RDI: ffff8b7fad347830
19:57:19 computer kernel: RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000000001
19:57:19 computer kernel: R10: 0000000000000010 R11: ffffce5200b710d8 R12: 0000000000000000
19:57:19 computer kernel: R13: 00000000000003ac R14: ffff8b7fad347830 R15: 0000000000000000
19:57:19 computer kernel: FS: 0000000000000000(0000) GS:ffff8b8716027000(0000) knlGS:0000000000000000
19:57:19 computer kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
19:57:19 computer kernel: CR2: 00007f4f54dffce8 CR3: 000000061e224000 CR4: 0000000000f50ef0
19:57:19 computer kernel: PKRU: 55555554
19:57:19 computer kernel: Call Trace:
19:57:19 computer kernel: <TASK>
19:57:19 computer kernel: ? __schedule+0x45b/0x1720
19:57:19 computer kernel: ? nvPushSetObject+0xb1/0x170 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: nvPushKickoff+0x2c/0x50 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: PrefetchHelperSurfaceEvo+0x4df/0x700 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: ? srso_alias_return_thunk+0x5/0xfbef5
19:57:19 computer kernel: nvDIFRPrefetchSurfaces+0xd9/0x240 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: ? __pfx__main_loop+0x10/0x10 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: DifrPrefetchEventDeferredWork+0x16/0x30 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: nvkms_kthread_q_callback+0xe6/0x170 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: _main_loop+0x10e/0x160 [nvidia_modeset 69381f9f3a403d5f8b611a3fcf365473e5861115]
19:57:19 computer kernel: ? srso_alias_return_thunk+0x5/0xfbef5
19:57:19 computer kernel: ? srso_alias_return_thunk+0x5/0xfbef5
19:57:19 computer kernel: kthread+0xe1/0x120
19:57:19 computer kernel: ? __pfx_kthread+0x10/0x10
19:57:19 computer kernel: ret_from_fork+0x2bc/0x350
19:57:19 computer kernel: ? __pfx_kthread+0x10/0x10
19:57:19 computer kernel: ret_from_fork_asm+0x1a/0x30
19:57:19 computer kernel: </TASK>
19:57:19 computer kernel: NMI backtrace for cpu 2 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 14 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 28 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 29 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 26 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 18 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 17 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 9 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 11 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 8 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 7 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 10 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 15 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 13 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 4 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 12 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 5 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 6 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 30 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 27 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 23 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 19 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 31 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 25 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 24 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 22 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 16 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 20 skipped: idling at io_idle+0x3/0x30
19:57:19 computer kernel: NMI backtrace for cpu 21 skipped: idling at io_idle+0x3/0x30
19:57:27 computer kernel: sysrq: Emergency Remount R/O
19:57:27 computer kernel: Emergency Remount complete
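To pick the busy CPUs out of a dump like the one above without reading every line, something along these lines may help. It is only a rough sketch: the function name `busy_cpus` and the regular expressions are my own and assume the journal line format shown in this report.

```python
import re

# "NMI backtrace for cpu N", optionally followed by " skipped: idling ...".
NMI_RE = re.compile(r"NMI backtrace for cpu (\d+)( skipped: idling)?")
# The "CPU: N UID: ... PID: ... Comm: name" header that follows a printed backtrace.
COMM_RE = re.compile(r"CPU: (\d+) .*? Comm: (\S+)")


def busy_cpus(log_text):
    """Return {cpu: comm} for CPUs whose backtrace was printed (not skipped as idle)."""
    busy = {}
    for line in log_text.splitlines():
        m = NMI_RE.search(line)
        if m:
            if m.group(2) is None:
                # Backtrace was printed; the task name arrives on a later line.
                busy.setdefault(int(m.group(1)), None)
            continue
        m = COMM_RE.search(line)
        if m and int(m.group(1)) in busy:
            busy[int(m.group(1))] = m.group(2)
    return busy
```

Fed the log above (for example, saved from `journalctl -k`), this should single out CPU 0 (`swapper/0`) and CPU 3 (`nvidia-modeset/`).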
A SysRq reboot does work, which avoids a hard power-off.
Possibly related to #1064, as #1064 (comment) in particular contains a similar stack trace. In any case, whilst I have had the same heartbeat messages in my logs, they haven't occurred for at least a month.