Skip to content

SG-42069 Fix random segfault by controlling Qt object destruction order#178

Merged
julien-lang merged 23 commits intomasterfrom
ticket/SG-38851-try-fixup-seg-fault
Feb 10, 2026
Merged

SG-42069 Fix random segfault by controlling Qt object destruction order#178
julien-lang merged 23 commits intomasterfrom
ticket/SG-38851-try-fixup-seg-fault

Conversation

@julien-lang
Copy link
Member

@julien-lang julien-lang commented Feb 10, 2026

Fixes random CI test segfaults by controlling object destruction order in test tearDown.

Scope: Test code only - no production changes.

Root Cause: PySide6 6.8.3+ attempts to auto-disconnect signals from partially-destroyed QObjects, accessing freed memory and causing segfaults during test cleanup.

Solution: Set instance variables to None before calling super().tearDown(). This releases Qt object references in a controlled order, allowing proper cleanup before parent tearDown runs.

Additional Fix: Increased timeout in test_cached_schema from 2s to 5s for Python 3.13+ due to different threading/GIL behavior affecting background task completion times.

Why Test-Only: We have no customer reports or evidence of production crashes. This fix addresses CI instability while minimizing risk by avoiding changes to production code.

…Down

The real bug was that ExternalConfigurationLoader (a QObject with signal
connections) was being destroyed without first disconnecting its signals.
When Qt tried to auto-disconnect during object destruction, the connected
objects (bg_task_manager, _shotgun_state) were already partially destroyed,
causing segmentation faults.

This is a PySide6-specific issue that became more frequent with newer
PySide6 versions (6.8.x) used in Python 3.13 tests.

The fix explicitly disconnects all signal connections before setting
the object to None, preventing Qt from accessing partially-destroyed
objects during cleanup.
The tearDown was attempting to disconnect() on _MockedSignal objects
which don't have that method. Now checking hasattr() for 'disconnect'
before attempting to use it, so mocked signals are skipped but real
Qt signals are properly disconnected to prevent segfaults.
This fixes a potential segfault that could occur in production when
ExternalConfigurationLoader is destroyed without explicitly disconnecting
Qt signals first. This especially affects Python 3.13 + PySide6 6.8.3+.

Changes:
- Added explicit signal disconnection in shut_down() method
- Added debug logging to track signal disconnection
- Updated test tearDown() to also include debug logging

This ensures both test and production code properly clean up Qt signals
to prevent crashes during object destruction.
Using standard master branch of tk-ci-tools instead of custom branch.
This keeps the PR diff focused only on the signal disconnection fixes.
@codecov
Copy link

codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.22%. Comparing base (bf6805c) to head (f7a105c).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #178   +/-   ##
=======================================
  Coverage   61.22%   61.22%           
=======================================
  Files          50       50           
  Lines        3412     3412           
=======================================
  Hits         2089     2089           
  Misses       1323     1323           
Flag Coverage Δ
Linux 61.13% <ø> (ø)
Python-3.10 61.24% <ø> (ø)
Python-3.11 61.21% <ø> (-0.03%) ⬇️
Python-3.13 61.24% <ø> (+0.02%) ⬆️
Python-3.9 61.24% <ø> (ø)
Windows 61.16% <ø> (ø)
macOS 61.16% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The shut_down() method needs to check if signals have a disconnect
method before attempting to call it. Mocked signals in tests don't
have this attribute, causing AttributeError.

This ensures the method works correctly in both production (real Qt
signals) and test environments (mocked signals).
Reverts production code changes and keeps fix only in test tearDown().
This addresses CI test segfaults without touching production since we
have no evidence of customer-facing issues.

The fix disconnects Qt signals before shutdown to prevent segfaults
with PySide6 6.8.3+ where Qt auto-disconnects from partially-destroyed
QObjects. Added comprehensive comment noting this may indicate a
production bug, but keeping fix test-only to minimize risk until we
have actual evidence of production crashes.
The tearDown was checking hasattr on attributes that might not exist:
- bg_task_manager.task_failed (only task_completed is mocked)
- external_config_loader._shotgun_state

Now checks if parent object HAS the attribute before checking disconnect:
- hasattr(bg_task_manager, 'task_failed') before accessing it
- hasattr(external_config_loader, '_shotgun_state') before accessing it

This matches the working solution from commit 37f1b44.
The hasattr check was insufficient because _MockedSignal has Mock
objects for emit/connect, and Mock.__getattr__ makes hasattr return
True for any attribute including 'disconnect'.

Now explicitly checks isinstance to skip _MockedSignal objects before
attempting disconnect, ensuring we only disconnect real Qt signals.
The 2-second timeout was too aggressive for macOS Python 3.13 where
threading/GIL behavior has changed. Background task manager needs more
time to process on this platform/version combination.

Actual test run showed >2.27 seconds needed on macOS Python 3.13.
Increased timeout from 2 to 10 seconds to match realistic performance
expectations across all supported platforms.

Other similar background task tests use 10-60 second timeouts.
Python 3.13 changed threading/GIL behavior which can cause background
tasks to take longer on some platforms (especially macOS). Increase
timeout from 10s to 30s for Python 3.13+ to accommodate this variance
while keeping the shorter timeout for earlier versions.
@julien-lang julien-lang changed the title SG-38851: Fix segfault by disconnecting Qt signals SG-38851: Fix random segfault by controlling Qt object destruction order Feb 10, 2026
@julien-lang julien-lang changed the title SG-38851: Fix random segfault by controlling Qt object destruction order SG-38851 Fix random segfault by controlling Qt object destruction order Feb 10, 2026
@julien-lang julien-lang requested a review from Copilot February 10, 2026 17:44
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stabilizes CI tests by avoiding Qt/PySide6 teardown segfaults through explicit reference release ordering, and reduces flakiness in a schema-caching test by adjusting the timeout for Python 3.13+.

Changes:

  • Added Python-version-dependent timeout for test_cached_schema to account for slower background work on Python 3.13+.
  • Updated test tearDown to release Qt object references before calling super().tearDown() to avoid PySide6 segfaults.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
tests/shotgun_model/test_cached_schema.py Adjusts test timeout logic based on Python version to reduce flakiness.
tests/external_config/init.py Changes teardown order to release Qt references before parent teardown to avoid segfaults.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@julien-lang julien-lang changed the title SG-38851 Fix random segfault by controlling Qt object destruction order SG-42069 Fix random segfault by controlling Qt object destruction order Feb 10, 2026
@julien-lang julien-lang requested a review from a team February 10, 2026 18:09
@julien-lang julien-lang marked this pull request as ready for review February 10, 2026 18:09
before = time.time()

timeout = 5 if sys.version_info >= (3, 13) else 2
# Python 3.13+ has different threading/GIL behavior that can make background
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@julien-lang julien-lang merged commit d99dac9 into master Feb 10, 2026
26 checks passed
@julien-lang julien-lang deleted the ticket/SG-38851-try-fixup-seg-fault branch February 10, 2026 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments