SG-42069 Fix random segfault by controlling Qt object destruction order#178
SG-42069 Fix random segfault by controlling Qt object destruction order#178julien-lang merged 23 commits intomasterfrom
Conversation
…Down The real bug was that ExternalConfigurationLoader (a QObject with signal connections) was being destroyed without first disconnecting its signals. When Qt tried to auto-disconnect during object destruction, the connected objects (bg_task_manager, _shotgun_state) were already partially destroyed, causing segmentation faults. This is a PySide6-specific issue that became more frequent with newer PySide6 versions (6.8.x) used in Python 3.13 tests. The fix explicitly disconnects all signal connections before setting the object to None, preventing Qt from accessing partially-destroyed objects during cleanup.
The tearDown was attempting to disconnect() on _MockedSignal objects which don't have that method. Now checking hasattr() for 'disconnect' before attempting to use it, so mocked signals are skipped but real Qt signals are properly disconnected to prevent segfaults.
This fixes a potential segfault that could occur in production when ExternalConfigurationLoader is destroyed without explicitly disconnecting Qt signals first. This especially affects Python 3.13 + PySide6 6.8.3+. Changes: - Added explicit signal disconnection in shut_down() method - Added debug logging to track signal disconnection - Updated test tearDown() to also include debug logging This ensures both test and production code properly clean up Qt signals to prevent crashes during object destruction.
Using standard master branch of tk-ci-tools instead of custom branch. This keeps the PR diff focused only on the signal disconnection fixes.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #178 +/- ##
=======================================
Coverage 61.22% 61.22%
=======================================
Files 50 50
Lines 3412 3412
=======================================
Hits 2089 2089
Misses 1323 1323
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The shut_down() method needs to check if signals have a disconnect method before attempting to call it. Mocked signals in tests don't have this attribute, causing AttributeError. This ensures the method works correctly in both production (real Qt signals) and test environments (mocked signals).
Reverts production code changes and keeps fix only in test tearDown(). This addresses CI test segfaults without touching production since we have no evidence of customer-facing issues. The fix disconnects Qt signals before shutdown to prevent segfaults with PySide6 6.8.3+ where Qt auto-disconnects from partially-destroyed QObjects. Added comprehensive comment noting this may indicate a production bug, but keeping fix test-only to minimize risk until we have actual evidence of production crashes.
The tearDown was checking hasattr on attributes that might not exist: - bg_task_manager.task_failed (only task_completed is mocked) - external_config_loader._shotgun_state Now checks if parent object HAS the attribute before checking disconnect: - hasattr(bg_task_manager, 'task_failed') before accessing it - hasattr(external_config_loader, '_shotgun_state') before accessing it This matches the working solution from commit 37f1b44.
The hasattr check was insufficient because _MockedSignal has Mock objects for emit/connect, and Mock.__getattr__ makes hasattr return True for any attribute including 'disconnect'. Now explicitly checks isinstance to skip _MockedSignal objects before attempting disconnect, ensuring we only disconnect real Qt signals.
The 2-second timeout was too aggressive for macOS Python 3.13 where threading/GIL behavior has changed. Background task manager needs more time to process on this platform/version combination. Actual test run showed >2.27 seconds needed on macOS Python 3.13. Increased timeout from 2 to 10 seconds to match realistic performance expectations across all supported platforms. Other similar background task tests use 10-60 second timeouts.
Python 3.13 changed threading/GIL behavior which can cause background tasks to take longer on some platforms (especially macOS). Increase timeout from 10s to 30s for Python 3.13+ to accommodate this variance while keeping the shorter timeout for earlier versions.
There was a problem hiding this comment.
Pull request overview
Stabilizes CI tests by avoiding Qt/PySide6 teardown segfaults through explicit reference release ordering, and reduces flakiness in a schema-caching test by adjusting the timeout for Python 3.13+.
Changes:
- Added Python-version-dependent timeout for
test_cached_schemato account for slower background work on Python 3.13+. - Updated test
tearDownto release Qt object references before callingsuper().tearDown()to avoid PySide6 segfaults.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/shotgun_model/test_cached_schema.py | Adjusts test timeout logic based on Python version to reduce flakiness. |
| tests/external_config/init.py | Changes teardown order to release Qt references before parent teardown to avoid segfaults. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| before = time.time() | ||
|
|
||
| timeout = 5 if sys.version_info >= (3, 13) else 2 | ||
| # Python 3.13+ has different threading/GIL behavior that can make background |
Fixes random CI test segfaults by controlling object destruction order in test tearDown.
Scope: Test code only - no production changes.
Root Cause: PySide6 6.8.3+ attempts to auto-disconnect signals from partially-destroyed QObjects, accessing freed memory and causing segfaults during test cleanup.
Solution: Set instance variables to None before calling super().tearDown(). This releases Qt object references in a controlled order, allowing proper cleanup before parent tearDown runs.
Additional Fix: Increased timeout in test_cached_schema from 2s to 5s for Python 3.13+ due to different threading/GIL behavior affecting background task completion times.
Why Test-Only: We have no customer reports or evidence of production crashes. This fix addresses CI instability while minimizing risk by avoiding changes to production code.