fix(tests): prevent tests from producing real Kafka messages to taskbroker#107729
Merged
fix(tests): prevent tests from producing real Kafka messages to taskbroker#107729
Conversation
fd51300 to
94d6151
Compare
94d6151 to
b3852a1
Compare
markstory
reviewed
Feb 5, 2026
Comment on lines
+339
to
+348
| def pytest_sessionstart(session: pytest.Session) -> None: | ||
| from sentry.taskworker.registry import TaskNamespace | ||
|
|
||
| # Store original send_task so tests that need it can restore it | ||
| TaskNamespace._original_send_task = TaskNamespace.send_task # type: ignore[attr-defined] | ||
|
|
||
| # Prevent tests from producing real Kafka messages via the taskworker pipeline. | ||
| # Tests use TaskRunner (TASKWORKER_ALWAYS_EAGER=True) or BurstTaskRunner | ||
| # (_signal_send hook) which both operate before send_task in the call chain. | ||
| TaskNamespace.send_task = lambda self, *args, **kwargs: None # type: ignore[method-assign] |
Member
There was a problem hiding this comment.
Should we unwind this monkeypatch with a pytest_sessionfinish?
b3852a1 to
a9f11bf
Compare
…roker Tests were producing real Kafka messages via the taskworker pipeline, causing stale tasks to accumulate in the taskbroker's SQLite queue. Running 20 tests produced 266 tasks (relay config invalidations, Slack notifications, code owners updates, spike projections). Over multiple test sessions, these accumulate into thousands. The root cause is a three-factor chain: `simulate_on_commit` fires `on_commit` callbacks during tests, Django signal handlers queue tasks via `task.delay()`, and `TASKWORKER_ALWAYS_EAGER=False` sends those tasks to real Kafka at 127.0.0.1:9092. Patch `TaskNamespace.send_task` as a no-op at session start. This is surgical — it blocks only Kafka production while preserving: - `_signal_send` hooks (BurstTaskRunner, stale_database_reads) - Serialization validation (`create_activation()` still runs) - TaskRunner (uses ALWAYS_EAGER, bypasses send_task entirely) - BurstTaskRunner (captures at `_signal_send`, before send_task) The 4 tests in test_registry.py that directly test `send_task` behavior use a `real_send_task` fixture to restore the original method.
a9f11bf to
7f87226
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Tests produce real Kafka messages via the taskworker pipeline, causing stale tasks to accumulate in the taskbroker's SQLite queue. Running 20 tests produced 266 tasks (relay config invalidations, Slack notifications, code owners updates, spike projections). Over multiple test sessions, these accumulate into thousands.
Root Cause
Three factors combine to produce real Kafka messages during tests:
simulate_on_commit(autouse fixture) fireson_commitcallbacks during teststask.delay()on modelpost_saveTASKWORKER_ALWAYS_EAGER=Falsesends those tasks to real Kafka at127.0.0.1:9092Fix
Patch
TaskNamespace.send_taskas a no-op at session start viapytest_sessionstart. This is surgical — it blocks only the Kafka production step while preserving:_signal_sendhooks (BurstTaskRunner,stale_database_reads)create_activation()still runs)TaskRunner(usesALWAYS_EAGER, bypassessend_taskentirely)BurstTaskRunner(captures at_signal_send, beforesend_task)The 4 tests in
test_registry.pythat directly testsend_taskbehavior use areal_send_taskfixture to restore the original method.Verification
Ran
tests/getsentry/tasks/test_quota_exceeded_notification.py(20 tests) before and after, with taskbroker running but no taskworker draining the queue:Before-fix breakdown (266 test-originated tasks from 20 tests):
invalidate_project_configupdate_code_owners_schemanew_organization_notify(Slack)run_spike_projectionAdditional regression tests all pass:
tests/sentry/taskworker/test_registry.py— 15/15 (including 4 withreal_send_taskfixture)tests/sentry/taskworker/(full suite) — 177/177tests/getsentry/consumers/test_outcomes_consumer.py— 80/80 (BurstTaskRunnerworks correctly)Test plan