
fix(tests): prevent tests from producing real Kafka messages to taskbroker#107729

Merged
dashed merged 1 commit into master from fix/prevent-test-kafka-task-production
Feb 6, 2026

Conversation

@dashed dashed commented Feb 5, 2026

Summary

Tests produce real Kafka messages via the taskworker pipeline, causing stale tasks to accumulate in the taskbroker's SQLite queue. Running 20 tests produced 266 tasks (relay config invalidations, Slack notifications, code owners updates, spike projections). Over multiple test sessions, these accumulate into thousands.

Root Cause

Three factors combine to produce real Kafka messages during tests:

  1. simulate_on_commit (autouse fixture) fires on_commit callbacks during tests
  2. Django signal handlers queue tasks via task.delay() on model post_save
  3. TASKWORKER_ALWAYS_EAGER=False sends those tasks to real Kafka at 127.0.0.1:9092
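As a rough illustration of how these three factors chain together, here is a toy model. Every name in it (`Task`, `on_commit`, `post_save_handler`, the `produced` list standing in for a Kafka topic) is hypothetical and only mimics the shape of the behavior described above; it is not Sentry's implementation, and the exact wiring between the signal handler and `on_commit` is an assumption.

```python
# Toy model only: all names are hypothetical stand-ins, not Sentry code.
produced = []            # stands in for the Kafka topic at 127.0.0.1:9092
ALWAYS_EAGER = False     # mirrors TASKWORKER_ALWAYS_EAGER=False

on_commit_callbacks = []


class Task:
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def delay(self, *args):
        if ALWAYS_EAGER:
            self.func(*args)                      # run in-process, nothing produced
        else:
            produced.append((self.name, args))    # factor 3: message leaves the process


def on_commit(callback):
    on_commit_callbacks.append(callback)


def simulate_on_commit():
    # Factor 1: the fixture fires callbacks that would otherwise wait for a commit.
    while on_commit_callbacks:
        on_commit_callbacks.pop(0)()


invalidate_project_config = Task("invalidate_project_config", lambda project_id: None)


def post_save_handler(project_id):
    # Factor 2: a signal handler queues a task when a model is saved.
    on_commit(lambda: invalidate_project_config.delay(project_id))


post_save_handler(1)     # a test saves a model
simulate_on_commit()     # the autouse fixture flushes callbacks
```

In this sketch, one saved model is enough to append one message to `produced`, which mirrors how ordinary test setup can generate many tasks without any test asking for them.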

Fix

Patch TaskNamespace.send_task as a no-op at session start via pytest_sessionstart. This is surgical — it blocks only the Kafka production step while preserving:

  • _signal_send hooks (BurstTaskRunner, stale_database_reads)
  • Serialization validation (create_activation() still runs)
  • TaskRunner (uses ALWAYS_EAGER, bypasses send_task entirely)
  • BurstTaskRunner (captures at _signal_send, before send_task)
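A minimal sketch of why patching only the last step keeps the earlier steps intact. The `TaskNamespace` class, `signal_send`, `create_activation`, and `delay` below are simplified stand-ins written for this example, and the ordering of the hook relative to serialization is an assumption; only the idea that both run before `send_task` comes from the description above.

```python
import json


class TaskNamespace:
    """Hypothetical stand-in, not sentry.taskworker.registry.TaskNamespace."""

    def __init__(self):
        self.sent = []

    def send_task(self, activation):
        # The real method would produce to Kafka; here we just record the call.
        self.sent.append(activation)


captured = []  # what a BurstTaskRunner-style hook would see


def signal_send(task_name, args):
    # Stand-in for the _signal_send hook: runs before send_task.
    captured.append((task_name, args))


def create_activation(task_name, args):
    # Stand-in for serialization validation: raises TypeError on bad arguments.
    return json.dumps({"task": task_name, "args": args})


def delay(namespace, task_name, *args):
    signal_send(task_name, args)
    activation = create_activation(task_name, list(args))
    namespace.send_task(activation)


# The session-level patch: only the final production step becomes a no-op.
TaskNamespace.send_task = lambda self, *args, **kwargs: None

ns = TaskNamespace()
delay(ns, "run_spike_projection", 42)
```

After the patch, `captured` still receives the task and unserializable arguments still raise, while `ns.sent` stays empty.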

The 4 tests in test_registry.py that directly test send_task behavior use a real_send_task fixture to restore the original method.
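The restore step could look roughly like the following. This is a sketch under stated assumptions: the actual `real_send_task` fixture is not shown in this conversation, the `TaskNamespace` here is a stand-in class, and a plain context manager is used so the example runs without pytest. A pytest fixture would wrap the same swap-and-restore logic around a `yield`.

```python
from contextlib import contextmanager


class TaskNamespace:
    """Hypothetical stand-in for the real class."""

    def send_task(self, activation):
        return f"produced:{activation}"


# What the session-start hook does: stash the original, install a no-op.
TaskNamespace._original_send_task = TaskNamespace.send_task
TaskNamespace.send_task = lambda self, *args, **kwargs: None


@contextmanager
def real_send_task():
    # Temporarily put the original method back, then reinstate the no-op.
    patched = TaskNamespace.send_task
    TaskNamespace.send_task = TaskNamespace._original_send_task
    try:
        yield
    finally:
        TaskNamespace.send_task = patched


ns = TaskNamespace()
with real_send_task():
    inside = ns.send_task("activation")   # original behavior
outside = ns.send_task("activation")      # no-op again
```

Restoring in a `finally` block means a failing test cannot leave the real method installed for the tests that follow.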

Verification

Ran tests/getsentry/tasks/test_quota_exceeded_notification.py (20 tests) before and after, with taskbroker running but no taskworker draining the queue:

| Metric | Before fix | After fix |
|---|---|---|
| Tests passing | 20/20 | 20/20 |
| Baseline tasks (devserver scheduler) | 7 | 8 |
| Tasks after test run | 273 (7 + 266 from tests) | 8 (0 from tests) |
| Test-originated tasks | 266 | 0 |

Before-fix breakdown (266 test-originated tasks from 20 tests):

  • 151 invalidate_project_config
  • 45 update_code_owners_schema
  • 45 new_organization_notify (Slack)
  • 25 run_spike_projection
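The breakdown can be cross-checked against the totals reported above with a few lines of arithmetic. The dictionary simply restates the four counts from the list; the variable names are illustrative.

```python
# Counts copied from the before-fix breakdown.
breakdown = {
    "invalidate_project_config": 151,
    "update_code_owners_schema": 45,
    "new_organization_notify": 45,
    "run_spike_projection": 25,
}

test_originated = sum(breakdown.values())

baseline_before = 7  # devserver scheduler tasks present before the run
total_before = baseline_before + test_originated
```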

Additional regression tests all pass:

  • tests/sentry/taskworker/test_registry.py — 15/15 (including 4 with real_send_task fixture)
  • tests/sentry/taskworker/ (full suite) — 177/177
  • tests/getsentry/consumers/test_outcomes_consumer.py — 80/80 (BurstTaskRunner works correctly)

Test plan

  • Verified 266 tasks accumulate in taskbroker from 20 tests (before fix)
  • Verified 0 test-originated tasks after fix
  • All 177 taskworker tests pass
  • BurstTaskRunner tests pass (80/80)
  • Pre-commit hooks pass

@github-actions github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Feb 5, 2026
@dashed dashed self-assigned this Feb 5, 2026
@dashed dashed force-pushed the fix/prevent-test-kafka-task-production branch from fd51300 to 94d6151 February 5, 2026 22:36
@dashed dashed force-pushed the fix/prevent-test-kafka-task-production branch from 94d6151 to b3852a1 February 5, 2026 22:48
Comment on lines +339 to +348
def pytest_sessionstart(session: pytest.Session) -> None:
    from sentry.taskworker.registry import TaskNamespace

    # Store original send_task so tests that need it can restore it
    TaskNamespace._original_send_task = TaskNamespace.send_task  # type: ignore[attr-defined]

    # Prevent tests from producing real Kafka messages via the taskworker pipeline.
    # Tests use TaskRunner (TASKWORKER_ALWAYS_EAGER=True) or BurstTaskRunner
    # (_signal_send hook) which both operate before send_task in the call chain.
    TaskNamespace.send_task = lambda self, *args, **kwargs: None  # type: ignore[method-assign]

Should we unwind this monkeypatch with a pytest_sessionfinish?
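One way the unwinding asked about here might look, as a sketch only. Whether the merged change adopted anything like this is not shown in this conversation. `pytest_sessionstart(session)` and `pytest_sessionfinish(session, exitstatus)` are pytest's hook signatures; the `TaskNamespace` class below is a stand-in so the example runs on its own.

```python
class TaskNamespace:
    """Hypothetical stand-in for sentry.taskworker.registry.TaskNamespace."""

    def send_task(self, activation):
        return f"produced:{activation}"


def pytest_sessionstart(session):
    # Stash the original and install the no-op, as in the hunk above.
    TaskNamespace._original_send_task = TaskNamespace.send_task
    TaskNamespace.send_task = lambda self, *args, **kwargs: None


def pytest_sessionfinish(session, exitstatus):
    # Put the original method back and drop the stash attribute.
    original = getattr(TaskNamespace, "_original_send_task", None)
    if original is not None:
        TaskNamespace.send_task = original
        del TaskNamespace._original_send_task


# Simulate one session; pytest would call these hooks itself.
pytest_sessionstart(session=None)
during = TaskNamespace().send_task("activation")
pytest_sessionfinish(session=None, exitstatus=0)
after = TaskNamespace().send_task("activation")
```

Since the test process normally exits right after the session ends, the practical effect is mostly tidiness; it matters more if pytest is invoked in-process by another tool.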

…roker

Tests were producing real Kafka messages via the taskworker pipeline,
causing stale tasks to accumulate in the taskbroker's SQLite queue.
Running 20 tests produced 266 tasks (relay config invalidations, Slack
notifications, code owners updates, spike projections). Over multiple
test sessions, these accumulate into thousands.

The root cause is a three-factor chain: `simulate_on_commit` fires
`on_commit` callbacks during tests, Django signal handlers queue tasks
via `task.delay()`, and `TASKWORKER_ALWAYS_EAGER=False` sends those
tasks to real Kafka at 127.0.0.1:9092.

Patch `TaskNamespace.send_task` as a no-op at session start. This is
surgical — it blocks only Kafka production while preserving:
- `_signal_send` hooks (BurstTaskRunner, stale_database_reads)
- Serialization validation (`create_activation()` still runs)
- TaskRunner (uses ALWAYS_EAGER, bypasses send_task entirely)
- BurstTaskRunner (captures at `_signal_send`, before send_task)

The 4 tests in test_registry.py that directly test `send_task` behavior
use a `real_send_task` fixture to restore the original method.
@dashed dashed force-pushed the fix/prevent-test-kafka-task-production branch from a9f11bf to 7f87226 February 6, 2026 16:34
@dashed dashed requested a review from markstory February 6, 2026 16:36
@dashed dashed marked this pull request as ready for review February 6, 2026 16:36
@dashed dashed requested a review from a team as a code owner February 6, 2026 16:36

@evanh evanh left a comment

Thanks for doing this Alberto!

@dashed dashed merged commit 4d57aec into master Feb 6, 2026
93 checks passed
@dashed dashed deleted the fix/prevent-test-kafka-task-production branch February 6, 2026 20:27