fix(code-review): Add cache to dedupe github webhook events #107734

Open
suejung-sentry wants to merge 3 commits into master from sshin/dedupe-gh-webhooks

Member

@suejung-sentry suejung-sentry commented Feb 5, 2026

This PR deduplicates webhook deliveries by storing a Redis idempotency key for each GitHub webhook id.

GitHub guarantees "at-least-once" delivery, so it may send duplicate webhooks. Anecdotally, we have seen Seer receive multiple requests for a single commit (from a pull_request.synchronize event) within 500 milliseconds of each other (redash).

It's unclear whether GitHub is delivering the webhook twice or whether something in our control-to-regional forwarding queues is causing redelivery. Either way, it seems likely that the same payload is being processed with the same GitHub webhook id, so this PR uses that id as the idempotency key.

I considered using a lock instead. The downside is that the lock would be released as soon as the function returns, which may be sooner than the 500 milliseconds within which we currently see dupes.

So instead, in this PR, any webhook that arrives with an already-seen webhook id within the same 20-second window is not replayed or re-forwarded to Seer.
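
A minimal sketch of the check described above, assuming the dedupe is a single Redis `SET` with the `NX` and `EX` options (the actual implementation is not shown in this conversation). `InMemorySetNx`, `is_first_delivery`, and the `webhook:seen:` key prefix are hypothetical names for illustration; the in-memory class only stands in for a Redis client so the example runs without a server.

```python
import time

WEBHOOK_SEEN_TTL_SECONDS = 20  # TTL from this PR


class InMemorySetNx:
    """Hypothetical stand-in for a Redis client; mimics only SET with NX and EX."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires_at = {}  # key -> expiry timestamp

    def set(self, key, value, nx=False, ex=None):
        now = self._clock()
        expiry = self._expires_at.get(key)
        if nx and expiry is not None and expiry > now:
            # Key is still live: redis-py returns None when NX prevents the write.
            return None
        self._expires_at[key] = (now + ex) if ex is not None else float("inf")
        return True


def is_first_delivery(client, webhook_id, ttl=WEBHOOK_SEEN_TTL_SECONDS):
    """Return True only for the first delivery of webhook_id within the TTL window."""
    key = f"webhook:seen:{webhook_id}"  # hypothetical key format
    return bool(client.set(key, "1", nx=True, ex=ttl))
```

With a real redis-py client, the same `client.set(key, "1", nx=True, ex=ttl)` call is atomic, so two near-simultaneous deliveries cannot both be treated as the first.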

I chose a 20-second TTL to cover the 500-millisecond period and any errant GitHub retry and backoff behavior.
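
To illustrate the TTL choice against the lock alternative: a guard only blocks a duplicate if it is still held when the duplicate arrives. The sketch below compares the two, assuming a hypothetical 100-millisecond handler duration (not a measured value); the 500-millisecond gap and the 20-second TTL come from the description above.

```python
DUPLICATE_GAP_SECONDS = 0.5     # spacing between dupes, per the description
WEBHOOK_SEEN_TTL_SECONDS = 20   # TTL chosen in this PR
ASSUMED_HANDLER_SECONDS = 0.1   # hypothetical: how long a lock would be held


def blocks_duplicate(guard_held_seconds, gap_seconds=DUPLICATE_GAP_SECONDS):
    """A guard blocks the duplicate only if it outlives the gap between deliveries."""
    return gap_seconds < guard_held_seconds


# A lock is released when the handler returns, so it is gone before the dupe arrives.
lock_blocks = blocks_duplicate(ASSUMED_HANDLER_SECONDS)
# A key with a fixed TTL is still present when the dupe arrives.
ttl_key_blocks = blocks_duplicate(WEBHOOK_SEEN_TTL_SECONDS)
```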

Redis should be able to handle this load, which is one SET per GitHub webhook that makes it past our "preflight" feature enablement filters (a peak hour sees around 2,000 code reviews, so say at most 1 request per second). The keys are also small and have a short TTL.
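
As a back-of-the-envelope check of the load estimate, assuming one SET per code review and arrivals spread evenly across the hour (both simplifications):

```python
PEAK_REVIEWS_PER_HOUR = 2_000   # peak figure from the description
WEBHOOK_SEEN_TTL_SECONDS = 20   # TTL from this PR

# Average write rate during the peak hour.
sets_per_second = PEAK_REVIEWS_PER_HOUR / 3600

# Keys alive at any moment is roughly the arrival rate times the key lifetime.
live_keys = sets_per_second * WEBHOOK_SEEN_TTL_SECONDS

print(f"{sets_per_second:.2f} SETs/s, about {live_keys:.0f} live keys")
```

Under those assumptions the average rate is below the "1 request per second" ceiling used above, and only about a dozen keys exist at once.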

Closes CW-673

@github-actions github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Feb 5, 2026
@suejung-sentry suejung-sentry changed the title from "fix(code-review): Add lock to dedupe github webhook events" to "fix(code-review): Add cache to dedupe github webhook events" Feb 5, 2026


def _get_webhook_seen_cluster() -> RedisCluster[str] | StrictRedis[str]:
    return redis_clusters.get("default")
Member Author

Used a pattern similar to the one here


logger = logging.getLogger(__name__)

WEBHOOK_SEEN_TTL_SECONDS = 20
Member Author

I picked 20 seconds because, per the redash, whenever dupes occur they arrive within 500 milliseconds of each other. I thought 20 seconds would comfortably cover that and any errant GitHub redelivery behavior.

linear bot commented Feb 6, 2026

@suejung-sentry suejung-sentry marked this pull request as ready for review February 6, 2026 03:07
@suejung-sentry suejung-sentry requested a review from a team as a code owner February 6, 2026 03:07