
Fix flaky TestQueryFrontendResponseSizeLimit: wait for ring convergence #7490

Merged: friedrichg merged 1 commit into cortexproject:master from yeya24:fix-flaky-test-56 on May 8, 2026


yeya24 (Contributor) commented on May 8, 2026

What this PR does:

Wait for the distributor to discover the ingester in the ring before pushing samples in TestQueryFrontendResponseSizeLimit.

Why:

The test was flaky because StartAndWaitReady only ensures the HTTP endpoint is healthy, not that the distributor has discovered the ingester's tokens in the hash ring. This caused intermittent DoBatch: InstancesCount <= 0 errors (HTTP 500 on push).
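The gap between "HTTP endpoint is healthy" and "ring has converged" can be illustrated with a toy model. This is only a sketch: `toyDistributor`, its fields, and `push` are hypothetical names invented for illustration and are not Cortex code. It assumes, as the description above implies, that readiness and ring discovery are tracked independently.

```go
package main

import (
	"errors"
	"fmt"
)

// toyDistributor is a hypothetical model, not Cortex code. Readiness and
// ring discovery are tracked separately, as the PR description implies.
type toyDistributor struct {
	httpReady     bool // what a readiness probe would report
	ringInstances int  // ingesters currently visible in the ring
}

// errNoInstances mimics the error text quoted in the PR description.
var errNoInstances = errors.New("InstancesCount <= 0")

// push fails when no ingester is visible, even if the HTTP endpoint is ready.
func (d *toyDistributor) push() error {
	if d.ringInstances <= 0 {
		return errNoInstances
	}
	return nil
}

func main() {
	// Ready according to the health check, but the ring has not converged yet.
	d := &toyDistributor{httpReady: true}
	fmt.Println("ready:", d.httpReady, "push error:", d.push())

	// Once the ingester's tokens have been discovered, the same push succeeds.
	d.ringInstances = 1
	fmt.Println("ready:", d.httpReady, "push error:", d.push())
}
```

In this model the first push fails even though the readiness flag is already true, which is the window the flaky test occasionally hit.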

Fix:

Added distributor.WaitSumMetrics(e2e.Equals(512), "cortex_ring_tokens_total") before the push loop, consistent with other integration tests in the same file (e.g., lines 323, 490, 583).

The test was flaky because it pushed samples before the distributor
discovered the ingester in the ring, causing 'InstancesCount <= 0'
errors. Add WaitSumMetrics on cortex_ring_tokens_total to ensure the
distributor sees the ingester's tokens before pushing.

Signed-off-by: Ben Ye <benye@amazon.com>
friedrichg (Member) left a comment:

Thanks!

dosubot (bot) added the lgtm label ("This PR has been approved by a maintainer") on May 8, 2026
friedrichg merged commit e2f2cd6 into cortexproject:master on May 8, 2026
37 checks passed

Labels: lgtm, size/XS, type/flaky-test


2 participants