test: bump brpc_channel_unittest to size=large to fix CI timeout flakiness #3339
Merged
chenBright merged 1 commit intoJun 13, 2026
brpc_channel_unittest packs dozens of timing-sensitive TEST_F (backup request, retry/backoff, timeouts) into a single binary. On contended CI runners its cumulative real-time waits exceed Bazel's default per-test 300s (size=medium) limit, producing flaky TIMEOUT failures (observed TIMEOUT in 4/5 no-cache runs on a GitHub ubuntu-22.04 runner).
Add an optional per_test_size override to the generate_unittests macro and set brpc_channel_unittest to size=large (900s). No test source changes.
Sharding was rejected: the binary's TEST_F share fixed loopback endpoints and global state, so parallel shards make a 'connection refused' test observe another shard's live server and fail deterministically.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
brpc_channel_unittest bundles dozens of timing-sensitive TEST_F cases (backup request, retry/backoff, timeouts, connection-failure) into a single test binary. Each test does real-time waits (server-side sleep_us, backup-request timers, connection retries). gtest runs them serially in one process, so the binary's wall time is the sum of all those waits.
On contended CI runners (GitHub-hosted ubuntu-22.04, ~4 shared vCPU with hypervisor steal) that cumulative time exceeds Bazel's default per-test 300s limit (size = "medium"), so the binary intermittently fails with TIMEOUT even though every assertion would pass given enough time.
Evidence (reproduced on GitHub Actions)
Measured on ubuntu-22.04 with --nocache_test_results, in three configurations: size=medium (300s), 5 runs under load; size=large (900s), single run; size=large (900s), 20 serialized no-cache runs.
The nominal run is ~92–114s, but under parallel-job contention the same binary balloons past 300s, a ~3× slowdown that crosses the medium ceiling. Raising the limit to large (900s) gives ~8× nominal headroom and absorbs the spike.
Bench runs (throwaway branch, not part of this PR):
TIMEOUT 4/5 + rejected shard_count=4 experiment FAILED 20/20: https://github.com/rajvarun77/brpc/actions/runs/27396621709
size=large validation (20/20 serialized + 91.7s timing): https://github.com/rajvarun77/brpc/actions/runs/27453397271
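The headroom figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the timeout table uses Bazel's documented default per-size limits, the nominal range is the ~92–114s reported in this section, and the names `SIZE_TIMEOUT_S`, `NOMINAL_S`, and `headroom` are made up for this example.

```python
# Bazel's documented default timeouts (seconds) for each test size.
SIZE_TIMEOUT_S = {"small": 60, "medium": 300, "large": 900, "enormous": 3600}

# Nominal wall-time range reported above for brpc_channel_unittest (seconds).
NOMINAL_S = (92, 114)


def headroom(size):
    """Return (min, max) ratio of the size's timeout to the nominal run time."""
    limit = SIZE_TIMEOUT_S[size]
    fastest, slowest = NOMINAL_S
    # The slowest nominal run leaves the least headroom.
    return limit / slowest, limit / fastest


for size in ("medium", "large"):
    low, high = headroom(size)
    print(f"{size}: {SIZE_TIMEOUT_S[size]}s limit = {low:.1f}x to {high:.1f}x nominal")
```

Under these numbers, a slowdown of roughly 2.6x to 3.3x is enough to cross the medium limit, while the large limit tolerates roughly 7.9x to 9.8x, which is consistent with the ~3× and ~8× figures above.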
Fix
Add an optional per_test_size override to the generate_unittests macro and set brpc_channel_unittest to size = "large". No test source changes.
Why not shard it?
Sharding (shard_count) was tried first and rejected: it fails deterministically (20/20). brpc_channel_unittest's TEST_F cases share fixed loopback endpoints and global state, so running shards as parallel processes makes a "connection should be refused" test (ChannelTest.connection_failed_selective) observe another shard's live server on the same port and see a successful connection instead of ECONNREFUSED. The tests are not shard-safe; raising the size limit is the only safe lever without rewriting the suite for isolation.
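The collision described above can be reproduced in miniature. This is a minimal sketch, not the brpc test code: plain Python sockets stand in for the brpc server and channel, a port picked at startup stands in for the suite's hard-coded loopback endpoint, and `pick_free_port` and `expect_refused` are hypothetical helper names.

```python
import errno
import socket


def pick_free_port():
    """Ask the OS for a currently unused loopback port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def expect_refused(port):
    """Stand-in for a 'connection should be refused' test on a fixed port.

    Returns True when connect() fails with ECONNREFUSED, False otherwise.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        err = client.connect_ex(("127.0.0.1", port))
        return err == errno.ECONNREFUSED


port = pick_free_port()  # plays the role of the suite's fixed endpoint

# Run alone: nothing listens on the port, so the test sees ECONNREFUSED.
alone = expect_refused(port)

# Run next to "another shard" whose server listens on the same fixed port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as other_shard_server:
    other_shard_server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    other_shard_server.bind(("127.0.0.1", port))
    other_shard_server.listen(1)
    sharded = expect_refused(port)

print("alone:", alone, "sharded:", sharded)
```

On a typical Linux host the first check should report a refused connection and the second should not, because the kernel completes the handshake against the other listener. That mirrors how a shard can see a peer shard's server instead of ECONNREFUSED when both use the same fixed port.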
cc @chenBright for review.