Prefetch all collection items globally before tests#5043
Prefetch all collection items globally before tests#5043
Conversation
Instead of making 3 GraphQL API calls per collection (~300 total for 100 collections), collect all repo and user references across all collections upfront, deduplicate them, and batch the GraphQL queries. This reduces API round-trips from ~300 to ~10-15 (depending on total unique items), significantly improving test suite performance while maintaining the same validation coverage.
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce GitHub API round-trips in the collections test suite by prefetching and deduplicating all repo/user references across collections, then warming NewOctokit’s caches via batched GraphQL queries before per-collection validations run.
Changes:
- Added
prefetch_all_collection_items!to aggregate/deduplicate all collection items and prefetch them in GraphQL batches. - Updated the “renamed or removed” collections test to call
prefetch_all_collection_items!and removed per-collectioncache_*prewarming calls.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/collections_test_helper.rb | Introduces global prefetch helper and batching constant for GraphQL cache warming |
| test/collections_test.rb | Switches from per-collection cache warming to a single prefetch call before validations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def prefetch_all_collection_items! | ||
| return if @_prefetched | ||
|
|
There was a problem hiding this comment.
@_prefetched is an instance variable on the current Minitest::Spec instance; since Minitest creates a new instance per it, this memoization won’t persist across tests/collections, so the global prefetch will rerun many times (and still issue GraphQL requests each time). Use a shared flag (e.g., a class variable on NewOctokit, a module-level instance variable, or a constant guarded by defined?) so the prefetch truly runs once per test process.
There was a problem hiding this comment.
Fixed in 44f56ef. Replaced @_prefetched with @@global_prefetch_done class variable on NewOctokit, accessed via NewOctokit.global_prefetch_done? and NewOctokit.global_prefetch_done! class methods. This persists across Minitest instances since class variables are shared across all instances of the class.
test/collections_test_helper.rb
Outdated
| # Batch repos in chunks to stay within GraphQL query limits | ||
| all_repos.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | ||
| cache_repos_exist_check!(batch) | ||
| end | ||
|
|
||
| # Batch users in chunks | ||
| all_users.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | ||
| cache_users_exist_check!(batch) | ||
| end | ||
|
|
||
| # Check orgs for users not found | ||
| not_found_users = users_not_found_from(all_users) | ||
| not_found_users.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | ||
| cache_orgs_exist_check!(batch) | ||
| end | ||
|
|
||
| @_prefetched = true |
There was a problem hiding this comment.
The prefetch flag is set unconditionally at the end. If any of the cache_*_exist_check! calls return early (e.g., graphql_query returns nil) or raise partway through, subsequent tests will skip prefetching and fall back to many uncached REST calls. Consider only marking prefetched after all batches succeed, or resetting/avoiding the flag on failure.
| # Batch repos in chunks to stay within GraphQL query limits | |
| all_repos.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | |
| cache_repos_exist_check!(batch) | |
| end | |
| # Batch users in chunks | |
| all_users.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | |
| cache_users_exist_check!(batch) | |
| end | |
| # Check orgs for users not found | |
| not_found_users = users_not_found_from(all_users) | |
| not_found_users.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | |
| cache_orgs_exist_check!(batch) | |
| end | |
| @_prefetched = true | |
| all_prefetch_succeeded = true | |
| # Batch repos in chunks to stay within GraphQL query limits | |
| all_repos.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | |
| all_prefetch_succeeded &&= cache_repos_exist_check!(batch) | |
| end | |
| # Batch users in chunks | |
| all_users.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | |
| all_prefetch_succeeded &&= cache_users_exist_check!(batch) | |
| end | |
| # Check orgs for users not found | |
| not_found_users = users_not_found_from(all_users) | |
| not_found_users.each_slice(GRAPHQL_BATCH_SIZE) do |batch| | |
| all_prefetch_succeeded &&= cache_orgs_exist_check!(batch) | |
| end | |
| @_prefetched = true if all_prefetch_succeeded |
There was a problem hiding this comment.
Fixed in 44f56ef. Wrapped all cache_* calls in a begin/rescue block. NewOctokit.global_prefetch_done! is now only called after ALL batch operations complete successfully. If any call raises, the rescue logs a warning and the flag stays false, so individual tests fall back to their own per-test caching.
Summary
NewOctokitcache globallyChanges
test/collections_test_helper.rb: Addprefetch_all_collection_items!function that globally collects and batches all itemstest/collections_test.rb: Replace per-collectioncache_*calls with singleprefetch_all_collection_items!call (idempotent, runs once)How it works
The
prefetch_all_collection_items!function:cache_repos_exist_check!for each batchcache_users_exist_check!for each batchcache_orgs_exist_check!@_prefetched = trueto prevent re-runningSince
NewOctokituses class-level caches (@@repos,@@users), subsequent per-collection checks hit the cache with zero API calls.