feat: split router execution configs#2847
Conversation
…lit config poller (#2844)
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughIntroduces split-config loading end-to-end: control plane adds CompositionService and routerConfigHash; repositories and handlers delegate composition/deploy to CompositionService; CDN exposes manifest endpoints gated by JWT feature claim; router gains split-config poller/fetcher and mux reuse on hot reload; JWT tokens include optional features; tests, migration, and docs added. ChangesSplit-config orchestration across controlplane, CDN, and router
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
|
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts (2)
97-111:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThis success path never persists the new compatibility version.
When
subgraphs.length === 0, the handler returnsOKwithnewVersion: version, but the onlyupdateRouterCompatibilityVersion(...)call is below this branch. The API reports a successful change while leaving the stored version untouched.💡 Minimal fix
if (subgraphs.length === 0) { + await fedGraphRepo.updateRouterCompatibilityVersion(federatedGraph.id, version); return { response: { code: EnumStatusCode.OK, details: `The router compatibility version was set to "${req.version}" successfully.`,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts` around lines 97 - 111, When subgraphs.length === 0 in setGraphRouterCompatibilityVersion, the handler currently returns success without persisting the change; call and await updateRouterCompatibilityVersion(federatedGraph.id, version, callerInfo) (same args used later in the file) before returning, update previousVersion and newVersion values from the persisted result (or keep federatedGraph.routerCompatibilityVersion and version respectively), and propagate any errors from updateRouterCompatibilityVersion the same way as the main flow so the reported success reflects the stored value.
113-145:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t commit the version change before compose/deploy has succeeded.
This transaction updates
routerCompatibilityVersionand writes the audit log before checkingcompositionErrors/deploymentErrors. Because the callback still returns normally on failure, the new version is committed even though the response below reportspreviousVersionas still active.Move the DB update/audit logging behind the compose/deploy result check, or abort the transaction when those arrays are non-empty.
Also applies to: 149-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts` around lines 113 - 145, The transaction currently calls FederatedGraphRepository.updateRouterCompatibilityVersion and AuditLogRepository.addAuditLog before checking the result of CompositionService.composeAndDeployFederatedGraph (the composition/deploy may return compositionErrors/deploymentErrors), so the DB commit can persist a new routerCompatibilityVersion despite a failed compose/deploy; relocate the call to updateRouterCompatibilityVersion and the audit log write to after you inspect the composeAndDeployFederatedGraph result and only perform them when there are no compositionErrors/deploymentErrors (or explicitly abort/throw from the transaction callback when those error arrays are non-empty); apply the same change to the other transaction block that updates routerCompatibilityVersion/audit logging (the similar code path noted in the comment).controlplane/src/core/bufservices/feature-flag/updateFeatureFlag.ts (1)
106-163:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSame transactional inconsistency as
enableFeatureFlag.ts.The
featureFlagRepo.updateFeatureFlagcall (lines 106-111) executes outside the transaction that wrapscompositionService.composeAndDeployFeatureFlag(lines 146-163). If the transaction fails, the feature flag update persists but the composition/deployment does not.Consider wrapping both operations in the same transaction for atomicity:
Proposed fix
const prevFederatedGraphs = await featureFlagRepo.getFederatedGraphsByFeatureFlag({ featureFlagId: featureFlagDTO.id, namespaceId: namespace.id, excludeDisabled: true, }); - await featureFlagRepo.updateFeatureFlag({ - featureFlag: featureFlagDTO, - labels: req.labels, - featureSubgraphIds, - unsetLabels: req.unsetLabels ?? false, - }); - - await auditLogRepo.addAuditLog({ - // ... audit log params - }); - - const updatedFeatureFlag = await featureFlagRepo.getFeatureFlagById({ - featureFlagId: featureFlagDTO.id, - namespaceId: namespace.id, - includeSubgraphs: true, - }); - - if (!updatedFeatureFlag) { - // ... error handling - } - - const { deploymentErrors, compositionErrors, compositionWarnings } = await opts.db.transaction((tx) => { + const { deploymentErrors, compositionErrors, compositionWarnings } = await opts.db.transaction(async (tx) => { + const featureFlagRepo = new FeatureFlagRepository(logger, tx, authContext.organizationId); + + await featureFlagRepo.updateFeatureFlag({ + featureFlag: featureFlagDTO, + labels: req.labels, + featureSubgraphIds, + unsetLabels: req.unsetLabels ?? false, + }); + + const updatedFeatureFlag = await featureFlagRepo.getFeatureFlagById({ + featureFlagId: featureFlagDTO.id, + namespaceId: namespace.id, + includeSubgraphs: true, + }); + + if (!updatedFeatureFlag) { + throw new Error(`Feature flag "${featureFlagDTO.name}" was not found after updating.`); + } + const compositionService = new CompositionService( tx, // ... rest of params );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/bufservices/feature-flag/updateFeatureFlag.ts` around lines 106 - 163, The update is happening outside the DB transaction that runs CompositionService.composeAndDeployFeatureFlag, causing partial commits if the transaction rolls back; move the call to featureFlagRepo.updateFeatureFlag (and the subsequent auditLogRepo.addAuditLog and the fetch via featureFlagRepo.getFeatureFlagById) inside the same opts.db.transaction callback so that the update, audit log, and composition/deploy (CompositionService.composeAndDeployFeatureFlag) all run within a single transaction and either commit or roll back together.controlplane/src/core/bufservices/feature-flag/enableFeatureFlag.ts (1)
82-105:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
enableFeatureFlagcall inside the transaction to ensure atomicity.The
featureFlagRepo.enableFeatureFlagcall (lines 82-86) executes outside the transaction, whilecompositionService.composeAndDeployFeatureFlagruns inside (lines 88-105). If the transaction fails or rolls back, the feature flag's enabled state will persist in the database, but the composition and deployment will not be applied, leaving the system in an inconsistent state.Note that
deleteFeatureFlagwraps all state changes inside the transaction, and this handler should follow the same pattern for consistency.Proposed fix
- await featureFlagRepo.enableFeatureFlag({ - featureFlagId: featureFlag.id, - namespaceId: namespace.id, - isEnabled: req.enabled, - }); - const { deploymentErrors, compositionErrors, compositionWarnings } = await opts.db.transaction((tx) => { + const featureFlagRepo = new FeatureFlagRepository(logger, tx, authContext.organizationId); + await featureFlagRepo.enableFeatureFlag({ + featureFlagId: featureFlag.id, + namespaceId: namespace.id, + isEnabled: req.enabled, + }); + const compositionService = new CompositionService( tx,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/bufservices/feature-flag/enableFeatureFlag.ts` around lines 82 - 105, The enableFeatureFlag call is executed outside the DB transaction causing potential state inconsistency if the subsequent composition/deploy rollbacks; move the featureFlagRepo.enableFeatureFlag invocation into the opts.db.transaction block so both the DB state change and compositionService.composeAndDeployFeatureFlag run under the same transaction (use the transaction-bound client `tx` inside the transaction callback), mirroring the pattern used by deleteFeatureFlag; keep the same payload (featureFlagId, namespaceId, isEnabled) and ensure errors cause the whole transaction to roll back.router/core/graph_server.go (1)
1340-1408:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftReused muxes still lose their plugin and pubsub backends during old-server shutdown.
Skipping
mux.Shutdown()is not enough here: the oldgraphServerstill unconditionally stopss.connectorands.pubSubProviders, but reused muxes were built against those exact resources. After a successful swap, requests on reused muxes can start failing as soon as the old server finishes shutting down.Also applies to: 2043-2073
🧹 Nitpick comments (4)
router/pkg/controlplane/configpoller/config_poller.go (1)
99-104: ⚡ Quick winForward the provider’s
Changespayload instead of resetting it.Rebuilding a fresh
routerconfig.ResponsewithChanges: nildrops any diff information the client may already have computed and forces downstream hot-reload logic to treat the update as a full rebuild.Suggested fix
- response := &routerconfig.Response{ - Config: cfg.Config, - Changes: nil, // purposefully leaving this nil to indicate we don't know what changed - } - - if err := handler(response); err != nil { + if err := handler(cfg); err != nil { c.logger.Error("Error invoking config poll handler", zap.Error(err)) return }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@router/pkg/controlplane/configpoller/config_poller.go` around lines 99 - 104, The code builds a new routerconfig.Response and clears the provider's diff by setting Changes: nil; instead, preserve and forward the original changes from the provider (use the provider payload rather than nil) so downstream hot-reload can do incremental updates—i.e., when creating the response passed to handler(response) set Changes to the incoming cfg's Changes (or copy the provider's Changes field) while keeping Config: cfg.Config.controlplane/src/core/bufservices/subgraph/deleteFederatedSubgraph.ts (1)
141-156: ⚡ Quick winPotential N+1 query pattern—consider adding an
includeSubgraphsoption togetFeatureFlagsByBaseSubgraphId.The code calls
getFeatureFlagsByBaseSubgraphId(line 141) which returns only basic fields (id, name, labels, isEnabled), then re-fetches each flag withgetFeatureFlagById(line 148) to get the full DTO includingfeatureSubgraphs. The second fetch is necessary because line 207 needs to filterfeatureSubgraphs, but this creates an N+1 query pattern.To optimize this, add an optional
includeSubgraphsparameter togetFeatureFlagsByBaseSubgraphIdso the complete feature flag data can be fetched in a single query instead of N+1 separate calls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/bufservices/subgraph/deleteFederatedSubgraph.ts` around lines 141 - 156, The loop causes an N+1 DB fetch: getFeatureFlagsByBaseSubgraphId currently returns minimal DTOs and then getFeatureFlagById is called per-flag to obtain featureSubgraphs; change getFeatureFlagsByBaseSubgraphId to accept an optional includeSubgraphs boolean (default false) and, when true, return the full feature-flag DTO including featureSubgraphs so deleteFederatedSubgraph.ts can call getFeatureFlagsByBaseSubgraphId({... , includeSubgraphs: true}) and push results directly into affectedFeatureFlags instead of iterating and calling getFeatureFlagById for each item.router/pkg/controlplane/configpoller/split_config_poller_test.go (1)
147-302: ⚡ Quick winAdd
Subscribecoverage for the rule branches.These cases only validate
IgnoredFeatureFlagsandSkipMissingFeatureFlagsthroughGetRouterConfig.Subscribehas separate diff/fetch logic, so ignored-flag changes and skipped missing files can regress without this suite failing. Add apollOncecase for both behaviors.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@router/pkg/controlplane/configpoller/split_config_poller_test.go` around lines 147 - 302, Add tests that exercise the Subscribe/pollOnce path for the ignored-flag and skip-missing-file branches (not just GetRouterConfig) so Subscribe's diff/fetch logic cannot regress; specifically add pollOnce-based test cases (e.g., TestSplitSubscribe_IgnoredFeatureFlag_NotFetched and TestSplitSubscribe_SkipMissingFeatureFlag_Suppressed) to split_config_poller_test.go that use the existing mockSplitFetcher, set p.configRules (IgnoredFeatureFlags and SkipMissingFeatureFlags), call p.pollOnce(ctx) (or p.Subscribe/pollOnce sequence the code uses), and assert the same outcomes as the GetRouterConfig tests: ignored feature flags are not fetched/assembled and ErrFileNotFound is suppressed only when SkipMissingFeatureFlags is true (and non-file-not-found errors still propagate). Ensure the tests reference the same unique symbols: mockSplitFetcher, p.pollOnce (or p.Subscribe), p.configRules, and assert on mock.fetchConfigCalls and returned error behavior.controlplane/src/core/services/CompositionService.ts (1)
382-398: 💤 Low valueConsider batching organization feature queries.
Two separate database calls fetch independent features. If
OrganizationRepositorysupports batch fetching, consolidating these would reduce round trips.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/services/CompositionService.ts` around lines 382 - 398, The method `#getOrganizationFeatures` currently issues two separate calls to OrganizationRepository.getFeature for COMPOSITION_IGNORE_EXTERNAL_KEYS_FEATURE_ID and SPLIT_CONFIG_LOADING_FEATURE_ID; change it to a single batched call (e.g., a new or existing OrganizationRepository.getFeatures/getFeatureFlags that accepts an organizationId and an array of featureIds) to fetch both features in one DB round trip, then map the returned feature map to the same return shape (ignoreExternalKeys and splitConfigLoading) using the same constants COMPOSITION_IGNORE_EXTERNAL_KEYS_FEATURE_ID and SPLIT_CONFIG_LOADING_FEATURE_ID and preserving the default false behavior when a feature is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cdn-server/cdn/src/index.ts`:
- Around line 381-397: The handler currently calls c.req.json() which throws on
an empty POST body and causes a 500; change the parsing so empty bodies are
treated as {} (or return 400 only for non-empty malformed JSON) before using
body.version. Specifically, replace the direct c.req.json() call with logic that
reads the raw request text or guards the JSON parse in a try/catch: if the raw
text is empty or only whitespace set body = {} else attempt JSON.parse /
c.req.json() and on parse errors return 400; keep the rest of the flow (the
version check and storage.headObject call and BlobNotFoundError handling)
unchanged so symbols like c.req.json(), body, and storage.headObject remain the
references to update.
In `@controlplane/src/core/bufservices/federated-graph/migrateFromApollo.ts`:
- Around line 148-160: composeAndDeployFederatedGraph on the CompositionService
instance can return structured failure info but the current flow always
continues to write audit logs, emit APOLLO_MIGRATE_SUCCESS and create the graph
token regardless; update the code that calls
compositionService.composeAndDeployFederatedGraph to inspect its return value,
if it indicates failure then log/record the returned failure details, emit a
failure audit/event instead of APOLLO_MIGRATE_SUCCESS, and skip creating the
graph token (and any success-only side effects); only perform the audit/emission
and token creation when the composeAndDeployFederatedGraph result indicates
success, using the failure payload to populate the error logs and audit record.
In `@controlplane/src/core/bufservices/monograph/publishMonograph.ts`:
- Around line 143-169: The code passes opts.chClient! into the
CompositionService constructor which forces a non-null assertion; remove the `!`
and pass `opts.chClient` directly so CompositionService (constructor parameter
chClient: ClickHouseClient | undefined) can handle undefined. Update the call
that constructs CompositionService (the variable named compositionService) to
use opts.chClient without the non-null assertion and ensure no other code relies
on a non-null chClient in this call path.
In `@controlplane/src/core/repositories/FederatedGraphRepository.ts`:
- Around line 202-237: The method never respects the unsetAdmissionWebhookURL
flag, so add logic in the update block that if data.unsetAdmissionWebhookURL ===
true you explicitly set admissionWebhookURL to null (taking precedence over any
admissionWebhookURL value), otherwise keep the existing behavior (normalize and
set provided URL or null when empty); update the code around the
admissionWebhookURL handling in the function that builds updates for
federatedGraphs (look for variables/methods: unsetAdmissionWebhookURL,
admissionWebhookURL, normalizeURL, federatedGraph, and the
db.update(federatedGraphs).set(...) call) to perform the nulling update when the
flag is set.
- Around line 252-318: composeAndDeployFederatedGraph is being called with the
original federatedGraph DTO (which is stale after you rewrote label matcher and
subgraph mapping tables); reload the updated graph from the repository (e.g.,
fetch by id via the same repo used elsewhere) after the DB updates and before
calling compositionService.composeAndDeployFederatedGraph so the service
receives the fresh state (including labelMatchers). In practice, after the loop
that updates schema.targetLabelMatchers and schema.subgraphsToFederatedGraph
(and before the return), call the repo method that returns the latest federated
graph (the object used earlier named federatedGraph) and pass that refreshed
object into compositionService.composeAndDeployFederatedGraph (keep using
actorId: data.updatedBy).
In `@controlplane/src/core/services/CompositionService.ts`:
- Around line 527-535: The query in CompositionService that builds
routerHashesForGraph selects all router config hashes because it lacks a WHERE
filter by federatedGraphId; update the query to only fetch hashes for the
current federated graph by adding a where clause (e.g.,
.where(eq(schema.routerConfigHash.federatedGraphId, federatedGraphId)) or the
appropriate local variable representing the current federated graph id) so
mapper.json only contains hashes for that federated graph; locate the query in
the CompositionService method that generates mapper.json and apply the filter
using schema.routerConfigHash.federatedGraphId and the method’s federatedGraphId
variable.
- Around line 954-962: The code calls await
this.#updateMapperForFederatedGraph(federatedGraph.id) both inside the contract
loop and again after the loop causing N+1 redundant uploads; remove the inner
call within the loop (the one guarded by splitConfig inside the contract
iteration) so the mapper is updated exactly once after processing all contracts,
ensuring the single call to
this.#updateMapperForFederatedGraph(federatedGraph.id) that remains after the
loop handles the upload.
In `@controlplane/test/feature-flag/feature-flag-integration-v2.test.ts`:
- Around line 25-35: The mock uses a different module specifier than the import,
so Vitest isn't intercepting ClickHouseClient; change the vi.mock call to use
the exact same module specifier as the import
("../../src/core/clickhouse/index.js") and ensure the mock exports the same
symbol shape (exporting ClickHouseClient with a prototype.queryPromise stub) so
the imported ClickHouseClient and its queryPromise are properly mocked.
In `@router-tests/protocol/config_hot_reload_test.go`:
- Around line 622-630: In the subscribe helper function replace the direct
conn.WriteJSON call with the test harness wrapper testenv.WSWriteJSON so the
websocket write uses the built-in retry/deadline logic; locate the subscribe
function that calls xEnv.InitGraphQLWebSocketConnection and currently writes a
testenv.WebSocketMessage via conn.WriteJSON and change that single write to use
testenv.WSWriteJSON(conn, &testenv.WebSocketMessage{...}) (keeping the same
message payload) so the test uses the 2s deadline and exponential backoff
provided by the helper.
In `@router/core/graph_server.go`:
- Around line 312-335: The current code marks the live mux as reused
(gm.reused.Store(true)) and updates s.graphMuxList before the replacement server
construction completes, which can leave the old mux flagged if later steps fail;
change the flow so you do NOT set gm.reused.Store(true) or overwrite
s.graphMuxList[""] until newGraphServer (and any subsequent initialization
steps) has fully succeeded. Concretely: in the branch that reuses the old mux,
defer calling gm.reused.Store(true) and defer writing s.graphMuxList[""] (and
any similar reuse bookkeeping) until after newGraphServer returns successfully;
apply the same fix to the corresponding reuse block referenced around lines
505-520 to ensure live muxes are only marked reused when the replacement is
fully built.
In `@router/pkg/controlplane/configpoller/split_config_poller.go`:
- Around line 204-272: The poll loop computes newVersion and fetches diffs from
p.knownHashes without applying the same feature-flag ignore/missing filtering
used by fetchAndAssembleAll, so ignored or missing feature flags can trigger
unnecessary reloads or abort polls; update the logic in the poll path (around
computeCompositeVersion(activeGraphs), the changes calculation, and the toFetch
loop) to first filter/mirror the mapper-filtering used by
fetchAndAssembleAll/Subscribe (i.e., remove ignored flags and tolerate missing
flags), then compute newVersion and build changes/from/toFetch from that
filtered activeGraphs, and skip fetching entries that were filtered out so
fetcher.FetchConfig and the patched assembly only run for allowed feature-flag
entries (ensure you update references to computeCompositeVersion, p.knownHashes,
activeGraphs, toFetch, and the patched application of FeatureFlagConfigs
accordingly).
- Around line 162-178: GetRouterConfig currently accepts any non-empty mapper
from fetcher.FetchMapper, but must reject mappers that lack the required base
entry key "" to match Subscribe's stricter guard; after fetching activeGraphs in
GetRouterConfig (from fetcher.FetchMapper) validate that activeGraphs contains
an entry for the empty string key and return an error like "missing base mapper
entry" if not present, so you only call p.fetchAndAssembleAll, set
p.knownHashes, p.currentConfig and p.latestVersion (computeCompositeVersion)
when the base hash is present.
In `@router/pkg/routerconfig/cdn/split_fetcher_test.go`:
- Around line 54-59: marshalActiveGraphs currently JSON-encodes a bare map but
the real CDN payload uses the mapper.json envelope with a top-level
"graphConfigs" field; modify the marshalActiveGraphs helper to wrap the provided
configs under a "graphConfigs" key (e.g., marshal a struct or map like {
"graphConfigs": configs }) so test fixtures match the actual manifest shape
consumed by SplitFetcher and ensure all FetchMapper_* tests still use this
helper.
In `@router/pkg/routerconfig/cdn/split_fetcher.go`:
- Around line 108-119: The switch over resp.StatusCode should explicitly handle
403 Forbidden and return a dedicated error instead of the generic
unexpected-status error; add a case for http.StatusForbidden that returns a new
sentinel error (e.g., errs.ErrSplitConfigAuthMissingFeature or similar) and
define that error in the errs package so callers can detect "graph token valid
but missing split-config-loading" conditions; update the switch in
split_fetcher.go to include case http.StatusForbidden: return nil,
errs.ErrSplitConfigAuthMissingFeature and add the new error constant to the
existing errs package.
In `@router/pkg/routerconfig/client.go`:
- Around line 14-17: BaseGraphChanged currently dereferences the Changes
receiver and panics when Response.Changes is nil; modify BaseGraphChanged (the
method that checks graph changes) to treat a nil receiver as "unknown =>
everything changed" and return true instead of dereferencing, i.e., add a nil
check at the top of BaseGraphChanged (and similarly for the other helper(s)
referenced around the same area) so the function returns true when the Changes
pointer is nil.
---
Outside diff comments:
In `@controlplane/src/core/bufservices/feature-flag/enableFeatureFlag.ts`:
- Around line 82-105: The enableFeatureFlag call is executed outside the DB
transaction causing potential state inconsistency if the subsequent
composition/deploy rollbacks; move the featureFlagRepo.enableFeatureFlag
invocation into the opts.db.transaction block so both the DB state change and
compositionService.composeAndDeployFeatureFlag run under the same transaction
(use the transaction-bound client `tx` inside the transaction callback),
mirroring the pattern used by deleteFeatureFlag; keep the same payload
(featureFlagId, namespaceId, isEnabled) and ensure errors cause the whole
transaction to roll back.
In `@controlplane/src/core/bufservices/feature-flag/updateFeatureFlag.ts`:
- Around line 106-163: The update is happening outside the DB transaction that
runs CompositionService.composeAndDeployFeatureFlag, causing partial commits if
the transaction rolls back; move the call to featureFlagRepo.updateFeatureFlag
(and the subsequent auditLogRepo.addAuditLog and the fetch via
featureFlagRepo.getFeatureFlagById) inside the same opts.db.transaction callback
so that the update, audit log, and composition/deploy
(CompositionService.composeAndDeployFeatureFlag) all run within a single
transaction and either commit or roll back together.
In
`@controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts`:
- Around line 97-111: When subgraphs.length === 0 in
setGraphRouterCompatibilityVersion, the handler currently returns success
without persisting the change; call and await
updateRouterCompatibilityVersion(federatedGraph.id, version, callerInfo) (same
args used later in the file) before returning, update previousVersion and
newVersion values from the persisted result (or keep
federatedGraph.routerCompatibilityVersion and version respectively), and
propagate any errors from updateRouterCompatibilityVersion the same way as the
main flow so the reported success reflects the stored value.
- Around line 113-145: The transaction currently calls
FederatedGraphRepository.updateRouterCompatibilityVersion and
AuditLogRepository.addAuditLog before checking the result of
CompositionService.composeAndDeployFederatedGraph (the composition/deploy may
return compositionErrors/deploymentErrors), so the DB commit can persist a new
routerCompatibilityVersion despite a failed compose/deploy; relocate the call to
updateRouterCompatibilityVersion and the audit log write to after you inspect
the composeAndDeployFederatedGraph result and only perform them when there are
no compositionErrors/deploymentErrors (or explicitly abort/throw from the
transaction callback when those error arrays are non-empty); apply the same
change to the other transaction block that updates
routerCompatibilityVersion/audit logging (the similar code path noted in the
comment).
---
Nitpick comments:
In `@controlplane/src/core/bufservices/subgraph/deleteFederatedSubgraph.ts`:
- Around line 141-156: The loop causes an N+1 DB fetch:
getFeatureFlagsByBaseSubgraphId currently returns minimal DTOs and then
getFeatureFlagById is called per-flag to obtain featureSubgraphs; change
getFeatureFlagsByBaseSubgraphId to accept an optional includeSubgraphs boolean
(default false) and, when true, return the full feature-flag DTO including
featureSubgraphs so deleteFederatedSubgraph.ts can call
getFeatureFlagsByBaseSubgraphId({... , includeSubgraphs: true}) and push results
directly into affectedFeatureFlags instead of iterating and calling
getFeatureFlagById for each item.
In `@controlplane/src/core/services/CompositionService.ts`:
- Around line 382-398: The method `#getOrganizationFeatures` currently issues two
separate calls to OrganizationRepository.getFeature for
COMPOSITION_IGNORE_EXTERNAL_KEYS_FEATURE_ID and SPLIT_CONFIG_LOADING_FEATURE_ID;
change it to a single batched call (e.g., a new or existing
OrganizationRepository.getFeatures/getFeatureFlags that accepts an
organizationId and an array of featureIds) to fetch both features in one DB
round trip, then map the returned feature map to the same return shape
(ignoreExternalKeys and splitConfigLoading) using the same constants
COMPOSITION_IGNORE_EXTERNAL_KEYS_FEATURE_ID and SPLIT_CONFIG_LOADING_FEATURE_ID
and preserving the default false behavior when a feature is missing.
In `@router/pkg/controlplane/configpoller/config_poller.go`:
- Around line 99-104: The code builds a new routerconfig.Response and clears the
provider's diff by setting Changes: nil; instead, preserve and forward the
original changes from the provider (use the provider payload rather than nil) so
downstream hot-reload can do incremental updates—i.e., when creating the
response passed to handler(response) set Changes to the incoming cfg's Changes
(or copy the provider's Changes field) while keeping Config: cfg.Config.
In `@router/pkg/controlplane/configpoller/split_config_poller_test.go`:
- Around line 147-302: Add tests that exercise the Subscribe/pollOnce path for
the ignored-flag and skip-missing-file branches (not just GetRouterConfig) so
Subscribe's diff/fetch logic cannot regress; specifically add pollOnce-based
test cases (e.g., TestSplitSubscribe_IgnoredFeatureFlag_NotFetched and
TestSplitSubscribe_SkipMissingFeatureFlag_Suppressed) to
split_config_poller_test.go that use the existing mockSplitFetcher, set
p.configRules (IgnoredFeatureFlags and SkipMissingFeatureFlags), call
p.pollOnce(ctx) (or p.Subscribe/pollOnce sequence the code uses), and assert the
same outcomes as the GetRouterConfig tests: ignored feature flags are not
fetched/assembled and ErrFileNotFound is suppressed only when
SkipMissingFeatureFlags is true (and non-file-not-found errors still propagate).
Ensure the tests reference the same unique symbols: mockSplitFetcher, p.pollOnce
(or p.Subscribe), p.configRules, and assert on mock.fetchConfigCalls and
returned error behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bb1f1f47-f718-4c54-a219-2ec00feb6892
📒 Files selected for processing (58)
cdn-server/cdn/src/index.tscdn-server/cdn/test/cdn.test.tscontrolplane/migrations/0137_long_warbound.sqlcontrolplane/migrations/meta/0137_snapshot.jsoncontrolplane/migrations/meta/_journal.jsoncontrolplane/src/core/bufservices/contract/createContract.tscontrolplane/src/core/bufservices/contract/updateContract.tscontrolplane/src/core/bufservices/feature-flag/createFeatureFlag.tscontrolplane/src/core/bufservices/feature-flag/deleteFeatureFlag.tscontrolplane/src/core/bufservices/feature-flag/enableFeatureFlag.tscontrolplane/src/core/bufservices/feature-flag/updateFeatureFlag.tscontrolplane/src/core/bufservices/federated-graph/createFederatedGraph.tscontrolplane/src/core/bufservices/federated-graph/createFederatedGraphToken.tscontrolplane/src/core/bufservices/federated-graph/migrateFromApollo.tscontrolplane/src/core/bufservices/federated-graph/moveFederatedGraph.tscontrolplane/src/core/bufservices/federated-graph/updateFederatedGraph.tscontrolplane/src/core/bufservices/graph/recomposeGraph.tscontrolplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.tscontrolplane/src/core/bufservices/monograph/publishMonograph.tscontrolplane/src/core/bufservices/monograph/updateMonograph.tscontrolplane/src/core/bufservices/subgraph/deleteFederatedSubgraph.tscontrolplane/src/core/bufservices/subgraph/moveSubgraph.tscontrolplane/src/core/bufservices/subgraph/publishFederatedSubgraph.tscontrolplane/src/core/bufservices/subgraph/updateSubgraph.tscontrolplane/src/core/composition/composer.tscontrolplane/src/core/repositories/FeatureFlagRepository.tscontrolplane/src/core/repositories/FederatedGraphRepository.tscontrolplane/src/core/repositories/OrganizationRepository.tscontrolplane/src/core/repositories/SubgraphRepository.tscontrolplane/src/core/services/CompositionService.tscontrolplane/src/db/schema.tscontrolplane/src/types/index.tscontrolplane/test/feature-flag/feature-flag-integration-v2.test.tscontrolplane/test/feature-flag/feature-flag-integration.test.tsdocs-website/router/configuration.mdxrouter-tests/events/nats_events_test.gorouter-tests/operations/cache_warmup_test.gorouter-tests/operations/plan_fallback_cache_test.gorouter-tests/protocol/config_hot_reload_test.gorouter/core/graph_server.gorouter/core/init_config_poller.gorouter/core/router.gorouter/core/supervisor_instance.gorouter/internal/jwt/claims.gorouter/pkg/config/config.gorouter/pkg/config/config.schema.jsonrouter/pkg/config/fixtures/full.yamlrouter/pkg/config/testdata/config_defaults.jsonrouter/pkg/config/testdata/config_full.jsonrouter/pkg/controlplane/configpoller/config_poller.gorouter/pkg/controlplane/configpoller/split_config_poller.gorouter/pkg/controlplane/configpoller/split_config_poller_test.gorouter/pkg/errs/errors.gorouter/pkg/routerconfig/cdn/client.gorouter/pkg/routerconfig/cdn/split_fetcher.gorouter/pkg/routerconfig/cdn/split_fetcher_test.gorouter/pkg/routerconfig/client.gorouter/pkg/routerconfig/s3/client.go
…onfigs Co-authored-by: Copilot <copilot@github.com>
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (84.06%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #2847 +/- ##
==========================================
- Coverage 65.74% 65.01% -0.74%
==========================================
Files 254 573 +319
Lines 26546 71938 +45392
Branches 0 4862 +4862
==========================================
+ Hits 17452 46767 +29315
- Misses 7680 23724 +16044
- Partials 1414 1447 +33
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/core/router.go (1)
608-618:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard against nil
routerconfig.Responsebefore dereferencing
response.Configis dereferenced on Line 618 and Line 1619 without validation. A malformed/empty poller emission would panic in the hot-reload path instead of returning an error.Suggested fix
func (r *Router) newServer(ctx context.Context, response *routerconfig.Response) error { + if response == nil || response.Config == nil { + return errors.New("router config response/config cannot be nil") + } + server, err := newGraphServer(ctx, r, response, r.proxy) if err != nil { r.logger.Error("Failed to create graph server. Keeping the old server", zap.Error(err)) @@ r.configPoller.Subscribe(ctx, func(response *routerconfig.Response) error { if r.shutdown.Load() { r.logger.Warn("Router is in shutdown state. Skipping config update") return nil } + if response == nil || response.Config == nil { + return errors.New("received empty router config response from config poller") + } r.trackExecutionConfigUsage(response.Config, false)Also applies to: 1613-1621
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@router/core/router.go` around lines 608 - 618, The code dereferences response.Config in Router.newServer (and the other hot-reload spot that also uses response.Config) without checking response for nil, which can panic on malformed poller emissions; update Router.newServer (and the other handler that calls reloadPersistentState.CleanupFeatureFlags) to first validate that response != nil and response.Config != nil, return a descriptive error if nil, and only call newGraphServer and reloadPersistentState.CleanupFeatureFlags when the checks pass so the hot-reload path fails gracefully instead of panicking.
🧹 Nitpick comments (1)
controlplane/migrations/0138_lazy_speedball.sql (1)
6-6: ⚡ Quick winMake
created_atnon-null for stronger audit integrity.Line 6 currently allows explicit
NULL, which weakens timestamp guarantees for this table.Suggested migration tweak
- "created_at" timestamp DEFAULT now(), + "created_at" timestamp DEFAULT now() NOT NULL,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/migrations/0138_lazy_speedball.sql` at line 6, Update the migration so the "created_at" column is defined as NOT NULL with a default timestamp (change to created_at timestamp NOT NULL DEFAULT now()) and add a backfill + constraint step to handle existing NULLs: run an UPDATE to set created_at = now() for rows where created_at IS NULL, then ALTER TABLE to set NOT NULL; ensure these changes are applied within the same migration (file 0138_lazy_speedball.sql) so the created_at column enforces non-null going forward.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@router/core/router.go`:
- Around line 608-618: The code dereferences response.Config in Router.newServer
(and the other hot-reload spot that also uses response.Config) without checking
response for nil, which can panic on malformed poller emissions; update
Router.newServer (and the other handler that calls
reloadPersistentState.CleanupFeatureFlags) to first validate that response !=
nil and response.Config != nil, return a descriptive error if nil, and only call
newGraphServer and reloadPersistentState.CleanupFeatureFlags when the checks
pass so the hot-reload path fails gracefully instead of panicking.
---
Nitpick comments:
In `@controlplane/migrations/0138_lazy_speedball.sql`:
- Line 6: Update the migration so the "created_at" column is defined as NOT NULL
with a default timestamp (change to created_at timestamp NOT NULL DEFAULT now())
and add a backfill + constraint step to handle existing NULLs: run an UPDATE to
set created_at = now() for rows where created_at IS NULL, then ALTER TABLE to
set NOT NULL; ensure these changes are applied within the same migration (file
0138_lazy_speedball.sql) so the created_at column enforces non-null going
forward.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cc71aa41-a157-4ded-907f-ebc2da3c56cd
📒 Files selected for processing (13)
controlplane/migrations/0138_lazy_speedball.sqlcontrolplane/migrations/meta/0138_snapshot.jsoncontrolplane/migrations/meta/_journal.jsoncontrolplane/src/db/schema.tsdocs-website/router/configuration.mdxrouter-tests/events/nats_events_test.gorouter/core/graph_server.gorouter/core/router.gorouter/pkg/config/config.gorouter/pkg/config/config.schema.jsonrouter/pkg/config/fixtures/full.yamlrouter/pkg/config/testdata/config_defaults.jsonrouter/pkg/config/testdata/config_full.json
✅ Files skipped from review due to trivial changes (3)
- router/pkg/config/testdata/config_full.json
- controlplane/migrations/meta/_journal.json
- router/pkg/config/fixtures/full.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
- controlplane/src/db/schema.ts
- router/pkg/config/config.go
- router/pkg/config/testdata/config_defaults.json
- router-tests/events/nats_events_test.go
- docs-website/router/configuration.mdx
- router/pkg/config/config.schema.json
- router/core/graph_server.go
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controlplane/src/core/repositories/FederatedGraphRepository.ts`:
- Around line 380-397: The non-contract branch still passes the original stale
data.federatedGraph to compositionService.composeAndDeployFederatedGraph;
re-fetch the updated federated graph from
fedGraphRepo.byId(data.federatedGraph.id) (same as the contract branch) and pass
that fresh object to compositionService.composeAndDeployFederatedGraph with
actorId: data.updatedBy to ensure namespaceId/namespace changes are used; keep
the existing contract branch behavior using the fetched movedContractGraph and
mirror it for the non-contract path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9f5f08d9-867d-4702-96a1-4da6c4670191
📒 Files selected for processing (2)
controlplane/src/core/bufservices/monograph/publishMonograph.tscontrolplane/src/core/repositories/FederatedGraphRepository.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- controlplane/src/core/bufservices/monograph/publishMonograph.ts
…ith transaction support Co-authored-by: Copilot <copilot@github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts (1)
115-146:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftDon't persist the compatibility bump before compose/deploy has definitively succeeded.
Line 117 updates
routerCompatibilityVersionand Lines 120-133 write the audit log, both inside the transaction, beforecomposeAndDeployFederatedGraph(...)runs. The method returns populatedcompositionErrors/deploymentErrorsarrays without throwing, so the transaction commits these writes even when the response itself reports the old version (Lines 159-162). This creates a disconnect: the DB contains the new version but the operation is presented to the client as failed—clients relying on DB state will see a version mismatch with the reported response.Move compose/deploy outside the DB transaction and persist/audit only after success, or throw on non-empty error arrays to roll back. Additionally, this transaction holds a connection open across composition/deployment work that involves external systems (blob storage, webhook calls), increasing lock contention and timeout risk.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts` around lines 115 - 146, The code currently calls fedGraphRepo.updateRouterCompatibilityVersion and auditLogRepo.addAuditLog inside the DB transaction before calling compositionService.composeAndDeployFederatedGraph, which can commit a version bump even when composition/deployment fails; move the call to compositionService.composeAndDeployFederatedGraph out of the transaction so external composition/deployment runs first, then begin a short transaction to call FederatedGraphRepository.updateRouterCompatibilityVersion and AuditLogRepository.addAuditLog only after composition/deploy returns success (or alternatively throw if compositionService returns non-empty compositionErrors/deploymentErrors to force rollback); ensure the transaction no longer spans external I/O to avoid long-held DB locks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controlplane/src/core/bufservices/feature-flag/enableFeatureFlag.ts`:
- Around line 82-106: The DB transaction currently spans non-DB work: move all
external side-effect work out of the transaction by limiting the tx block to
only persistence (call FeatureFlagRepository.enableFeatureFlag and persist any
outbox/enqueue record) and return the minimal data needed (e.g., featureFlag,
namespace id, any outbox id). Do not instantiate or call
CompositionService.composeAndDeployFeatureFlag inside opts.db.transaction;
instead, after the transaction resolves/commits, create the CompositionService
and call composeAndDeployFeatureFlag with the returned data (actorId,
featureFlag, isEnabled). Ensure any required side-effect guarantees use an
outbox/enqueue record written inside the transaction so composition can safely
run after commit.
---
Outside diff comments:
In
`@controlplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts`:
- Around line 115-146: The code currently calls
fedGraphRepo.updateRouterCompatibilityVersion and auditLogRepo.addAuditLog
inside the DB transaction before calling
compositionService.composeAndDeployFederatedGraph, which can commit a version
bump even when composition/deployment fails; move the call to
compositionService.composeAndDeployFederatedGraph out of the transaction so
external composition/deployment runs first, then begin a short transaction to
call FederatedGraphRepository.updateRouterCompatibilityVersion and
AuditLogRepository.addAuditLog only after composition/deploy returns success (or
alternatively throw if compositionService returns non-empty
compositionErrors/deploymentErrors to force rollback); ensure the transaction no
longer spans external I/O to avoid long-held DB locks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c68af8b1-bb61-4724-8060-60ef380caf74
📒 Files selected for processing (3)
controlplane/src/core/bufservices/feature-flag/enableFeatureFlag.tscontrolplane/src/core/bufservices/feature-flag/updateFeatureFlag.tscontrolplane/src/core/bufservices/graph/setGraphRouterCompatibilityVersion.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- controlplane/src/core/bufservices/feature-flag/updateFeatureFlag.ts
Summary by CodeRabbit
New Features
Documentation
Tests
Checklist
Open Source AI Manifesto
This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.