[CRE] Fix capability serving in single-DON topologies #21310
Include myWorkflowDONs when passing workflow DONs to serveCapabilities. In single-DON topologies (e.g. local CRE), the same DON acts as both the workflow DON and the capability DON. The launcher classified it into myWorkflowDONs (not remoteWorkflowDONs), so remoteWorkflowDONs was empty. Passing only remoteWorkflowDONs to serveCapabilities caused executable/server.go to reject the capability with "empty workflowDONs provided".
Pull request overview
Fixes capability serving in single-DON topologies by ensuring the capability server receives a non-empty workflow DON allowlist, preventing local CRE/dev setups from failing with empty workflowDONs provided.
Changes:
- Combine `remoteWorkflowDONs` and `myWorkflowDONs` into a single slice before calling `serveCapabilities`.
- Add inline documentation explaining why single-DON topologies require including `myWorkflowDONs`.
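
The combining step can be sketched in isolation as follows. This is a minimal illustration, not the launcher's code: the `DON` type here is a hypothetical stand-in for `registrysyncer.DON`, and `combineWorkflowDONs` is a helper name invented for this example.

```go
package main

import "fmt"

// DON is a hypothetical stand-in for registrysyncer.DON; only the ID matters here.
type DON struct {
	ID uint32
}

// combineWorkflowDONs returns a new slice holding the remote workflow DONs
// followed by the node's own workflow DONs. Neither input slice is modified.
func combineWorkflowDONs(remote, mine []DON) []DON {
	all := make([]DON, 0, len(remote)+len(mine))
	all = append(all, remote...)
	all = append(all, mine...)
	return all
}

func main() {
	// Single-DON topology: no remote workflow DONs, one local workflow DON.
	var remote []DON
	mine := []DON{{ID: 1}}
	fmt.Println(len(combineWorkflowDONs(remote, mine)))
}
```

Preallocating with the combined capacity avoids reallocation, and appending into a fresh slice means neither input can be aliased or overwritten.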
+	// Include both remote workflow DONs and the node's own workflow DONs.
+	// In single-DON topologies (e.g. local CRE), the same DON is both a
+	// workflow DON and a capability DON, so remoteWorkflowDONs is empty.
+	// Without including myWorkflowDONs, capabilities fail to serve with
+	// "empty workflowDONs provided".
+	allWorkflowDONs := make([]registrysyncer.DON, 0, len(remoteWorkflowDONs)+len(myWorkflowDONs))
+	allWorkflowDONs = append(allWorkflowDONs, remoteWorkflowDONs...)
+	allWorkflowDONs = append(allWorkflowDONs, myWorkflowDONs...)
 	for _, myDON := range myCapabilityDONs {
-		w.serveCapabilities(ctx, w.myPeerID, myDON, localRegistry, remoteWorkflowDONs)
+		w.serveCapabilities(ctx, w.myPeerID, myDON, localRegistry, allWorkflowDONs)
Add a regression test covering the single-DON topology described in the PR (node belongs to both workflow and capability roles in the same DON). Today this change is untested, and without a test it’s easy to reintroduce passing an empty workflow DON allowlist into the executable/trigger servers (which hard-fail with "empty workflowDONs provided"). A launcher_test.go case should assert that OnNewRegistry successfully serves a capability when remoteWorkflowDONs is empty but myWorkflowDONs is non-empty (e.g., by verifying dispatcher.SetReceiver is called and OnNewRegistry returns nil).
Covers the topology where a single DON is both a workflow DON and a capability DON (e.g. local CRE). Verifies that capabilities are served correctly when remoteWorkflowDONs is empty but myWorkflowDONs is not.
Added a regression test in launcher_test.go. It sets up a single DON that is both workflow and capability (the single-DON topology), verifies OnNewRegistry succeeds and SetReceiver is called for both trigger and target capabilities, and asserts no "failed to serve capability" errors are logged.
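
The scenario the regression test guards against can be modeled outside the launcher. The sketch below is an assumption-laden simplification: `DON`, `classifyWorkflowDONs`, and `serve` are hypothetical names that only mimic the membership-based classification and the empty-allowlist check described in this PR. They are not the real launcher or server APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// DON is a hypothetical, simplified stand-in for a registry DON entry.
type DON struct {
	ID      uint32
	Members []string // peer IDs of the nodes in this DON
}

// hasMember reports whether peerID belongs to the DON.
func hasMember(d DON, peerID string) bool {
	for _, m := range d.Members {
		if m == peerID {
			return true
		}
	}
	return false
}

// classifyWorkflowDONs splits workflow DONs into those the node belongs to
// (mine) and those it does not (remote).
func classifyWorkflowDONs(peerID string, workflowDONs []DON) (mine, remote []DON) {
	for _, d := range workflowDONs {
		if hasMember(d, peerID) {
			mine = append(mine, d)
		} else {
			remote = append(remote, d)
		}
	}
	return mine, remote
}

// serve mimics the server-side validation: an empty allowlist is rejected.
func serve(workflowDONs []DON) error {
	if len(workflowDONs) == 0 {
		return errors.New("empty workflowDONs provided")
	}
	return nil
}

func main() {
	single := DON{ID: 1, Members: []string{"peer-a", "peer-b"}}
	mine, remote := classifyWorkflowDONs("peer-a", []DON{single})

	// Old behavior: only remote DONs are passed, so serving fails.
	fmt.Println("remote only:", serve(remote))

	// Fixed behavior: remote and own DONs are passed together.
	all := append(append([]DON{}, remote...), mine...)
	fmt.Println("combined:", serve(all))
}
```

With one DON and the node as a member, `remote` comes back empty, which is exactly the input that triggered the failure before the fix.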
Summary
In single-DON topologies (e.g. local CRE), a single DON acts as both the workflow DON and the capability DON. The launcher classifies this DON into `myWorkflowDONs` (since the node is a member), not `remoteWorkflowDONs`. When the launcher then calls `serveCapabilities`, it only passes `remoteWorkflowDONs`, which is empty. This causes `executable/server.go` to reject every capability with "empty workflowDONs provided".

The fix combines `remoteWorkflowDONs` and `myWorkflowDONs` before passing them to `serveCapabilities`. This is safe because `serveCapabilities` builds an `idsToDONs` map from the workflow DONs and uses it to route incoming requests to the correct DON. Including the node's own workflow DON in that map is correct; the DON is a legitimate workflow DON that should be able to invoke capabilities on itself.

How to reproduce

1. … (`go run . env setup`) with a single DON that has both workflow and capability roles
2. … (`confidential-http`)
3. … `empty workflowDONs provided` in the node logs

Why this wasn't caught earlier

In multi-DON production deployments, workflow DONs and capability DONs are separate. The workflow DON appears in `remoteWorkflowDONs` from the capability DON's perspective, so the list is never empty. The single-DON topology only occurs in local CRE and similar dev/test environments.

The `workflowDONs` validation was added in commit `5950b6ab79` (CRE-941, dynamic config updates). Before that commit, an empty `workflowDONs` was silently accepted.

What this changes

`core/capabilities/launcher.go`: When serving capabilities, pass all known workflow DONs (both remote and the node's own) instead of only remote ones.
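
To illustrate why including the node's own workflow DON matters for routing, here is a rough sketch of an ID-to-DON lookup. The real `serveCapabilities` internals are not shown in this PR, so the map shape, the `DON` type, and the `buildIDsToDONs` and `resolveCaller` helpers are all assumptions made for illustration.

```go
package main

import "fmt"

// DON is a hypothetical stand-in for a workflow DON entry.
type DON struct {
	ID   uint32
	Name string
}

// buildIDsToDONs indexes the allowed workflow DONs by ID.
func buildIDsToDONs(workflowDONs []DON) map[uint32]DON {
	idsToDONs := make(map[uint32]DON, len(workflowDONs))
	for _, d := range workflowDONs {
		idsToDONs[d.ID] = d
	}
	return idsToDONs
}

// resolveCaller looks up the DON a request claims to come from and
// returns an error if that DON is not in the allowlist.
func resolveCaller(idsToDONs map[uint32]DON, callerDONID uint32) (DON, error) {
	d, ok := idsToDONs[callerDONID]
	if !ok {
		return DON{}, fmt.Errorf("request from unknown workflow DON %d", callerDONID)
	}
	return d, nil
}

func main() {
	own := DON{ID: 1, Name: "local-cre"}

	// Only remote DONs (none in a single-DON topology): the node's own
	// DON cannot be resolved.
	_, err := resolveCaller(buildIDsToDONs(nil), own.ID)
	fmt.Println("remote only:", err)

	// Remote plus own DONs: the node's own DON resolves.
	d, err := resolveCaller(buildIDsToDONs([]DON{own}), own.ID)
	fmt.Println("combined:", d.Name, err)
}
```

Under this model, a request that originates from the same DON serving the capability is only routable when that DON's ID is present in the map.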