
[CRE] Fix capability serving in single-DON topologies #21310

Open
nadahalli wants to merge 3 commits into develop from tejaswi/fix-single-don-capability-serving
@nadahalli nadahalli commented Feb 25, 2026

Summary

In single-DON topologies (e.g. local CRE), a single DON acts as both the workflow DON and the capability DON. The launcher classifies this DON into myWorkflowDONs (since the node is a member), not remoteWorkflowDONs. When the launcher then calls serveCapabilities, it only passes remoteWorkflowDONs, which is empty. This causes executable/server.go to reject every capability with:

failed to serve capability: <capability-id>
err: empty workflowDONs provided

The fix combines remoteWorkflowDONs and myWorkflowDONs before passing them to serveCapabilities. This is safe because serveCapabilities builds an idsToDONs map from the workflow DONs and uses it to route incoming requests to the correct DON. Including the node's own workflow DON in that map is correct; the DON is a legitimate workflow DON that should be able to invoke capabilities on itself.
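A minimal sketch of the combine-then-index step described above. The `DON` type here is a hypothetical stand-in for `registrysyncer.DON`, the helper names are illustrative, and keying the map by a `uint32` DON ID is an assumption; the actual `serveCapabilities` internals are not shown in this PR.

```go
package main

// DON is a hypothetical, simplified stand-in for registrysyncer.DON.
type DON struct {
	ID   uint32
	Name string
}

// combineWorkflowDONs mirrors the shape of the fix: remote workflow DONs
// first, then the node's own workflow DONs, copied into a freshly allocated
// slice so that neither input slice is modified by append.
func combineWorkflowDONs(remote, mine []DON) []DON {
	all := make([]DON, 0, len(remote)+len(mine))
	all = append(all, remote...)
	all = append(all, mine...)
	return all
}

// buildIDsToDONs indexes workflow DONs by ID (assumed key type). A server
// could use such a map to route an incoming request to the DON it came from.
func buildIDsToDONs(workflowDONs []DON) map[uint32]DON {
	idsToDONs := make(map[uint32]DON, len(workflowDONs))
	for _, d := range workflowDONs {
		idsToDONs[d.ID] = d
	}
	return idsToDONs
}
```

In the single-DON case, `remote` is empty and `mine` holds the one local DON, so the combined slice and the resulting map each contain exactly that DON.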

How to reproduce

  1. Start a local CRE environment (go run . env setup) with a single DON that has both workflow and capability roles
  2. Register any LOOP-based capability (e.g. confidential-http)
  3. Observe that the capability fails to start, with "empty workflowDONs provided" in the node logs

Why this wasn't caught earlier

In multi-DON production deployments, workflow DONs and capability DONs are separate. The workflow DON appears in remoteWorkflowDONs from the capability DON's perspective, so the list is never empty. The single-DON topology only occurs in local CRE and similar dev/test environments.
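The difference between the two topologies can be illustrated with a simplified classification sketch. This is not the launcher's actual code: the `DON` fields, the `classify` helper, and the membership rule are assumptions chosen to match the behaviour described in this PR.

```go
package main

// DON is a hypothetical, simplified stand-in for registrysyncer.DON.
type DON struct {
	ID               uint32
	Members          []string // peer IDs of the nodes in this DON
	AcceptsWorkflows bool     // the DON runs workflows
	HasCapabilities  bool     // the DON hosts capabilities
}

// classified holds the three buckets discussed in this PR.
type classified struct {
	myWorkflowDONs     []DON
	remoteWorkflowDONs []DON
	myCapabilityDONs   []DON
}

// isMember reports whether peerID is one of the DON's members.
func isMember(peerID string, d DON) bool {
	for _, m := range d.Members {
		if m == peerID {
			return true
		}
	}
	return false
}

// classify sorts DONs from the point of view of the node myPeerID
// (assumed logic). A workflow DON the node belongs to is "mine";
// any other workflow DON is "remote".
func classify(myPeerID string, dons []DON) classified {
	var c classified
	for _, d := range dons {
		member := isMember(myPeerID, d)
		if d.AcceptsWorkflows {
			if member {
				c.myWorkflowDONs = append(c.myWorkflowDONs, d)
			} else {
				c.remoteWorkflowDONs = append(c.remoteWorkflowDONs, d)
			}
		}
		if d.HasCapabilities && member {
			c.myCapabilityDONs = append(c.myCapabilityDONs, d)
		}
	}
	return c
}
```

With separate workflow and capability DONs, a capability node sees the workflow DON as remote. With one DON that has both roles, the same node sees it only under `myWorkflowDONs`, which leaves `remoteWorkflowDONs` empty.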

The workflowDONs validation was added in commit 5950b6ab79 (CRE-941, dynamic config updates). Before that commit, an empty workflowDONs list was silently accepted.
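For illustration, a guard of the kind described could look roughly like the sketch below. The function and variable names are hypothetical, and the real check in executable/server.go may be structured differently; only the error text is taken from the logs quoted above.

```go
package main

import "errors"

// DON is a hypothetical, simplified stand-in for registrysyncer.DON.
type DON struct {
	ID uint32
}

// errEmptyWorkflowDONs carries the message seen in the node logs.
var errEmptyWorkflowDONs = errors.New("empty workflowDONs provided")

// validateWorkflowDONs rejects an empty list, so a server is never
// configured without at least one workflow DON allowed to call it.
func validateWorkflowDONs(workflowDONs []DON) error {
	if len(workflowDONs) == 0 {
		return errEmptyWorkflowDONs
	}
	return nil
}
```

Under this kind of check, passing only `remoteWorkflowDONs` in a single-DON setup fails, while passing the combined list succeeds.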

What this changes

core/capabilities/launcher.go: When serving capabilities, pass all known workflow DONs (both remote and the node's own) instead of only remote ones.

Requires

Supports

Include myWorkflowDONs when passing workflow DONs to serveCapabilities.
In single-DON topologies (e.g. local CRE), the same DON acts as both the
workflow DON and the capability DON. The launcher classified it into
myWorkflowDONs (not remoteWorkflowDONs), so remoteWorkflowDONs was empty.
Passing only remoteWorkflowDONs to serveCapabilities caused
executable/server.go to reject the capability with "empty workflowDONs
provided".
Copilot AI review requested due to automatic review settings February 25, 2026 19:51
@nadahalli nadahalli requested review from a team as code owners February 25, 2026 19:51

github-actions bot commented Feb 25, 2026

✅ No conflicts with other open PRs targeting develop


Copilot AI left a comment


Pull request overview

Fixes capability serving in single-DON topologies by ensuring the capability server receives a non-empty workflow DON allowlist, preventing local CRE/dev setups from failing with empty workflowDONs provided.

Changes:

  • Combine remoteWorkflowDONs and myWorkflowDONs into a single slice before calling serveCapabilities.
  • Add inline documentation explaining why single-DON topologies require including myWorkflowDONs.


Comment on lines +404 to +413
+	// Include both remote workflow DONs and the node's own workflow DONs.
+	// In single-DON topologies (e.g. local CRE), the same DON is both a
+	// workflow DON and a capability DON, so remoteWorkflowDONs is empty.
+	// Without including myWorkflowDONs, capabilities fail to serve with
+	// "empty workflowDONs provided".
+	allWorkflowDONs := make([]registrysyncer.DON, 0, len(remoteWorkflowDONs)+len(myWorkflowDONs))
+	allWorkflowDONs = append(allWorkflowDONs, remoteWorkflowDONs...)
+	allWorkflowDONs = append(allWorkflowDONs, myWorkflowDONs...)
 	for _, myDON := range myCapabilityDONs {
-		w.serveCapabilities(ctx, w.myPeerID, myDON, localRegistry, remoteWorkflowDONs)
+		w.serveCapabilities(ctx, w.myPeerID, myDON, localRegistry, allWorkflowDONs)

Copilot AI Feb 25, 2026


Add a regression test covering the single-DON topology described in the PR (node belongs to both workflow and capability roles in the same DON). Today this change is untested, and without a test it’s easy to reintroduce passing an empty workflow DON allowlist into the executable/trigger servers (which hard-fail with "empty workflowDONs provided"). A launcher_test.go case should assert that OnNewRegistry successfully serves a capability when remoteWorkflowDONs is empty but myWorkflowDONs is non-empty (e.g., by verifying dispatcher.SetReceiver is called and OnNewRegistry returns nil).

Covers the topology where a single DON is both a workflow DON and a
capability DON (e.g. local CRE). Verifies that capabilities are served
correctly when remoteWorkflowDONs is empty but myWorkflowDONs is not.
@nadahalli

Added a regression test in launcher_test.go. It sets up a single DON that is both workflow and capability (the single-DON topology), verifies OnNewRegistry succeeds and SetReceiver is called for both trigger and target capabilities, and asserts no "failed to serve capability" errors are logged.
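The shape of that check can be sketched with toy stand-ins. This is not the code in launcher_test.go: `fakeDispatcher`, its simplified `SetReceiver` signature, and the `serve` and `serveAll` helpers are all hypothetical and exist only to show what is being asserted.

```go
package main

import "errors"

// DON is a hypothetical, simplified stand-in for registrysyncer.DON.
type DON struct {
	ID uint32
}

// fakeDispatcher records which capability IDs had a receiver registered.
// Its SetReceiver signature is simplified and does not match the real
// dispatcher interface.
type fakeDispatcher struct {
	receivers map[string]bool
}

func (f *fakeDispatcher) SetReceiver(capabilityID string) {
	if f.receivers == nil {
		f.receivers = make(map[string]bool)
	}
	f.receivers[capabilityID] = true
}

// serve is a toy version of serving one capability: it rejects an empty
// workflow DON list and otherwise registers a receiver.
func serve(d *fakeDispatcher, capabilityID string, workflowDONs []DON) error {
	if len(workflowDONs) == 0 {
		return errors.New("empty workflowDONs provided")
	}
	d.SetReceiver(capabilityID)
	return nil
}

// serveAll combines remote and own workflow DONs, as the fix does, and
// serves every capability against the combined list.
func serveAll(d *fakeDispatcher, capabilityIDs []string, remote, mine []DON) error {
	all := make([]DON, 0, len(remote)+len(mine))
	all = append(all, remote...)
	all = append(all, mine...)
	for _, id := range capabilityIDs {
		if err := serve(d, id, all); err != nil {
			return err
		}
	}
	return nil
}
```

A single-DON case then calls `serveAll` with a nil `remote` slice and a one-element `mine` slice, and asserts that the returned error is nil and that a receiver was recorded for both the trigger and the target capability.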


@nadahalli nadahalli changed the title from "Fix capability serving in single-DON topologies" to "[CRE] Fix capability serving in single-DON topologies" Mar 3, 2026