feat(remote-comms): Add explicit connection management for intentional disconnects #699

sirtimid · 2025-11-13T17:27:23Z

Closes #685

Add closeConnection and reconnectPeer methods to allow explicit control over remote peer connections. The system now detects intentional disconnects (SCTP_USER_INITIATED_ABORT) and prevents automatic reconnection for intentionally closed peers. Users can manually reconnect using reconnectPeer when needed. This change propagates through the entire kernel architecture including network layer, RemoteManager, Kernel, PlatformServices, and RPC handlers, with comprehensive test coverage.
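The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ConnectionTracker`, `shouldAutoReconnect`, and `hintsFor` are hypothetical names, and only `closeConnection`, `reconnectPeer`, and the idea of a per-peer "intentionally closed" set come from the description.

```typescript
// Hypothetical sketch: ConnectionTracker, shouldAutoReconnect and hintsFor are
// illustrative names, not identifiers taken from the PR.
type PeerId = string;

class ConnectionTracker {
  // Peers that were closed on purpose via closeConnection().
  private readonly intentionallyClosed = new Set<PeerId>();

  // Location hints most recently supplied for each peer.
  private readonly hintsByPeer = new Map<PeerId, string[]>();

  // Mark the peer as intentionally closed so no automatic retry happens.
  closeConnection(peerId: PeerId): void {
    this.intentionallyClosed.add(peerId);
  }

  // Clear the intentional-close mark and remember the hints for the next dial.
  reconnectPeer(peerId: PeerId, hints: string[] = []): void {
    this.intentionallyClosed.delete(peerId);
    this.hintsByPeer.set(peerId, hints.slice());
  }

  // The reconnection loop would consult this before retrying a lost peer.
  shouldAutoReconnect(peerId: PeerId): boolean {
    return !this.intentionallyClosed.has(peerId);
  }

  hintsFor(peerId: PeerId): string[] {
    return this.hintsByPeer.get(peerId) || [];
  }
}
```

In this sketch a peer stays non-retryable from `closeConnection` until someone explicitly calls `reconnectPeer`, which matches the "manual reconnection" wording above.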

Note

Introduce closeConnection and reconnectPeer APIs end-to-end and treat intentional disconnects as non-retryable, with manual reconnection support.

Remote Comms / Network:
- Add closeConnection(peerId) and reconnectPeer(peerId, hints) with per-peer "intentionally closed" tracking.
- Detect SCTP_USER_INITIATED_ABORT as intentional disconnect; skip auto-reconnect, reject inbound from closed peers, preserve/merge hints during flush.
RPC & Platform Services:
- New RPC specs/handlers: closeConnection, reconnectPeer in platform-services.
- Wire through browser server/client and NodeJS PlatformServices to network layer; clear funcs on stopRemoteComms.
Kernel & Remotes:
- Expose Kernel.closeConnection() and Kernel.reconnectPeer() via RemoteManager and RemoteComms types.
Tests / E2E:
- Comprehensive unit and E2E coverage for new APIs, reconnection behavior, queueing, and intentional disconnect flows.
- Add test helpers (remote-comms.ts, refactors) and adjust coverage thresholds.

Written by Cursor Bugbot for commit 252f5d7. This will update automatically on new commits.

cursor · 2025-11-13T17:32:55Z

packages/ocap-kernel/src/remotes/network.ts

-            logger.log(`${channel.peerId}:: remote disconnected`);
+            logger.log(`${channel.peerId}:: remote intentionally disconnected`);
+            // Mark as intentionally closed and don't trigger reconnection
+            intentionallyClosed.add(channel.peerId);


Bug: Remote Disconnects Permanently Block Connections

When a remote peer initiates a graceful disconnect (SCTP_USER_INITIATED_ABORT), the local kernel incorrectly adds that peer to intentionallyClosed, preventing future outbound connections. The intentionallyClosed set should only track peers that the local kernel explicitly closed via closeConnection(), not peers that remotely disconnected. This causes the local kernel to permanently refuse sending messages to peers that gracefully restarted, even though the local kernel never called closeConnection().
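One way to read this suggestion is sketched below. It is an assumption about the shape of a fix, not the change that was merged: `PeerGate`, `onRemoteAbort`, and `canSend` are hypothetical names. The point is that only a local `closeConnection()` call touches the set, while a remote-initiated abort is recorded without blocking later outbound traffic.

```typescript
// Hypothetical sketch of the distinction Bugbot describes; PeerGate,
// onRemoteAbort and canSend are illustrative names only.
type PeerId = string;

class PeerGate {
  // Only peers that the LOCAL kernel closed explicitly end up here.
  private readonly locallyClosed = new Set<PeerId>();

  // Remote-initiated aborts are recorded separately, for logging only.
  readonly remoteAborts: PeerId[] = [];

  closeConnection(peerId: PeerId): void {
    this.locallyClosed.add(peerId);
  }

  reconnectPeer(peerId: PeerId): void {
    this.locallyClosed.delete(peerId);
  }

  // Called when the channel reports that the remote side hung up.
  // Deliberately does not modify locallyClosed.
  onRemoteAbort(peerId: PeerId): void {
    this.remoteAborts.push(peerId);
  }

  // Outbound sends are refused only for peers closed locally.
  canSend(peerId: PeerId): boolean {
    return !this.locallyClosed.has(peerId);
  }
}
```

With this split, a peer that gracefully restarts can still be sent messages, whereas a peer closed through `closeConnection()` stays blocked until `reconnectPeer()` is called.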

FUDCo

LGTM modulo my question below about what the "re" in "reconnect" actually means. If you are comfortable with whatever your answer to my question is, go ahead and land the PR, but at least consider whether what you've got here is actually what we want.

Also, this PR really drove home the amount of boilerplate involved in adding to the remote comms protocol. I'm not sure there's much if anything to be done about this, as doing something about it would probably be an awful lot of work, and I expect the protocol itself to be quite stable in general and therefore going through the effort of tweaking the boilerplate will (hopefully) be relatively rare, at least once we converge on the One True API. But it did give me pause.

FUDCo · 2025-11-14T21:37:34Z

packages/kernel-browser-runtime/src/PlatformServicesClient.ts

+  /**
+   * Launch a new worker with a specific vat id.
+   *
+   * @param vatId - The vat id of the worker to launch.
+   * @param vatConfig - The configuration for the worker.
+   * @returns A promise for a duplex stream connected to the worker
+   * which rejects if a worker with the given vat id already exists.
+   */


Thank you for adding these doc comments. It's interesting that our lint rules demand jsdoc comments on functions but not on methods. I wonder if we can change that (or did it actually change and that's what motivated these?)

sirtimid · 2025-11-20

We follow the MetaMask eslint rule, which doesn't demand jsdoc comments on methods. We can certainly enable it, though.

FUDCo · 2025-11-14T21:39:41Z

packages/kernel-browser-runtime/src/PlatformServicesClient.ts

+   */
+  async reconnectPeer(peerId: string, hints: string[] = []): Promise<void> {
+    await this.#rpcClient.call('reconnectPeer', { peerId, hints });
+  }


Is it considered reconnection (as one would do after a network outage) or is it just a new connection to the same endpoint as before? In particular, do we expect clist entries to survive a trip through manual disconnect/manual reconnect?

sirtimid · 2025-11-20

Yeah, this is a reconnection, not a new connection. It reuses the existing RemoteHandle and remote endpoint. Clist entries and other endpoint state persist across the disconnect/reconnect cycle.
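To make the "reconnection, not a new connection" point concrete, here is a small sketch under stated assumptions. `RemoteRegistry` and `RemoteHandleSketch` are hypothetical stand-ins, not the real RemoteHandle or RemoteManager, and the clist is modeled as a plain map. It only illustrates that closing flips connection state while the per-peer handle and its clist entries are kept and reused.

```typescript
// Hypothetical sketch; RemoteRegistry and RemoteHandleSketch are illustrative
// stand-ins for the real RemoteManager / RemoteHandle types.
type PeerId = string;

interface RemoteHandleSketch {
  peerId: PeerId;
  // Assumed shape: maps a local reference to the reference the remote end uses.
  clist: Map<string, string>;
  connected: boolean;
}

class RemoteRegistry {
  private readonly handles = new Map<PeerId, RemoteHandleSketch>();

  // Return the existing handle for a peer, creating one on first contact.
  getOrCreate(peerId: PeerId): RemoteHandleSketch {
    let handle = this.handles.get(peerId);
    if (handle === undefined) {
      handle = { peerId, clist: new Map<string, string>(), connected: true };
      this.handles.set(peerId, handle);
    }
    return handle;
  }

  // Closing marks the handle as disconnected but does not discard it.
  closeConnection(peerId: PeerId): void {
    const handle = this.handles.get(peerId);
    if (handle !== undefined) {
      handle.connected = false;
    }
  }

  // Reconnecting reuses the same handle, so clist entries survive.
  reconnectPeer(peerId: PeerId): RemoteHandleSketch {
    const handle = this.getOrCreate(peerId);
    handle.connected = true;
    return handle;
  }
}
```

A fresh connection, by contrast, would allocate a new handle with an empty clist, which is the case FUDCo's question is trying to rule out.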

Add explicit connection management for intentional disconnects

8099126

sirtimid requested a review from a team as a code owner November 13, 2025 17:27

fix test

252f5d7

cursor bot reviewed Nov 13, 2025

FUDCo approved these changes Nov 14, 2025

sirtimid merged commit 3417236 into main Nov 20, 2025
26 checks passed

sirtimid deleted the sirtimid/handling-remote-comms-disconnect branch November 20, 2025 00:25

3 participants
