@sirtimid sirtimid commented Nov 13, 2025

Closes #685

Add closeConnection and reconnectPeer methods to allow explicit control over remote peer connections. The system now detects intentional disconnects (SCTP_USER_INITIATED_ABORT) and prevents automatic reconnection for intentionally closed peers. Users can manually reconnect using reconnectPeer when needed. This change propagates through the entire kernel architecture including network layer, RemoteManager, Kernel, PlatformServices, and RPC handlers, with comprehensive test coverage.
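The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `PeerConnectionPolicy`, the `CONNECTION_LOST` reason, the `shouldAutoReconnect` and `acceptsInbound` helpers, and the hint string are all assumptions made for the example. Only `closeConnection`, `reconnectPeer`, and `SCTP_USER_INITIATED_ABORT` come from the description.

```typescript
// Illustrative sketch only: every name here is hypothetical except
// closeConnection, reconnectPeer, and SCTP_USER_INITIATED_ABORT.
type DisconnectReason = 'SCTP_USER_INITIATED_ABORT' | 'CONNECTION_LOST';

class PeerConnectionPolicy {
  // Peers that were closed on purpose via closeConnection().
  readonly #intentionallyClosed = new Set<string>();

  /** Explicitly close a peer and opt it out of automatic reconnection. */
  closeConnection(peerId: string): void {
    this.#intentionallyClosed.add(peerId);
  }

  /** Manually re-enable a peer; returns the hints a dialer would use. */
  reconnectPeer(peerId: string, hints: string[] = []): string[] {
    this.#intentionallyClosed.delete(peerId);
    return hints;
  }

  /** Decide whether a dropped connection should be retried automatically. */
  shouldAutoReconnect(peerId: string, reason: DisconnectReason): boolean {
    if (reason === 'SCTP_USER_INITIATED_ABORT') {
      return false; // intentional disconnect: treated as non-retryable
    }
    return !this.#intentionallyClosed.has(peerId);
  }

  /** Inbound traffic from an intentionally closed peer is rejected. */
  acceptsInbound(peerId: string): boolean {
    return !this.#intentionallyClosed.has(peerId);
  }
}

const policy = new PeerConnectionPolicy();
policy.closeConnection('peer-a');
console.log(policy.shouldAutoReconnect('peer-a', 'CONNECTION_LOST')); // false
policy.reconnectPeer('peer-a', ['example-hint']);
console.log(policy.acceptsInbound('peer-a')); // true
```

Keeping the decision in one small object makes the "closed on purpose" state easy to test separately from the transport, though the actual PR spreads this logic across the network layer and its callers.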


Note

Introduce closeConnection and reconnectPeer APIs end-to-end and treat intentional disconnects as non-retryable, with manual reconnection support.

  • Remote Comms / Network:
    • Add closeConnection(peerId) and reconnectPeer(peerId, hints) with per-peer "intentionally closed" tracking.
    • Detect SCTP_USER_INITIATED_ABORT as intentional disconnect; skip auto-reconnect, reject inbound from closed peers, preserve/merge hints during flush.
  • RPC & Platform Services:
    • New RPC specs/handlers: closeConnection, reconnectPeer in platform-services.
    • Wire through browser server/client and NodeJS PlatformServices to network layer; clear funcs on stopRemoteComms.
  • Kernel & Remotes:
    • Expose Kernel.closeConnection() and Kernel.reconnectPeer() via RemoteManager and RemoteComms types.
  • Tests / E2E:
    • Comprehensive unit and E2E coverage for new APIs, reconnection behavior, queueing, and intentional disconnect flows.
    • Add test helpers (remote-comms.ts, refactors) and adjust coverage thresholds.

Written by Cursor Bugbot for commit 252f5d7.

@sirtimid sirtimid requested a review from a team as a code owner November 13, 2025 17:27
logger.log(`${channel.peerId}:: remote disconnected`);
logger.log(`${channel.peerId}:: remote intentionally disconnected`);
// Mark as intentionally closed and don't trigger reconnection
intentionallyClosed.add(channel.peerId);

Bug: Remote Disconnects Permanently Block Connections

When a remote peer initiates a graceful disconnect (SCTP_USER_INITIATED_ABORT), the local kernel incorrectly adds that peer to intentionallyClosed, preventing future outbound connections. The intentionallyClosed set should only track peers that the local kernel explicitly closed via closeConnection(), not peers that remotely disconnected. This causes the local kernel to permanently refuse sending messages to peers that gracefully restarted, even though the local kernel never called closeConnection().
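The difference between the reported behaviour and the suggested one can be modelled in a few lines. This is a hedged sketch: `makeTracker`, `onIntentionalClose`, `canSendTo`, the `CloseOrigin` type, and the `trackRemoteAborts` switch are hypothetical names introduced only to contrast the two policies, and the real code path may differ.

```typescript
// Hypothetical model of the reported issue; not the PR's actual code.
type CloseOrigin = 'local' | 'remote';

function makeTracker(trackRemoteAborts: boolean) {
  const intentionallyClosed = new Set<string>();
  return {
    // Called when a connection ends with SCTP_USER_INITIATED_ABORT.
    onIntentionalClose(peerId: string, origin: CloseOrigin): void {
      if (origin === 'local' || trackRemoteAborts) {
        intentionallyClosed.add(peerId);
      }
    },
    // Outbound sends are refused for peers in the set.
    canSendTo(peerId: string): boolean {
      return !intentionallyClosed.has(peerId);
    },
  };
}

const flagged = makeTracker(true); // remote aborts are tracked (reported behaviour)
const suggested = makeTracker(false); // only local closeConnection() calls are tracked

flagged.onIntentionalClose('peer-b', 'remote');
suggested.onIntentionalClose('peer-b', 'remote');

console.log(flagged.canSendTo('peer-b')); // false: peer stays blocked after a remote restart
console.log(suggested.canSendTo('peer-b')); // true: remote disconnects do not block sends
```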

Contributor

@FUDCo FUDCo left a comment

LGTM modulo my question below about what the "re" in "reconnect" actually means. If you are comfortable with whatever your answer to my question is, go ahead and land the PR, but at least consider whether what you've got here is actually what we want.

Also, this PR really drove home the amount of boilerplate involved in adding to the remote comms protocol. I'm not sure there's much, if anything, to be done about this, as doing something about it would probably be an awful lot of work, and I expect the protocol itself to be quite stable in general and therefore going through the effort of tweaking the boilerplate will (hopefully) be relatively rare, at least once we converge on the One True API. But it did give me pause.

Comment on lines +120 to +127
/**
 * Launch a new worker with a specific vat id.
 *
 * @param vatId - The vat id of the worker to launch.
 * @param vatConfig - The configuration for the worker.
 * @returns A promise for a duplex stream connected to the worker
 * which rejects if a worker with the given vat id already exists.
 */
Contributor

Thank you for adding these doc comments. It's interesting that our lint rules demand jsdoc comments on functions but not on methods. I wonder if we can change that (or did it actually change and that's what motivated these?)

Contributor Author

We follow the MetaMask ESLint rule, which doesn't demand JSDoc comments on methods. We can certainly enable it, though.
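For reference, enabling this might look something like the override below. It assumes the rule in question is `jsdoc/require-jsdoc` from eslint-plugin-jsdoc and that the plugin is already loaded by the shared config. The constant name `requireJsdocOnMethods` is hypothetical, and where the override belongs depends on how the repository's ESLint config is organised.

```typescript
// Sketch of a rule override; assumes eslint-plugin-jsdoc is already loaded
// by the shared config. The constant name is hypothetical.
const requireJsdocOnMethods = {
  rules: {
    'jsdoc/require-jsdoc': [
      'error',
      {
        require: {
          FunctionDeclaration: true,
          MethodDefinition: true, // also demand doc comments on class methods
        },
      },
    ],
  },
} as const;

console.log(JSON.stringify(requireJsdocOnMethods, null, 2));
```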

*/
async reconnectPeer(peerId: string, hints: string[] = []): Promise<void> {
  await this.#rpcClient.call('reconnectPeer', { peerId, hints });
}
Contributor

Is it considered reconnection (as one would do after a network outage) or is it just a new connection to the same endpoint as before? In particular, do we expect clist entries to survive a trip through manual disconnect/manual reconnect?

Contributor Author

Yeah, this is a reconnection, not a new connection. It reuses the existing RemoteHandle and remote endpoint. Clist entries and other endpoint state persist across the disconnect/reconnect cycle.
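A toy model of that property, under stated assumptions: `RemoteHandleSketch`, `RemoteRegistry`, the `connect` helper, and the clist shape and keys are invented for illustration and are far simpler than the real RemoteHandle. The point is only that a manual disconnect flips connection state without discarding the per-peer object.

```typescript
// Hypothetical stand-ins; the real RemoteHandle and clist are more involved.
interface RemoteHandleSketch {
  peerId: string;
  connected: boolean;
  clist: Map<string, string>; // assumed shape: local ref -> remote ref
}

class RemoteRegistry {
  readonly #handles = new Map<string, RemoteHandleSketch>();

  /** Return the existing handle for a peer, creating one only if needed. */
  connect(peerId: string): RemoteHandleSketch {
    let handle = this.#handles.get(peerId);
    if (handle === undefined) {
      handle = { peerId, connected: false, clist: new Map() };
      this.#handles.set(peerId, handle);
    }
    handle.connected = true;
    return handle;
  }

  /** Mark the peer as disconnected but keep its handle and clist. */
  closeConnection(peerId: string): void {
    const handle = this.#handles.get(peerId);
    if (handle !== undefined) {
      handle.connected = false;
    }
  }

  /** Reconnect by reusing whatever handle already exists for the peer. */
  reconnectPeer(peerId: string): RemoteHandleSketch {
    return this.connect(peerId);
  }
}

const registry = new RemoteRegistry();
const before = registry.connect('peer-c');
before.clist.set('local-1', 'remote-1');
registry.closeConnection('peer-c');
const after = registry.reconnectPeer('peer-c');
console.log(after === before); // true: same handle object
console.log(after.clist.get('local-1')); // remote-1: entry survived the cycle
```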

@sirtimid sirtimid merged commit 3417236 into main Nov 20, 2025
26 checks passed
@sirtimid sirtimid deleted the sirtimid/handling-remote-comms-disconnect branch November 20, 2025 00:25
Development

Successfully merging this pull request may close these issues.

Remote comms: Intentional disconnect handling

3 participants