Skip to content

Conversation

@ramfox
Copy link
Member

@ramfox ramfox commented Oct 17, 2025

Description

Attempts to make Connection::remote_id and Connection::alpn infallible, by:

  1. intercepting the Connection in Incoming and Connecting to ensure a remote_id and alpn are present before returning an iroh::endpoint::Connection
  2. creates a ZRTTConnection that has the same methods as a Connection, but includes having a fallible remote_id and alpn

The code is not ideal, but it illustrates some of the limitations here.

This creates a bit of a mess when using 0-rrt connections, as you can see in the 0-rtt example and the 0-rtt tests, since a ZRTTConnection and Connection are no longer the same, it becomes very annoying to deal with the result of a Connecting::into_0rtt()

Breaking Changes

Notes & open questions

This whole PR is an open question

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@github-actions
Copy link

github-actions bot commented Oct 17, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3556/docs/iroh/

Last updated: 2025-10-28T04:01:39Z

@github-actions
Copy link

github-actions bot commented Oct 17, 2025

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 25ce4b2

@n0bot n0bot bot added this to iroh Oct 17, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Oct 17, 2025
@dignifiedquire dignifiedquire added this to the v0.95 milestone Oct 21, 2025
@ramfox ramfox force-pushed the ramfox/infallible-remote-id branch from ab5d85a to 0bcfe81 Compare October 24, 2025 17:57
“ramfox” added 2 commits October 24, 2025 14:53
- the `ZeroRttConnection` now contains the `ZeroRttAccepted` struct, and `into_connection` waits for the `ZeroRttAccepted` returns, before returning the `Connection` and the result of `ZeroRttAccepted`
@ramfox ramfox self-assigned this Oct 28, 2025
@ramfox ramfox marked this pull request as ready for review October 28, 2025 04:24
@ramfox
Copy link
Member Author

ramfox commented Oct 28, 2025

Still some open questions around the server side of the ZeroRttConnection!

@ramfox ramfox requested a review from matheus23 October 28, 2025 04:25
Comment on lines 144 to 147
let connecting = incoming.accept().e()?;
let (connection, _zero_rtt_accepted) = connecting
let connection = connecting
.into_0rtt()
.expect("accept into 0.5 RTT always succeeds");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do y'all think of having an Incoming::accept_0rtt that doesn't return a Result, thus wouldn't need this annoying .expect("accept into 0.5 RTT always succeeds");?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes! That'd be great

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooo yes.

#[pin]
inner: quinn::Connecting,
ep: Endpoint,
/// exists this is an outgoing connection, `None` if this is an incoming conn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// exists this is an outgoing connection, `None` if this is an incoming conn
/// `Some(remote_id)` if this is an outgoing connection, `None` if this is an incoming conn

Comment on lines +1540 to +1541
warn!("no peer certificate found");
Err(RemoteEndpointIdSnafu.build())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it's impossible for this to fail.
There's this issue, but I can't reproduce that.

We should perhaps do a little more testing, but it might be valid to return EndpointId here instead of Result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the issue that it could possibly fail if you did 0-rtt and you haven't finished the handshake yet? Which we have engineered here to not be the case, true.

At this point in the creation of a connection, is it possible that the downcasts below fail? or that we don't have enough certs? Wouldn't we have already "caught" this if it didn't have the kinds of certificates we were expecting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much to dig into this! That's what we need to figure out here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the issue that it could possibly fail if you did 0-rtt and you haven't finished the handshake yet?

Hmmm. I think you might be right. Incoming can potentially only be the first datagram of a bunch, and you can get a Connecting from that synchronously, and from there get to quinn::Connection via into_0rtt synchronously as well.
I guess this doesn't really come up because our initial packets pretty much always fit into 1200 byte datagrams, but if for some reason that initial datagram doesn't contain all packets (e.g. because someone uses post-quantum keys), then we're toast.

Comment on lines +2049 to +2051
remote_id: remote_id_from_quinn_conn(&value)
.map_err(|_| ConnectionError::LocallyClosed)?,
alpn: alpn_from_quinn_conn(&value).ok_or(ConnectionError::LocallyClosed)?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors seem a little arbitrary to me. remote_id_from_quinn_conn seems to work no matter if the underlying connection was closed or not.
If alpn_from_quinn_conn would fail, it should probably fail at a different point: On Incoming::accept or when IncomingFuture resolves, we should check if an ALPN is present and fail if not.

And then I think it might be nice to not store the remote_id and alpn twice (once in quinn::Connection and once in iroh::Connection), and instead always look it up via infallible remote_id_from_quinn_conn and alpn_from_quinn_conn versions. (Although this last comment might be a little bit controversial. If you disagree with me I wouldn't push back 😅 )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thank you this answers one of the things that had been bothering me: is it even possible for these conversions to fail at this point. It made sense to me that if the crypto was invalid then how would we even get here?

If that's true, it simplifies a lot more.

Does the remote_id and alpn get stored on the quinn::Connection I thought the whole issue was that we had to do this conversion every time by calling peer_identity & handshake_data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, yeah. By "get stored" I mean that the bits of information required to derive the ALPN and remote ID are there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

5 participants