
Conversation

@MasterPtato
Contributor

No description provided.

@vercel

vercel bot commented Nov 22, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Error Error Nov 22, 2025 2:45am
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 22, 2025 2:45am
rivet-inspector Ignored Ignored Nov 22, 2025 2:45am
rivet-site Ignored Ignored Nov 22, 2025 2:45am

Contributor Author

MasterPtato commented Nov 22, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 22, 2025

PR Review: fix: make all uses of protocol handle v4 <-> v3

Summary

This PR introduces protocol version handling to support both mk1 (v1-v3) and mk2 (v4) protocol versions across the Pegboard system. The changes ensure backward compatibility when communicating between gateways, runners, and clients using different protocol versions.


Code Quality & Best Practices ✅

Strengths:

  • Clear separation between mk1 and mk2 protocol handling with explicit version checks using protocol::is_mk2()
  • Well-organized versioned data structures in versioned.rs with comprehensive conversion functions between all protocol versions
  • Consistent use of PROTOCOL_MK1_VERSION and PROTOCOL_MK2_VERSION constants instead of magic numbers
  • Good use of Rust's type system to enforce correct protocol handling at compile time

Minor Suggestions:

  1. In engine/packages/pegboard/src/workflows/actor/mod.rs:323-338, there are multiple TODO comments for mk2:

    if protocol::is_mk2(runner_protocol_version) {
        // TODO: Send message to tunnel
    } else {
        // ... (mk1 path elided in this excerpt)
    }

    These TODOs appear at lines 323, 358, 505, 523, and 651. Should these be implemented before merging, or tracked as follow-up work?

  2. The Event type in actor_event_demuxer.rs now uses protocol::mk2::Event exclusively. Verify this is intentional and that mk1 events are properly converted before reaching this code path.


Potential Bugs or Issues ⚠️

  1. Protocol Version Default Handling (engine/packages/pegboard/src/workflows/actor/mod.rs:779-781):

    pub struct Allocate {
        #[serde(default)]
        pub runner_protocol_version: Option<u16>,
    }

    When runner_protocol_version is None, ensure the code handles this gracefully. In the lifecycle loop, there's a guard:

    let (Some(runner_id), Some(runner_workflow_id), Some(runner_protocol_version)) = ...

    But earlier code paths may not have this protection.

  2. Error Silencing (tunnel_to_ws_task.rs:84-88):

    Err(err) => {
        tracing::error!(?err, "failed to parse tunnel message");
        return Ok(());
    }

    Logging the error and then returning Ok(()) reports success to the caller, which could hide protocol compatibility issues. Consider adding a metric, or logging which protocol version was expected versus received.

  3. Gateway-to-Runner Protocol Mismatch (shared_state.rs:185-203):
    The gateway reads runner_protocol_version from UDB and uses it to serialize messages. If the runner reconnects with a different protocol version while a request is in flight, messages could be serialized with the old version. Consider whether a stale value here could cause issues.
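
To illustrate item 2 above, here is a minimal sketch of a parse step that counts failures and records the expected versus received version. Everything in it is hypothetical: `TUNNEL_PARSE_FAILURES`, `TunnelParseError`, `parse_tunnel_message`, and the two-byte version tag are stand-ins chosen for this example, not the code in tunnel_to_ws_task.rs, and the atomic counter stands in for whatever metrics facility the project actually uses.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical process-wide counter standing in for a real metric.
pub static TUNNEL_PARSE_FAILURES: AtomicU64 = AtomicU64::new(0);

/// Hypothetical error that keeps the context the review asks for.
#[derive(Debug, PartialEq)]
pub struct TunnelParseError {
    pub expected_version: u16,
    /// None when the buffer was too short to contain a version tag.
    pub received_version: Option<u16>,
    pub payload_len: usize,
}

impl fmt::Display for TunnelParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "failed to parse tunnel message: expected protocol v{}, received {:?}, payload {} bytes",
            self.expected_version, self.received_version, self.payload_len
        )
    }
}

/// Toy parser (assumption): the first two bytes are a little-endian version tag.
/// Returns the remaining bytes when the tag matches `expected_version`.
pub fn parse_tunnel_message(
    buf: &[u8],
    expected_version: u16,
) -> Result<&[u8], TunnelParseError> {
    // Bump the counter and build the error in one place so no failure goes uncounted.
    let fail = |received_version: Option<u16>| {
        TUNNEL_PARSE_FAILURES.fetch_add(1, Ordering::Relaxed);
        TunnelParseError {
            expected_version,
            received_version,
            payload_len: buf.len(),
        }
    };

    if buf.len() < 2 {
        return Err(fail(None));
    }
    let version = u16::from_le_bytes([buf[0], buf[1]]);
    if version != expected_version {
        return Err(fail(Some(version)));
    }
    Ok(&buf[2..])
}
```

A caller can still choose to log and continue, but the counter and the structured error make a version mismatch visible instead of looking like a clean exit.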


Performance Considerations 📊

  1. Database Read Per Request (lib.rs:156-172):

    let (mut stopped_sub, runner_protocol_version) = tokio::try_join!(
        self.ctx.subscribe::<...>(...),
        udb.run(|tx| async move {
            tx.read(&pegboard::keys::runner::ProtocolVersionKey::new(runner_id), ...)
        })
    )?;

    This reads the protocol version from UDB for every HTTP/WebSocket request. Consider caching this value in SharedState or InFlightRequest after the first lookup for a given runner.

  2. Version Conversion Overhead: The conversion functions in versioned.rs (e.g., to_client_tunnel_message_v4_to_v3) allocate new structs. Rust has no garbage collector, so the cost is extra allocation and copying rather than GC pressure; on high-throughput message paths this could still add up. This is likely acceptable but worth monitoring.
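
The caching idea in item 1 could look roughly like the sketch below. It is a simplified, synchronous illustration under stated assumptions: `ProtocolVersionCache`, `RunnerId` (a plain `u64` here), and the `load` closure that stands in for the UDB read are all hypothetical names, and the real gateway code is async and would likely store the value on SharedState or InFlightRequest instead.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical runner identifier; the real code likely uses a richer ID type.
pub type RunnerId = u64;

/// Hypothetical per-process cache of runner protocol versions.
#[derive(Default)]
pub struct ProtocolVersionCache {
    inner: Mutex<HashMap<RunnerId, u16>>,
}

impl ProtocolVersionCache {
    /// Returns the cached version, calling `load` (a stand-in for the
    /// database read) only when the runner is not cached yet.
    pub fn get_or_load<E>(
        &self,
        runner_id: RunnerId,
        load: impl FnOnce() -> Result<u16, E>,
    ) -> Result<u16, E> {
        // The lock guard is a temporary and is released at the end of this statement,
        // so the lock is not held while `load` runs.
        let cached = self.inner.lock().unwrap().get(&runner_id).copied();
        if let Some(version) = cached {
            return Ok(version);
        }
        let version = load()?;
        self.inner.lock().unwrap().insert(runner_id, version);
        Ok(version)
    }

    /// Drops the entry, e.g. when a runner reconnects and may report a
    /// different protocol version.
    pub fn invalidate(&self, runner_id: RunnerId) {
        self.inner.lock().unwrap().remove(&runner_id);
    }
}
```

Any cache like this makes the staleness concern raised under "Potential Bugs or Issues" more pressing, so an explicit `invalidate` on runner reconnect (or a short expiry) would need to accompany it.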


Security Concerns 🔒

No significant security concerns identified. The protocol versioning doesn't introduce new attack surfaces, and the conversion functions properly handle all message types without data loss.


Test Coverage 📝

The PR doesn't appear to include new tests for the protocol conversion logic. Consider adding:

  • Unit tests for versioned.rs conversion functions (especially edge cases like DeprecatedTunnelAck conversion)
  • Integration tests verifying mk1 clients can communicate with mk2 runners and vice versa
  • Tests for protocol version boundary conditions
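
As a rough shape for the first bullet, the sketch below tests a round trip between two toy message enums. `ToyMessageV3`, `ToyMessageV4`, and both conversion functions are hypothetical stand-ins rather than the types in versioned.rs, and the choice to drop `DeprecatedAck` on upgrade is an assumption made only so the test has an edge case to cover.

```rust
/// Toy stand-in for a v3 message; not the real versioned.rs types.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyMessageV3 {
    Data(Vec<u8>),
    /// Exists only in v3 in this sketch (assumption).
    DeprecatedAck,
}

/// Toy stand-in for a v4 message.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyMessageV4 {
    Data(Vec<u8>),
}

/// Downgrade: every toy v4 message has a v3 equivalent.
/// Takes ownership so the payload is moved rather than cloned.
pub fn toy_v4_to_v3(msg: ToyMessageV4) -> ToyMessageV3 {
    match msg {
        ToyMessageV4::Data(bytes) => ToyMessageV3::Data(bytes),
    }
}

/// Upgrade: returns None for messages with no v4 counterpart.
pub fn toy_v3_to_v4(msg: ToyMessageV3) -> Option<ToyMessageV4> {
    match msg {
        ToyMessageV3::Data(bytes) => Some(ToyMessageV4::Data(bytes)),
        ToyMessageV3::DeprecatedAck => None,
    }
}

#[test]
fn data_round_trips_and_deprecated_ack_is_dropped() {
    // v4 -> v3 -> v4 must give back the original message.
    let original = ToyMessageV4::Data(vec![1, 2, 3]);
    let back = toy_v3_to_v4(toy_v4_to_v3(original.clone()));
    assert_eq!(back, Some(original));

    // The deprecated variant has no v4 form in this sketch.
    assert_eq!(toy_v3_to_v4(ToyMessageV3::DeprecatedAck), None);
}
```

Real tests would run the same round-trip check over every variant of the actual message enums, with one explicit case per variant that exists in only one protocol version.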

Additional Notes

  1. TypeScript Changes (engine/sdks/typescript/runner-protocol/src/index.ts):
    The TypeScript SDK also needs updates to handle the new protocol. Verify the runner_protocol_version field is properly exposed and the checkpoint structure changes are reflected.

  2. Schema Changes (engine/sdks/schemas/runner-protocol/v4.bare):
    The v4 schema changes look correct. Adding the checkpoint field to CommandWrapper and removing actor_id from CommandStartActor is a clean restructuring.

  3. Naming Consistency: The PR uses both "mk1/mk2" and "v1-v4" naming. Consider documenting the relationship (mk1 = v1-v3, mk2 = v4) in a code comment for future maintainers.
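
One possible form for the documentation suggested in item 3 is a pair of doc-commented constants next to the version check. This is a sketch only: the constant values and the body of `is_mk2` are assumptions inferred from the "mk1 = v1-v3, mk2 = v4" mapping stated in this review, and the definitions in the PR may differ.

```rust
/// Highest protocol version in the mk1 family (v1 through v3).
/// Value assumed from the review's "mk1 = v1-v3".
pub const PROTOCOL_MK1_VERSION: u16 = 3;

/// First protocol version in the mk2 family (v4).
/// Value assumed from the review's "mk2 = v4".
pub const PROTOCOL_MK2_VERSION: u16 = 4;

/// Returns true when `version` belongs to the mk2 family.
///
/// Naming: "mk1" covers wire versions v1-v3 and "mk2" covers v4,
/// so any version at or above `PROTOCOL_MK2_VERSION` is mk2.
pub fn is_mk2(version: u16) -> bool {
    version >= PROTOCOL_MK2_VERSION
}
```

Keeping the mapping in the doc comments puts it where a maintainer reading an `is_mk2` call site will find it.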


Verdict

This is a solid implementation of protocol version handling with good backward compatibility design. The main concerns are:

  1. TODO comments - Are these blocking for this PR?
  2. Missing tests - Would be good to add protocol conversion tests
  3. Performance - Consider caching protocol version lookups

🤖 Generated with Claude Code



2 participants