MRP protocol state machine issues #22

lhoward · 2025-12-01T04:59:59Z

We are seeing a number of state machine issues with stream persistence in the following topology:

XMOS listener <-> OpenSRP <-> Extreme x460 <-> MOTU 16A (2025)

This PR tracks fixes for them.

The talker propagation logic in _onRegisterStreamIndication read port state once at the beginning of the apply loop and then used that snapshot across multiple await suspension points where concurrent operations could modify the underlying state. This was particularly problematic for _canBridgeTalker, which makes admission control decisions based on bandwidth calculations. This change addresses the race condition by reading port state twice: once for the initial pruning check (where a snapshot is acceptable) and again immediately before _canBridgeTalker (to ensure bandwidth calculations use current state), reducing the window for stale data to cause incorrect admission control decisions.
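The re-read pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `PortState`, `PortTable`, `canBridgeTalker`, and `propagateTalker` are hypothetical stand-ins, and the bandwidth check is an assumed simplification of what `_canBridgeTalker` does.

```swift
// Hypothetical stand-ins; none of these types are taken from the PR itself.
struct PortState {
    var reservedBandwidth: Int   // bandwidth already reserved on the port
    let capacity: Int            // total bandwidth available for streams
}

actor PortTable {
    private var state = PortState(reservedBandwidth: 0, capacity: 100)

    func snapshot() -> PortState { state }

    func reserve(_ amount: Int) { state.reservedBandwidth += amount }
}

// Admission check in the spirit of _canBridgeTalker (logic assumed).
func canBridgeTalker(_ port: PortState, needing bandwidth: Int) -> Bool {
    port.reservedBandwidth + bandwidth <= port.capacity
}

func propagateTalker(
    on table: PortTable,
    needing bandwidth: Int,
    duringSuspension work: @Sendable () async -> Void
) async -> Bool {
    // First read: a snapshot is acceptable for the coarse pruning check.
    let early = await table.snapshot()
    guard early.capacity > 0 else { return false }

    // Suspension point: other tasks may modify the port state here.
    await work()

    // Second read, taken immediately before the admission decision,
    // so the bandwidth calculation does not use the stale snapshot.
    let current = await table.snapshot()
    return canBridgeTalker(current, needing: bandwidth)
}
```

If the admission check used `early` instead of `current`, a reservation made during the suspension would go unnoticed. The second read narrows that window but does not remove it, since state can still change after the decision.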

* when leaveNow() is called, it should behave as if the leavetimer has immediately expired (35.2.6)
* always GC attributes after running the state machine handler
* _findRegisteredAttributes() returns an array of registered attributes matching the specified filter
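A rough sketch of how these three points could fit together, under simplifying assumptions: the registrar is modelled with three states only, applicant state is ignored, and garbage collection simply drops attributes whose registrar is empty. All type and method names here are hypothetical and only loosely mirror the ones named above.

```swift
// Simplified registrar states, roughly corresponding to IN, LV and MT.
enum RegistrarState { case registered, leaving, empty }

struct Attribute {
    let value: Int
    var registrar: RegistrarState
}

struct Participant {
    var attributes: [Attribute] = []
    var leaveIndications: [Int] = []

    // Leave timer expiry: a leaving attribute becomes empty and a
    // leave indication is raised for it.
    mutating func leaveTimerExpired(at index: Int) {
        guard attributes[index].registrar == .leaving else { return }
        attributes[index].registrar = .empty
        leaveIndications.append(attributes[index].value)
    }

    // leaveNow behaves as if the leave timer had expired immediately.
    mutating func leaveNow(value: Int) {
        for index in attributes.indices where attributes[index].value == value {
            if attributes[index].registrar == .registered {
                attributes[index].registrar = .leaving
            }
            leaveTimerExpired(at: index)
        }
        // Always garbage-collect after the state machine handler has run.
        gc()
    }

    mutating func gc() {
        attributes.removeAll { $0.registrar == .empty }
    }

    // Returns the registered attributes that match the filter.
    func findRegisteredAttributes(matching filter: (Attribute) -> Bool) -> [Attribute] {
        attributes.filter { $0.registrar != .empty && filter($0) }
    }
}
```

Running the collection step unconditionally after the handler keeps deregistered attributes from lingering in the table, which is the property the second bullet asks for.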

The previous implementation used a continuously running periodic join timer that transmitted PDUs multiple times per second regardless of state changes. Note: this removes the point-to-point optimization in the Participant state machine. We will add this back later.

When a TalkerFailed attribute is received from an upstream bridge and local listeners exist, the merged listener declaration must be listenerAskingFailed regardless of the current listener states.
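The rule can be expressed as a small merge function. This is a sketch under assumptions: the enum and function names are hypothetical, and the branch for an advertising talker reflects the usual MSRP listener merge (all ready gives ready, all asking failed gives asking failed, anything else gives ready failed) as I understand it, not code from this PR.

```swift
enum TalkerDeclaration { case advertise, failed }
enum ListenerDeclaration { case askingFailed, ready, readyFailed }

// Returns the merged listener declaration to propagate towards the talker,
// or nil when there are no local listeners at all.
func mergedListener(
    talker: TalkerDeclaration,
    listeners: [ListenerDeclaration]
) -> ListenerDeclaration? {
    guard !listeners.isEmpty else { return nil }

    // A failed talker forces listenerAskingFailed, whatever the
    // individual listener states currently are.
    if talker == .failed { return .askingFailed }

    // Assumed merge for an advertising talker.
    if listeners.allSatisfy({ $0 == .ready }) { return .ready }
    if listeners.allSatisfy({ $0 == .askingFailed }) { return .askingFailed }
    return .readyFailed
}
```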

This can only happen if a Talker Failed was replaced with a Talker Advertise, and we always update with the most recent state. This avoids needlessly updating the listener ports.

The _updateExistingListeners function accommodates the race condition where listeners arrive before talkers, but it did not guard against the opposite race where a talker arrives and then immediately departs before port parameters can be updated.

The mutual exclusion logic for ensuring only one talker attribute type (talkerAdvertise or talkerFailed) exists per stream appeared in two locations with nearly identical code: once when handling peer events and once when propagating to other ports. This change extracts the common pattern into a private _enforceTalkerMutualExclusion helper function that handles both deregister (for peer events) and leave (for propagation) operations.
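One possible shape for such a helper is sketched below. The actual signature of `_enforceTalkerMutualExclusion` is not shown in this description, so the types, parameter names, and storage layout here are assumptions made purely for illustration.

```swift
enum TalkerType: Hashable {
    case advertise, failed

    // The other talker attribute type for the same stream.
    var complement: TalkerType { self == .advertise ? .failed : .advertise }
}

enum RemovalOperation {
    case deregister  // drop a peer registration (peer events)
    case leave       // withdraw our own declaration (propagation)
}

struct PortAttributes {
    var registered: [UInt64: Set<TalkerType>] = [:]
    var declared: [UInt64: Set<TalkerType>] = [:]

    /// Removes the complement of `kept` for `streamID`.
    /// Returns true if an attribute was actually removed.
    @discardableResult
    mutating func enforceTalkerMutualExclusion(
        streamID: UInt64,
        keeping kept: TalkerType,
        via operation: RemovalOperation
    ) -> Bool {
        switch operation {
        case .deregister:
            return registered[streamID]?.remove(kept.complement) != nil
        case .leave:
            return declared[streamID]?.remove(kept.complement) != nil
        }
    }
}
```

Passing the operation as a parameter lets both call sites share one code path while still choosing between deregistering and leaving.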

The _findTalkerRegistration(for:) method that searches across all participants was declared as throwing but never actually throws any errors, instead returning nil when no talker is found.
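As an illustration of the resulting shape, a non-throwing lookup that returns an optional might look like the sketch below. The `TalkerRegistration` and `Participant` types and the free-function form are hypothetical; only the idea of returning nil instead of throwing comes from the text above.

```swift
struct TalkerRegistration {
    let streamID: UInt64
}

struct Participant {
    var talkers: [TalkerRegistration]
}

// Searches every participant; absence is reported as nil, so the
// function does not need to be marked `throws`.
func findTalkerRegistration(
    for streamID: UInt64,
    in participants: [Participant]
) -> TalkerRegistration? {
    for participant in participants {
        if let talker = participant.talkers.first(where: { $0.streamID == streamID }) {
            return talker
        }
    }
    return nil
}
```

Callers can then use `if let` or `guard let` directly, without a `try` that can never fail.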

lhoward force-pushed the x460-interop-issue branch 4 times, most recently from 8702180 to 5924f73 (December 1, 2025 06:13)

lhoward force-pushed the main branch from 8799660 to 5c7b31c (December 1, 2025 06:16)

lhoward force-pushed the x460-interop-issue branch 3 times, most recently from bcb7247 to 33368db (December 1, 2025 22:38)

lhoward self-assigned this Dec 1, 2025

lhoward added the bug label Dec 1, 2025

lhoward added this to the firsttenbeta milestone Dec 1, 2025

lhoward force-pushed the x460-interop-issue branch 17 times, most recently from 06c1b4a to f0241d7 (December 2, 2025 06:03)

lhoward force-pushed the main branch from d76cbce to 7dfebf1 (December 2, 2025 06:26)

lhoward force-pushed the x460-interop-issue branch from f0241d7 to 594e098 (December 2, 2025 06:28)

lhoward force-pushed the x460-interop-issue branch 10 times, most recently from a496e87 to 3a9288b (December 6, 2025 08:54)

lhoward force-pushed the main branch from 7e48cd8 to 17c6abd (December 6, 2025 08:58)

lhoward added 17 commits December 6, 2025 20:06

MSRP: guard against time going backwards when tracking stream age

6d61910

MRP: application event handlers should run before registrar SM

b445087

MRP: ensure rLA state machine run after processing inbound LA PDU

1035eb5

MRP: isolate registrar and applicant handlers to participant's actor

685b3f2

MRP: add precondition to check leaveNow deregisters attribute

c3d67b1

MRP: re-add attribute subtype replacement, but in rx()

809da59

MSRP: refactor talker complement deregistration

94cd34a

MSRP: always propagate listenerAskingFailed for talkerFailed with listeners

a19721d

When a TalkerFailed attribute is received from an upstream bridge and local listeners exist, the merged listener declaration must be listenerAskingFailed regardless of the current listener states.

MSRP: no talker registration in listener propagation is not an error

aa5f97d

MSRP: ignore application event source in talker leave indications

5b5d975

This can only happen if a Talker Failed was replaced with a Talker Advertise, and we always update with the most recent state. This avoids needlessly updating the listener ports.

MSRP/MRP: remove application event handlers

e839dbe

MSRP: remove unnecessary throws declaration from _findTalkerRegistration

51af223

The _findTalkerRegistration(for:) method that searches across all participants was declared as throwing but never actually throws any errors, instead returning nil when no talker is found.

lhoward force-pushed the x460-interop-issue branch from 3a9288b to 51af223 (December 6, 2025 09:08)

lhoward mentioned this pull request Dec 6, 2025

Validate port parameters updated after restart #12

Closed
