@lhoward commented Dec 1, 2025

We are seeing a number of state machine issues with stream persistence in the following topology:

XMOS listener <-> OpenSRP <-> Extreme x460 <-> MOTU 16A (2025)

This PR tracks fixes for them.

@lhoward force-pushed the x460-interop-issue branch 4 times, most recently from 8702180 to 5924f73, December 1, 2025 06:13
@lhoward force-pushed the x460-interop-issue branch 3 times, most recently from bcb7247 to 33368db, December 1, 2025 22:38
@lhoward self-assigned this Dec 1, 2025
@lhoward added the bug (Something isn't working) label Dec 1, 2025
@lhoward added this to the firsttenbeta milestone Dec 1, 2025
@lhoward force-pushed the x460-interop-issue branch 17 times, most recently from 06c1b4a to f0241d7, December 2, 2025 06:03
@lhoward force-pushed the x460-interop-issue branch from f0241d7 to 594e098, December 2, 2025 06:28
@lhoward force-pushed the x460-interop-issue branch 10 times, most recently from a496e87 to 3a9288b, December 6, 2025 08:54
The talker propagation logic in _onRegisterStreamIndication read port state
once at the beginning of the apply loop and then used that snapshot across
multiple await suspension points where concurrent operations could modify the
underlying state.

This was particularly problematic for _canBridgeTalker, which makes admission
control decisions based on bandwidth calculations. This change addresses the
race condition by reading port state twice: once for the initial pruning check,
where a snapshot is acceptable, and again immediately before _canBridgeTalker,
so that bandwidth calculations use current state. This narrows the window in
which stale data can cause incorrect admission control decisions.
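
The double-read pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `Bridge`, `PortState`, `admit`, and `suspend` are hypothetical names, and the bandwidth check is a stand-in for whatever _canBridgeTalker actually computes.

```swift
// Hypothetical types for illustration only.
struct PortState {
    var reservedBandwidth: Int   // bandwidth already reserved, arbitrary units
    let capacity: Int            // total bandwidth available on the port
}

actor Bridge {
    private var ports: [Int: PortState]

    init(ports: [Int: PortState]) { self.ports = ports }

    // Stand-in for any awaited work. While suspended here, other tasks
    // may run on this actor and modify `ports`.
    private func suspend() async { await Task.yield() }

    func admit(_ amount: Int, on port: Int) async -> Bool {
        // First read: a snapshot is acceptable for a coarse pruning check.
        guard ports[port] != nil else { return false }

        await suspend()

        // Second read: fetch the state again after the suspension point so
        // the admission check uses current values, then update without
        // suspending in between.
        guard let current = ports[port],
              current.reservedBandwidth + amount <= current.capacity else {
            return false
        }
        ports[port]?.reservedBandwidth += amount
        return true
    }
}
```

Because actors are reentrant at every `await`, any value read before a suspension point may be stale afterwards; re-reading just before the decision keeps the check and the update in one uninterrupted region.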
* when leaveNow() is called, it should behave as if the leavetimer has
  immediately expired (35.2.6)
* always GC attributes after running the state machine handler
* _findRegisteredAttributes() returns an array of registered attributes
  matching the specified filter
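
As a rough sketch of the first bullet, assuming the usual MRP Registrar states (IN, LV, MT), leaveNow() can be expressed as entering the leaving state and then running the leavetimer expiry transition immediately. The `Registrar` type and its method names here are hypothetical and simplified; they are not the types in this repository.

```swift
// Simplified Registrar with the three MRP states: IN, LV, MT.
enum RegistrarState { case registered, leaving, empty }

struct Registrar {
    var state: RegistrarState = .empty

    // Returns true when a leave indication should be raised.
    mutating func leaveTimerExpired() -> Bool {
        guard state == .leaving else { return false }
        state = .empty
        return true
    }

    // Behaves as if the leavetimer had been started and expired at once.
    mutating func leaveNow() -> Bool {
        if state == .registered { state = .leaving }
        return leaveTimerExpired()
    }
}
```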
The previous implementation used a continuously running periodic join timer
that transmitted PDUs multiple times per second regardless of state changes.

Note: this removes the point-to-point optimization in the Participant state
machine. We will add this back later.
…teners

When a TalkerFailed attribute is received from an upstream bridge and local
listeners exist, the merged listener declaration must be listenerAskingFailed
regardless of the current listener states.
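
A minimal sketch of that rule follows. The function and enum names are hypothetical, and the merge applied when the talker has not failed is an assumption based on the usual MSRP listener merging behaviour (Ready plus Asking Failed yields Ready Failed), not a copy of this PR's code.

```swift
enum ListenerDeclaration { case askingFailed, ready, readyFailed }
enum TalkerDeclaration { case advertise, failed }

// Returns the merged listener declaration to propagate towards the talker,
// or nil when there are no local listeners.
func mergeListeners(_ listeners: [ListenerDeclaration],
                    talker: TalkerDeclaration) -> ListenerDeclaration? {
    guard !listeners.isEmpty else { return nil }

    // A failed talker forces askingFailed regardless of listener states.
    if talker == .failed { return .askingFailed }

    // Assumed merge rule for the non-failed case.
    let anyReady = listeners.contains { $0 == .ready || $0 == .readyFailed }
    let anyFailed = listeners.contains { $0 == .askingFailed || $0 == .readyFailed }
    switch (anyReady, anyFailed) {
    case (true, false): return .ready
    case (true, true): return .readyFailed
    default: return .askingFailed
    }
}
```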
This can only happen if a Talker Failed was replaced with a Talker Advertise,
and we will always update with the most recent state. This avoids needlessly
updating the listener ports.
The _updateExistingListeners function accommodates the race condition where
listeners arrive before talkers, but it did not guard against the opposite race
where a talker arrives and then immediately departs before port parameters can
be updated.
The mutual exclusion logic for ensuring only one talker attribute type
(talkerAdvertise or talkerFailed) exists per stream appeared in two locations
with nearly identical code: once when handling peer events and once when
propagating to other ports. This change extracts the common pattern into a
private _enforceTalkerMutualExclusion helper function that handles both
deregister (for peer events) and leave (for propagation) operations.
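
The shape of such a helper might look like the sketch below. `PortAttributes`, `TalkerKind`, and `RemovalOperation` are hypothetical, and the attribute storage is reduced to two dictionaries keyed by stream ID; the real helper operates on the participant's attribute state.

```swift
enum TalkerKind {
    case talkerAdvertise, talkerFailed
    var opposite: TalkerKind {
        self == .talkerAdvertise ? .talkerFailed : .talkerAdvertise
    }
}

enum RemovalOperation { case deregister, leave }

struct PortAttributes {
    var registered: [UInt64: Set<TalkerKind>] = [:]  // learned from the peer
    var declared: [UInt64: Set<TalkerKind>] = [:]    // propagated by us

    // Shared helper: remove the opposite talker type before adding the new one.
    private mutating func enforceTalkerMutualExclusion(
        incoming: TalkerKind, streamID: UInt64, using operation: RemovalOperation
    ) {
        let stale = incoming.opposite
        switch operation {
        case .deregister: registered[streamID]?.remove(stale)
        case .leave: declared[streamID]?.remove(stale)
        }
    }

    mutating func handlePeerEvent(_ kind: TalkerKind, streamID: UInt64) {
        enforceTalkerMutualExclusion(incoming: kind, streamID: streamID, using: .deregister)
        registered[streamID, default: []].insert(kind)
    }

    mutating func propagate(_ kind: TalkerKind, streamID: UInt64) {
        enforceTalkerMutualExclusion(incoming: kind, streamID: streamID, using: .leave)
        declared[streamID, default: []].insert(kind)
    }
}
```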
The _findTalkerRegistration(for:) method that searches across all participants
was declared as throwing but never actually throws any errors, instead
returning nil when no talker is found.
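
For illustration, a non-throwing lookup of this kind could look like the sketch below. The `Registry`, `Participant`, and `TalkerRegistration` types are hypothetical stand-ins, and the sketch assumes the fix is simply to drop `throws` from the signature.

```swift
struct TalkerRegistration { let streamID: UInt64 }
struct Participant { var talkers: [TalkerRegistration] }

struct Registry {
    var participants: [Participant]

    // No `throws`: absence of a talker is reported with nil, so callers
    // need neither `try` nor an error path.
    func findTalkerRegistration(for streamID: UInt64) -> TalkerRegistration? {
        for participant in participants {
            if let match = participant.talkers.first(where: { $0.streamID == streamID }) {
                return match
            }
        }
        return nil
    }
}
```

Call sites can then use plain optional binding, for example `if let talker = registry.findTalkerRegistration(for: id) { ... }`.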