Edison-A-N

Fix Race Condition in StreamableHTTP Transport (Closes #1363)

Motivation and Context

Starting from v1.12.0, MCP servers using the Streamable HTTP transport hit a race condition that raises ClosedResourceError when a request fails validation early (e.g., because of an incorrect Accept header). The issue affects server reliability and can be reproduced consistently with fast-failing requests.

The race condition occurs because:

  1. The message router enters its async for loop over write_stream_reader.
  2. write_stream_reader calls checkpoint() inside receive(), yielding control.
  3. Request validation fails early and the request handler returns immediately.
  4. Transport termination closes all streams, including write_stream_reader.
  5. The message router resumes on the now-closed stream, which raises ClosedResourceError.
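
The ordering above can be modelled with a small stdlib-only sketch. It does not use anyio or the SDK: ToyReceiveStream, message_router, and run_race are hypothetical names, asyncio.sleep(0) stands in for anyio's checkpoint(), and the local ClosedResourceError class stands in for anyio.ClosedResourceError.

```python
import asyncio


class ClosedResourceError(Exception):
    """Stand-in for anyio.ClosedResourceError (toy model only)."""


class ToyReceiveStream:
    """Hypothetical, simplified receive stream; not anyio's implementation."""

    def __init__(self) -> None:
        self._closed = False

    def close(self) -> None:
        self._closed = True

    async def receive(self) -> str:
        await asyncio.sleep(0)  # checkpoint: other tasks may run here (step 2)
        if self._closed:  # the stream was closed while this task was suspended
            raise ClosedResourceError
        return "message"


async def message_router(stream: ToyReceiveStream, events: list[str]) -> None:
    events.append("router: waiting on stream")  # step 1
    await stream.receive()  # suspends at the checkpoint, resumes in step 5


async def run_race() -> list[str]:
    events: list[str] = []
    stream = ToyReceiveStream()
    router = asyncio.create_task(message_router(stream, events))
    await asyncio.sleep(0)  # let the router start and reach its checkpoint
    events.append("transport: validation failed, closing streams")  # steps 3-4
    stream.close()
    try:
        await router
    except ClosedResourceError:
        events.append("router: ClosedResourceError")  # step 5
    return events


if __name__ == "__main__":
    for line in asyncio.run(run_race()):
        print(line)
```

In this model the router has nothing wrong with it; it simply resumes after the stream was closed by another task, which is why the error only shows up when the request fails fast.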

This fix ensures graceful handling of stream closure scenarios without propagating exceptions that could destabilize the server.

How Has This Been Tested?

Test Suite

Added a test suite in tests/issues/test_1363_race_condition_streamable_http.py that reproduces the race condition:

  1. Invalid Accept Headers Test:

    • Missing application/json in Accept header
    • Missing text/event-stream in Accept header
    • Completely invalid Accept header
  2. Invalid Content-Type Test:

    • Incorrect Content-Type header
  3. Log Analysis:

    • Captures server logs from separate process
    • Verifies no ClosedResourceError exceptions occur
    • Checks that no "Error in message router" messages appear
    • Validates graceful error handling
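
As a rough illustration of the log-analysis step, the check could look like the sketch below. find_router_errors and ERROR_MARKERS are hypothetical names, and the sample log text is made up for the example; the actual test in the PR may be structured differently.

```python
# Markers taken from the list above; everything else here is an assumption.
ERROR_MARKERS = ("ClosedResourceError", "Error in message router")


def find_router_errors(log_text: str) -> list[str]:
    """Return every captured log line that contains one of the error markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in ERROR_MARKERS)
    ]


if __name__ == "__main__":
    # Made-up sample of captured server output, for illustration only.
    captured = "server started\nrequest rejected: invalid Accept header\n"
    assert find_router_errors(captured) == []
```

A test can then assert that the returned list is empty after sending each of the invalid requests.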

Test Execution

  • Tests run in isolated processes to capture real server behavior
  • Server runs in stateless mode to trigger race condition
  • Multiple request scenarios tested to ensure comprehensive coverage
  • Log analysis confirms fix prevents exception propagation

Breaking Changes

None. This is a bug fix that maintains full backward compatibility.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Implementation Details

The fix adds explicit exception handling for anyio.ClosedResourceError in the message router loop:

except anyio.ClosedResourceError:
    if self._terminated:
        logging.debug("Read stream closed by client")
    else:
        logging.exception("Unexpected closure of read stream in message router")

This approach:

  • Graceful Handling: Prevents exception propagation that could crash the server
  • Smart Logging: Distinguishes between expected termination and unexpected closure
  • Minimal Impact: No performance overhead or behavioral changes
  • Robust: Handles the race condition without complex synchronization
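
For context, here is a minimal stdlib-only sketch of how such a handler behaves around a router loop. It is not the SDK's code: ToyTransport, close_streams, run_router, and the logger name are hypothetical, asyncio.sleep(0) stands in for anyio's checkpoint, and a local exception class stands in for anyio.ClosedResourceError. Only the except block mirrors the snippet above.

```python
import asyncio
import logging

logger = logging.getLogger("toy_transport")  # hypothetical logger name


class ClosedResourceError(Exception):
    """Stand-in for anyio.ClosedResourceError (toy model only)."""


class ListHandler(logging.Handler):
    """Collects log records in memory so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


class ToyTransport:
    def __init__(self) -> None:
        self._terminated = False
        self._stream_closed = False

    def close_streams(self, *, terminated: bool) -> None:
        """Close the stream; terminated=True marks the closure as expected."""
        self._terminated = terminated
        self._stream_closed = True

    async def _receive(self) -> str:
        await asyncio.sleep(0)  # checkpoint: the stream may be closed meanwhile
        if self._stream_closed:
            raise ClosedResourceError
        return "message"

    async def message_router(self) -> None:
        try:
            while True:
                await self._receive()
        except ClosedResourceError:
            if self._terminated:
                logger.debug("Read stream closed by client")
            else:
                logger.exception("Unexpected closure of read stream in message router")


async def run_router(terminated: bool) -> list[str]:
    """Run the router once and return the level names of what it logged."""
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    try:
        transport = ToyTransport()
        router = asyncio.create_task(transport.message_router())
        await asyncio.sleep(0)  # let the router reach its checkpoint
        transport.close_streams(terminated=terminated)
        await router  # returns normally: the handler absorbed the error
    finally:
        logger.removeHandler(handler)
    return [record.levelname for record in handler.records]
```

In this sketch, an expected termination produces a single DEBUG record, while a closure without termination still produces an ERROR record with the traceback attached.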

Related Issues

@Edison-A-N Edison-A-N requested a review from a team as a code owner September 21, 2025 06:12
@maxisbey maxisbey added the bug Something isn't working label Sep 22, 2025
@thomasst

This seems to silence the error. Is this the correct approach given that for me (and others in #1219 / #1190) the error happens on every request, so it doesn't appear to just be a race condition?

@Edison-A-N
Author

Hi,

In anyio's Implementation

1. Conditions for Iteration Termination

Class inheritance:

MemoryObjectReceiveStream -> ObjectReceiveStream -> UnreliableObjectReceiveStream

As we can see in the implementation of UnreliableObjectReceiveStream.__anext__:

async def __anext__(self) -> T_co:
    try:
        return await self.receive()
    except EndOfStream:
        raise StopAsyncIteration from None

That is, the EndOfStream exception will terminate the iteration.

2. When to Raise EndOfStream or ClosedResourceError

MemoryObjectReceiveStream.receive -> receive_nowait:

def receive_nowait(self) -> T_co:
    """
    Receive the next item if it can be done without waiting.

    :return: the received item
    :raises ~anyio.ClosedResourceError: if this send stream has been closed
    :raises ~anyio.EndOfStream: if the buffer is empty and this stream has been
        closed from the sending end
    :raises ~anyio.WouldBlock: if there are no items in the buffer and no tasks
        waiting to send
    """

All ClosedResourceError exceptions are based on this check:

if self._closed:
    raise ClosedResourceError

And self._closed is set to True in close():

class MemoryObjectReceiveStream:
    ...
    ...
    def close(self) -> None:
        """
        Close the stream.

        This works the exact same way as :meth:`aclose`, but is provided as a special
        case for the benefit of synchronous callbacks.

        """
        if not self._closed:
            self._closed = True
            self._state.open_receive_channels -= 1
            if self._state.open_receive_channels == 0:
                send_events = list(self._state.waiting_senders.keys())
                for event in send_events:
                    event.set()

Review of Known Issues

In issue #1219, the debug information clearly shows _closed = True (visible in the debug output in the second screenshot).

The traceback in issue #1190 also points to the root cause of the error: it is raised when the check if self._closed evaluates to True.

In fact, looking at the anyio implementation above, it's very clear that ClosedResourceError is raised because the stream has been closed.

Why This Implementation is Appropriate

This implementation is not "silencing the error". When multiple coroutines operate on the same stream, checking whether the stream has been closed is a necessary step. Because anyio.MemoryObjectReceiveStream raises ClosedResourceError instead of ending the iteration automatically, we have to handle the closed stream explicitly around the async for loop.
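
To make the difference concrete, here is a stdlib-only toy model of the two outcomes. It is a sketch, not anyio's code: ToyReceiveStream and drain are hypothetical names, the two exception classes are local stand-ins for anyio.EndOfStream and anyio.ClosedResourceError, and an empty buffer is treated as "closed from the sending end" to keep the model short. Only __anext__ mirrors the anyio method quoted earlier.

```python
import asyncio


class EndOfStream(Exception):
    """Stand-in for anyio.EndOfStream (toy model only)."""


class ClosedResourceError(Exception):
    """Stand-in for anyio.ClosedResourceError (toy model only)."""


class ToyReceiveStream:
    """Hypothetical, simplified receive stream; not anyio's implementation."""

    def __init__(self, items: list[str]) -> None:
        self._items = list(items)
        self._closed = False  # True once the receiving end itself is closed

    def close(self) -> None:
        self._closed = True

    async def receive(self) -> str:
        await asyncio.sleep(0)  # checkpoint
        if self._closed:
            raise ClosedResourceError
        if self._items:
            return self._items.pop(0)
        # Simplification: an empty buffer means the sender has finished.
        raise EndOfStream

    def __aiter__(self) -> "ToyReceiveStream":
        return self

    async def __anext__(self) -> str:
        try:
            return await self.receive()
        except EndOfStream:
            # Only EndOfStream ends the loop; ClosedResourceError propagates.
            raise StopAsyncIteration from None


async def drain(stream: ToyReceiveStream, close_after_first: bool = False) -> list[str]:
    seen: list[str] = []
    try:
        async for item in stream:
            seen.append(item)
            if close_after_first:
                stream.close()  # receiving end closed while the loop is running
    except ClosedResourceError:
        seen.append("ClosedResourceError")
    return seen
```

With the sender finishing normally, drain returns the items and the loop ends on its own. With the receiving end closed mid-iteration, the loop does not end quietly and the caller has to catch the exception.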

In the handler, we also check self._terminated to tell whether the ClosedResourceError comes from a deliberate termination. If it does not, the handler still reports it through logger.exception.

Successfully merging this pull request may close these issues.

Race Condition in StreamableHTTP Transport Causes ClosedResourceError