Consider ditching concurrent handler execution #284

@ebkalderon

Background

In order for the framework to support both standard streams and TCP transports generically, we need a way to interleave both client-to-server and server-to-client communication over a single stream. This topic has been the subject of a previous issue. The current strategy used by tower-lsp (and lspower in turn) to accomplish this is somewhat simplistic:

  1. Incoming messages (requests to server, responses from client) are read sequentially from the input stream
  2. Each message is routed to its respective async handler
  3. Pending tasks are buffered and executed concurrently on a single thread, at most four (4) tasks at a time, preserving order
  4. Outgoing messages (responses from server, requests to client) are serialized into the output stream

The third step above is particularly significant, however, and has some unintended consequences.

Problem Summary

A closer reading of the "Request, Notification and Response Ordering" section of the Language Server Protocol specification reveals:

Responses to requests should be sent in roughly the same order as the requests appear on the server or client side. [...] However, the server may decide to use a parallel execution strategy and may wish to return responses in a different order than the requests were received. The server may do so as long as this reordering doesn’t affect the correctness of the responses.

This is concerning to me because tower-lsp unconditionally executes pending async tasks concurrently, without any regard for whether doing so is correct. The ordering of the outgoing messages is preserved, as per the spec, but the execution order of the handlers is not guaranteed to be correct. For example, an innocent call to self.client.log_message().await inside one server handler might prompt the executor not to return to that handler's yield point immediately, but instead to start processing the next incoming request concurrently as it becomes available.
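To make the hazard concrete, here is a minimal, std-only sketch. It is not tower-lsp's actual code: `run_buffered` is a hand-rolled toy executor that only approximates buffered execution, `YieldOnce` stands in for any await that does not complete immediately (such as the log_message call above), and the `did_change`, `completion`, and `demo` functions are hypothetical.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the toy executor below simply re-polls in a loop.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in for an `.await` that is not ready on the first poll.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

type Task = Pin<Box<dyn Future<Output = String>>>;

/// Polls up to `limit` tasks concurrently on one thread and returns their
/// outputs in submission order (a simplified model, assumed for illustration).
fn run_buffered(tasks: Vec<Task>, limit: usize) -> Vec<String> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut pending: VecDeque<Task> = tasks.into();
    let mut in_flight: VecDeque<(Task, Option<String>)> = VecDeque::new();
    let mut done = Vec::new();
    while !pending.is_empty() || !in_flight.is_empty() {
        // Admit new tasks until the concurrency window is full.
        while in_flight.len() < limit {
            match pending.pop_front() {
                Some(task) => in_flight.push_back((task, None)),
                None => break,
            }
        }
        // Poll every unfinished task in the window once.
        for (task, output) in in_flight.iter_mut() {
            if output.is_none() {
                if let Poll::Ready(value) = task.as_mut().poll(&mut cx) {
                    *output = Some(value);
                }
            }
        }
        // Emit finished outputs from the front only, preserving order.
        while matches!(in_flight.front(), Some((_, Some(_)))) {
            let (_, output) = in_flight.pop_front().unwrap();
            done.push(output.unwrap());
        }
    }
    done
}

/// Hypothetical notification handler: awaits something, then applies the edit.
async fn did_change(doc: Rc<RefCell<String>>, new_text: &'static str) -> String {
    YieldOnce(false).await; // e.g. logging to the client
    *doc.borrow_mut() = new_text.to_string();
    "didChange applied".to_string()
}

/// Hypothetical request handler: reads the document state.
async fn completion(doc: Rc<RefCell<String>>) -> String {
    format!("completion computed against {:?}", doc.borrow())
}

/// Runs a didChange followed by a completion with the given window size.
fn demo(limit: usize) -> Vec<String> {
    let doc = Rc::new(RefCell::new(String::from("v1")));
    let first: Task = Box::pin(did_change(doc.clone(), "v2"));
    let second: Task = Box::pin(completion(doc.clone()));
    run_buffered(vec![first, second], limit)
}
```

With a window of four, `demo(4)` returns the two outputs in the right order, but the completion was computed against "v1", the text from before the edit. With a window of one, `demo(1)` computes it against "v2".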

As evidenced by downstream GitHub issues like denoland/deno#10437, such behavior can potentially cause the state of the server and client to slowly drift apart as many small and subtle errors accumulate and compound over time. This problem is exacerbated by LSP's frequent use of JSON-RPC notifications rather than requests, which don't require waiting for a response from the server (see relevant discussion in microsoft/language-server-protocol#584 and microsoft/language-server-protocol#666).

It's not really possible to confidently determine whether any particular "state drift" bug was caused by the server implementation or by tower-lsp without stepping through with a debugger. However, there are some things we can do to improve the situation for our users and make such bugs less likely.

Possible solutions

For example, we could process client-to-server and server-to-client messages concurrently with each other, while requiring the messages within each direction to execute one by one, in the order they are received, with no concurrency between individual LanguageServer handlers.

This would greatly decrease request throughput, however. Perhaps it would be beneficial to allow some handlers to run concurrently where it's safe (with user opt-in or opt-out), but otherwise default to fully sequential execution. To quote the "Request, Notification and Response Ordering" section of the Language Server Protocol specification again:

For example, reordering the result of textDocument/completion and textDocument/signatureHelp is allowed, as each of these requests usually won’t affect the output of the other. On the other hand, the server most likely should not reorder textDocument/definition and textDocument/rename requests, since executing the latter may affect the result of the former.

If we choose to go this route, we may have to determine ahead of time which handlers are usually safe to reorder and interleave and which are not. I imagine this would introduce a ton of additional complexity to tower-lsp, though, so I'd rather leave such responsibilities to the server author to implement themselves, if possible.
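To give a rough idea of what such a classification might look like, here is a small sketch. Everything in it is hypothetical: the `Concurrency` enum and the `policy` and `may_overlap` functions are not part of tower-lsp, and the only methods marked as safe to interleave are the two the spec quote names. Everything else defaults to sequential.

```rust
/// Hypothetical per-method execution policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Concurrency {
    /// Must run alone, in arrival order (the default).
    Sequential,
    /// May be interleaved with other `MayInterleave` handlers.
    MayInterleave,
}

/// Looks up the policy for an LSP method name. Only the two methods the
/// spec calls out as reorderable are opted in; this list is an assumption.
fn policy(method: &str) -> Concurrency {
    match method {
        "textDocument/completion" | "textDocument/signatureHelp" => Concurrency::MayInterleave,
        _ => Concurrency::Sequential,
    }
}

/// Two handlers may overlap only if both have opted in.
fn may_overlap(a: &str, b: &str) -> bool {
    policy(a) == Concurrency::MayInterleave && policy(b) == Concurrency::MayInterleave
}
```

Under this assumed table, completion and signatureHelp may overlap, while definition and rename (or anything paired with didChange) may not. A real design would presumably let the server author supply or override the table.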

Below are a few changes that we can start making now to improve the situation:

  1. Execute server request/notification handlers sequentially, in the order in which they were received, with no concurrency between them.
  2. Process client-to-server and server-to-client messages concurrently with each other.
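The two points above can be sketched roughly as follows. This is only an illustration of the scheduling idea using OS threads and std channels, not a proposal for how tower-lsp would implement it (the real thing would presumably stay async). The `Message` type and the `dispatch` function are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type: only the direction and a label matter here.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    ClientToServer(String),
    ServerToClient(String),
}

/// Routes each message to a per-direction queue. Each queue is drained by its
/// own worker, one message at a time, so order within a direction is kept
/// while the two directions make progress independently of each other.
fn dispatch(messages: Vec<Message>) -> (Vec<String>, Vec<String>) {
    let (to_server_tx, to_server_rx) = mpsc::channel::<String>();
    let (to_client_tx, to_client_rx) = mpsc::channel::<String>();

    let server_worker = thread::spawn(move || {
        // Each "handler" runs to completion before the next message is taken.
        to_server_rx
            .iter()
            .map(|m| format!("handled {m}"))
            .collect::<Vec<_>>()
    });
    let client_worker = thread::spawn(move || {
        to_client_rx
            .iter()
            .map(|m| format!("sent {m}"))
            .collect::<Vec<_>>()
    });

    for message in messages {
        match message {
            Message::ClientToServer(m) => to_server_tx.send(m).unwrap(),
            Message::ServerToClient(m) => to_client_tx.send(m).unwrap(),
        }
    }
    // Closing the senders lets both workers finish once their queues drain.
    drop(to_server_tx);
    drop(to_client_tx);

    (server_worker.join().unwrap(), client_worker.join().unwrap())
}
```

Because each direction has exactly one consumer, a didOpen queued before a didChange is always handled first, regardless of what is happening on the server-to-client side.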

The question of optional user-controlled concurrency is still unanswered, however, and will likely require further design work.

Any thoughts on the matter, @silvanshade?

Labels: bug
