Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server does not close and re-connect on missed keep alive ack #1099

Closed
pstreef opened this issue Dec 19, 2023 · 5 comments · Fixed by #1118
Closed

Server does not close and re-connect on missed keep alive ack #1099

pstreef opened this issue Dec 19, 2023 · 5 comments · Fixed by #1118
Assignees
Labels
bug regression superseded Issue is superseded by another

Comments

@pstreef
Copy link

pstreef commented Dec 19, 2023

Using the server setup found here and using rsocket-core 1.1.3 my server behaves as expected. If for some reason no acks get delivered during the keep-alive timeout it will restart the connection. In 1.1.4 this no longer works.

Expected Behavior

After:

io.rsocket.exceptions.ConnectionErrorException: No keep-alive acks for x ms

I expect that the connection is closed and my client can reconnect.

Actual Behavior

In 1.1.4 the client RSocketRequester does not properly reset the connection and the next call fails with the same ConnectionErrorException

Steps to Reproduce

I've created a small reproducing example. The test in this projects runs green using rsocket-core 1.1.3, but fails when using 1.1.4

Possible Solution

I believe this is regression from this change.

Your Environment

See sample project

  • RSocket version(s) used: 1.1.4 (and 1.1.3)
  • Other relevant libraries versions (eg. netty, ...): spring-boot-starter-rsocket 3.2.0
  • Platform (eg. JVM version (javar -version) or Node version (node --version)): zulu 17.0.8
  • OS and version (eg uname -a): macos 14.2 (23C64)
@rstoyanchev rstoyanchev added this to the 1.1.5 milestone Aug 6, 2024
@david-dvconsulting
Copy link

Hi @rstoyanchev, any updates on the planning/status of this bug ?
We're seeing this a lot in our production environment at the moment.

@kevinat
Copy link

kevinat commented Dec 12, 2024

This bug is serious, it cause we CANNOT recover from temporary network problems, the retry mechanism become useless, the only way to recover is to restart the application, which is woeful.

@kevinat
Copy link

kevinat commented Dec 12, 2024

Any workaround would be appreciated. @rstoyanchev

@rstoyanchev rstoyanchev changed the title Server does not close and re-connect on no keep alive ack issue. Server does not close and re-connect on missed keep alive ack Jan 30, 2025
@rstoyanchev
Copy link
Contributor

rstoyanchev commented Jan 30, 2025

In #1085, RSocketRequester#onClose() was updated to also wait on RSocketResponder to terminate, and to not close the connection from its terminate method. The intent is for cleanup to be completed on both sides before emitting onClose. However, keep-alive is a special case where the connection must be closed proactively by RSocketRequester, which handles connection-level frames, or it won't close otherwise.

The fix we've identified with @OlegDokuka is to close the connection in tryTerminateOnKeepAlive, and that makes the scenario in the provided test work again.

Note that as the test blocks the Reactor Netty thread in doOnNext, that delays the onClose() notification in the latest code from being emitted until the Thread.sleep is done. That means the reconnect happens 1 second after the first request, and the next request is made around the same time. To make the test work reliably, we increased the request interval to be greater than the sleep duration:

StepVerifier.create(
    Flux.range(0, expectedCount)
        .delayElements(Duration.ofMillis(2000))  // <<== change here from 1000 to 2000
        .flatMapSequential(i ->
            rsocketClient.requestResponse(Mono.just(DefaultPayload.create("")))
                .doOnNext(message -> {
                  if (sleepOnce.getAndSet(false)) {
                    try {
                      LOG.info("Sleeping...");
                      Thread.sleep(1_000);
                      LOG.info("Waking up.");
                    } catch (InterruptedException e) {
                      throw new RuntimeException(e);
                    }
                  }
                })
        ))
        .expectSubscription()
        .expectNextCount(expectedCount)
        .verifyComplete();

@rstoyanchev
Copy link
Contributor

Superseded by #1118.

@rstoyanchev rstoyanchev closed this as not planned Won't fix, can't repro, duplicate, stale Jan 31, 2025
@rstoyanchev rstoyanchev removed this from the 1.1.5 milestone Jan 31, 2025
@rstoyanchev rstoyanchev added the superseded Issue is superseded by another label Jan 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug regression superseded Issue is superseded by another
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants