Dragonfly becomes unresponsive during full sync #4787

Open
arkorwan opened this issue Mar 17, 2025 · 5 comments
Labels: bug (Something isn't working)

Comments

@arkorwan

Describe the bug

We run Dragonfly in a 2-node master-replica setup. During full sync, the master node becomes largely unresponsive.

We reported this about a year ago: https://dragonfly.discourse.group/t/unresponsive-during-full-sync/135. The issue disappeared in v1.15.0, but it has reappeared since v1.19.0. The latest version we've tried is v1.25.5, and we still see the problem there.

To Reproduce

  1. Prepare two Dragonfly instances (we use two dockerized instances in our minimal reproducible setup).
  2. Load a sizable number of keys into one instance.
  3. Generate a constant load with a mix of MGET and SET operations.
  4. Start a full sync by making the second instance a replica of the first.
  5. Observe that throughput drops and response times skyrocket.

We previously posted a reproduction script in the Discourse thread linked above.
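The Discourse script itself is not reproduced here. As a rough illustration of steps 3 and 5 only, here is a minimal Python sketch, assuming any client object that exposes `mget(keys)` and `set(name, value)` (a redis-py `redis.Redis` pointed at the master fits that shape). The key names (`key:<n>`), value size, batch size, and SET ratio are hypothetical placeholders rather than the reporter's workload. It records per-request latency so that a baseline run can be compared with a run taken during full sync.

```python
import math
import random
import time


def run_mixed_load(client, key_count, ops, mget_batch=10, set_ratio=0.2, seed=0):
    """Issue a random MGET/SET mix against `client`; return per-request latencies in ms.

    Hypothetical workload: keys are named "key:<n>" and SET writes a 64-byte value.
    """
    rng = random.Random(seed)
    latencies_ms = []
    for _ in range(ops):
        start = time.perf_counter()
        if rng.random() < set_ratio:
            client.set(f"key:{rng.randrange(key_count)}", "x" * 64)
        else:
            keys = [f"key:{rng.randrange(key_count)}" for _ in range(mget_batch)]
            client.mget(keys)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Nearest-rank latency percentiles (ms) for comparing runs before and during sync."""
    ordered = sorted(latencies_ms)

    def pct(p):
        rank = math.ceil(p / 100.0 * len(ordered))
        return ordered[max(rank - 1, 0)]

    return {"p50": pct(50), "p99": pct(99), "max": ordered[-1]}


class DictClient:
    """In-memory stand-in (hypothetical) so the loop can run without a server."""

    def __init__(self):
        self.data = {}

    def set(self, name, value):
        self.data[name] = value
        return True

    def mget(self, keys):
        return [self.data.get(k) for k in keys]


if __name__ == "__main__":
    client = DictClient()
    print(summarize(run_mixed_load(client, key_count=1000, ops=2000)))
```

Against real instances, one would pass a `redis.Redis(host=..., port=6379)` connection for the master, take a baseline, then issue `REPLICAOF <master-host> <port>` on the second instance (for example with redis-py's `execute_command`) and compare the percentiles from a run that overlaps the full sync. A single serial client like this only shows latency; a multi-client tool such as memtier_benchmark is better suited to measuring throughput.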

Expected behavior

Full sync should not have this much impact on the master node.

Screenshots

[Screenshot] v1.15.0 (no issue): the full sync happened right in the middle, but it is barely noticeable.

[Screenshot] v1.14.5

[Screenshot] v1.25.5

Environment

  • OS: Ubuntu 22.04.5 LTS
  • Kernel: Linux 5.15.0-134-generic #145-Ubuntu SMP Wed Feb 12 20:08:39 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: Bare Metal
  • Dragonfly Version: problem seen in v1.19.2, v1.20.1, and v1.25.5; not seen in v1.15.0 through v1.18.1
@arkorwan added the bug (Something isn't working) label on Mar 17, 2025
@romange (Collaborator) commented Mar 17, 2025

@arkorwan thanks for reporting this, but we did not succeed in reproducing it the last time we checked. Please provide the exact docker-compose setup for your master/replica pair and the memtier commands, along with instructions on how to reproduce it. Also, please attach the info logs from your run.

@romange (Collaborator) commented Mar 17, 2025

Also, can you try running it on 1.27.2 and tell us whether it reproduces there?

@romange (Collaborator) commented Mar 17, 2025

I suspect this PR is the cause (https://github.com/dragonflydb/dragonfly/pull/3084/files). It fixes a memory blowout bug, but let's first focus on reproducing the issue.

@arkorwan (Author)

Sure. It's not easy to set up, since we need to prepare some data, but we'll definitely get to it. We've been stuck on 1.18.1 for a while.

@romange (Collaborator) commented Mar 17, 2025

If you can reproduce it with memtier_benchmark, that would be great 🙏🏼
