
RW refs break replication when master changes #573

@Serpentian


The problem

Rebalancing was started, and after some time replication broke with the following error:

[Image: screenshot of the replication error]

How it happened

This happened as follows: we had a master that processed some RW requests. Then the master was changed, which is a fairly common thing while rebalancing is in progress due to high load on the storages. The new master started sending a bucket, but the replica (the former master) still had an RW ref on it. This breaks replication and requires manual intervention from the user: replication stops and must be restarted manually.
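The sequence above can be illustrated with a toy model. This is only a sketch in Python, not vshard code: the `Storage` class, the `apply_replicated_state` check, and the `ReplicationBroken` exception are hypothetical names, and the check itself is an assumption about how a bucket state validation could reject a replicated change on an instance that still holds an RW ref.

```python
class ReplicationBroken(Exception):
    """Hypothetical stand-in for the replication error; not the real message."""


class Storage:
    """Toy model of one instance in a replicaset (not the vshard API)."""

    def __init__(self, name, is_master):
        self.name = name
        self.is_master = is_master
        self.bucket_state = {}  # bucket_id -> "ACTIVE" or "SENDING"
        self.rw_refs = {}       # bucket_id -> number of local RW refs

    def rw_ref(self, bucket_id):
        # In this model only a master serves RW requests and takes RW refs.
        assert self.is_master, "RW refs are taken on the master only"
        self.rw_refs[bucket_id] = self.rw_refs.get(bucket_id, 0) + 1

    def apply_replicated_state(self, bucket_id, state):
        # Assumed check: a bucket with local RW refs must not become SENDING.
        if state == "SENDING" and self.rw_refs.get(bucket_id, 0) > 0:
            raise ReplicationBroken(
                "bucket %d has RW refs on %s" % (bucket_id, self.name))
        self.bucket_state[bucket_id] = state


def reproduce():
    a = Storage("a", is_master=True)
    b = Storage("b", is_master=False)
    for s in (a, b):
        s.bucket_state[1] = "ACTIVE"

    a.rw_ref(1)                             # RW request is in progress on the master
    a.is_master, b.is_master = False, True  # the master is changed
    b.bucket_state[1] = "SENDING"           # the new master starts sending bucket 1

    try:
        a.apply_replicated_state(1, "SENDING")  # the old master replicates the change
    except ReplicationBroken:
        return "replication stopped"
    return "ok"
```

In this model `reproduce()` ends in the failing branch because the RW ref lives only on the former master, so the new master has no local reason to refuse the send.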

What to do

  1. Maybe we can drop RW refs as soon as the master becomes a replica. However, an RW request may be in progress: it created an RW ref and yielded, then the master changed, we dropped the ref, and the master changed back. Now we have a running RW request without a ref, which is dangerous.

  2. We can wait for RW refs to drop to 0 before sending the buckets to the new replicaset, similar to the implementation of map_callro. The problem with this solution is that in map_callro we need to wait after making the bucket SENDING, whereas here we need to wait before any actions. So there will be a sync for RW refs before making the bucket SENDING and a sync for RO refs after.

  3. This is a problem we created ourselves by introducing the bucket state checks, so we can also fix it by relaxing these checks.
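The ordering described in option 2 could look roughly like the sketch below. It is written in Python for illustration only and does not reflect the vshard implementation: `Bucket`, `wait_until`, and `send_bucket` are hypothetical names, polling with a timeout is an assumed way to do the sync, and rolling the bucket back to ACTIVE on a failed RO sync is also an assumption.

```python
import time


class Bucket:
    """Toy view of one bucket on one instance of the replicaset."""

    def __init__(self):
        self.state = "ACTIVE"
        self.rw_refs = 0
        self.ro_refs = 0


def wait_until(predicate, timeout, step=0.01):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(step)
    return True


def send_bucket(views, timeout=1.0):
    """views: Bucket objects for the same bucket, one per instance."""
    # Sync 1: before any action, wait until no instance holds an RW ref.
    if not wait_until(lambda: all(b.rw_refs == 0 for b in views), timeout):
        return False  # nothing was changed, so there is nothing to undo

    for b in views:
        b.state = "SENDING"

    # Sync 2: after SENDING, wait until RO refs are gone as well.
    if not wait_until(lambda: all(b.ro_refs == 0 for b in views), timeout):
        for b in views:
            b.state = "ACTIVE"  # assumed rollback; not taken from the source
        return False
    return True
```

For example, with `master, replica = Bucket(), Bucket()` and `replica.rw_refs = 1`, the call `send_bucket([master, replica], timeout=0.05)` returns `False` and leaves both views ACTIVE, so no SENDING state reaches an instance that still holds an RW ref.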


@Gerold103, @mrForza

Labels: bug (Something isn't working), critical (The issue is critical and should be fixed ASAP), customer, rebalancer
