Description
The problem
Rebalancing was started and, after some time, replication broke with the following error:
 
How it happened
This happened as follows: the master processed some RW requests, and then the master was changed (which is quite common while rebalancing is in progress under high load on the storages). The new master then started sending a bucket, but the replica (the former master) still held an RW ref on it. This breaks replication and requires manual intervention: replication stops and must be restarted manually by the user.
What to do
- Maybe we can drop RW refs as soon as the master becomes a replica. However, an RW request may be in progress: it creates an RW ref and yields, the master changes, we drop the ref, and then the master is changed back. We are left with a running RW request without a ref, which is dangerous.
- We can wait for RW refs to become 0 before sending the buckets to the new replicaset, similar to the implementation of `map_callro`. The difference is that in `map_callro` we need to wait after making the bucket `SENDING`, while here we need to wait before taking any action. So there will be a sync for RW refs before making the bucket `SENDING` and a sync for RO refs after.
- This is a problem we created ourselves by introducing the checks of bucket states, so we can also fix it by relaxing these checks.
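
To make the second option more concrete, here is a minimal Python sketch of "wait for RW refs to become 0, then mark the bucket `SENDING`". It is a toy model, not vshard code: the `Bucket` class, its methods, and the `timeout` parameter are hypothetical names, and the whole replicaset is collapsed into one in-process object, whereas in the real case the refs live on another instance and would have to be synced over the network.

```python
import threading


class Bucket:
    """Toy model of a bucket: a status plus an RW reference counter.

    All names here are hypothetical; this is not the vshard API.
    """

    def __init__(self):
        self.status = "ACTIVE"
        self.rw_refs = 0
        self._cond = threading.Condition()

    def rw_ref(self):
        """Take an RW ref; refused once the bucket is no longer ACTIVE."""
        with self._cond:
            if self.status != "ACTIVE":
                return False
            self.rw_refs += 1
            return True

    def rw_unref(self):
        """Release an RW ref and wake up a sender waiting for zero refs."""
        with self._cond:
            self.rw_refs -= 1
            if self.rw_refs == 0:
                self._cond.notify_all()

    def start_sending(self, timeout):
        """Wait until RW refs drop to 0, and only then become SENDING.

        Returns False and leaves the status unchanged if the refs did
        not drain within `timeout` seconds.
        """
        with self._cond:
            drained = self._cond.wait_for(
                lambda: self.rw_refs == 0, timeout=timeout
            )
            if not drained:
                return False
            self.status = "SENDING"
            return True


bucket = Bucket()
bucket.rw_ref()                            # an RW request is in progress
threading.Timer(0.1, bucket.rw_unref).start()  # it finishes a bit later
sent = bucket.start_sending(timeout=5.0)   # blocks until the ref is gone
```

The status change happens under the same lock as the ref check, so no new RW ref can slip in between "refs are 0" and "bucket is `SENDING`". The sketch does not address a steady stream of new RW refs delaying the send indefinitely, nor the separate sync for RO refs after the status change.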