Improve logging of rebalancer and recovery #586
base: master
6b1057d to 64cc837
These are the comments for the first two commits; more comments are coming later :) Thank you for working on this. Good logging is crucial: it lets us investigate what happened during incidents.
5a8b3f8 to f5c25f7
Oh, shit. I forgot to send the last message of the review, I'm very sorry.
04c506f to ccff54f
46add65 to a1c095b
a1c095b to 1da8c2c
1da8c2c to f07abe5
These are the final comments I have; the patch is pretty clean now :)
Before this patch "Finish bucket recovery step ..." logs were printed at the end of recovery even if no buckets were successfully recovered, which led to unnecessary log records. This patch fixes the issue by adding a check for the number of recovered buckets.

Part of tarantool#212

NO_DOC=bugfix
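A minimal sketch of the check described in this commit message, runnable in plain Lua. The helper name `finish_recovery_step`, the stub logger, and the wording after the "Finish bucket recovery step" prefix are assumptions made for illustration, not the actual code in `vshard/storage/init.lua`:

```lua
-- Stub logger that collects records so the sketch runs outside Tarantool.
-- In vshard this role is played by the real log module.
local records = {}
local log = {
    info = function(fmt, ...)
        table.insert(records, string.format(fmt, ...))
    end,
}

-- Hypothetical helper: report the end of a recovery step only when
-- at least one bucket was actually recovered.
local function finish_recovery_step(recovered_count)
    if recovered_count > 0 then
        -- The text after the prefix is illustrative, not the real message.
        log.info('Finish bucket recovery step: %d buckets recovered',
                 recovered_count)
    end
end

finish_recovery_step(0) -- nothing is logged
finish_recovery_step(2) -- one record is logged
```

With this guard, a recovery step that recovers nothing leaves no trace in the log, so only steps that changed something produce a record.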
This patch introduces logging of the ids of the buckets that were recovered during the recovery stage of the storage.

Part of tarantool#212

NO_DOC=bugfix
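One possible way to render recovered bucket ids for such a log record, as a sketch only. The helper `format_bucket_ids` and the message text are hypothetical; the patch may format the ids differently:

```lua
-- Hypothetical helper: turn a list of bucket ids into a stable,
-- human-readable string for a log record.
local function format_bucket_ids(ids)
    -- Copy before sorting so the caller's table is left untouched.
    local sorted = {}
    for i, id in ipairs(ids) do
        sorted[i] = id
    end
    table.sort(sorted)
    return table.concat(sorted, ', ')
end

-- Illustrative message; the real log wording is not shown in this thread.
local message = string.format('Recovered buckets: [%s]',
                              format_bucket_ids({7, 3, 12}))
```

Sorting the ids keeps the record deterministic, which makes it easier to match with `grep_log` in tests.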
871197d to 489b425
Final nits
end)
t.assert(g.replica_1_a:grep_log(
'Apply rebalancer routes with 1 workers'))
end)
Nit: the indentation is not correct here.
Fixed. I also changed the indentation on lines 211-214.
vshard/storage/init.lua
Outdated
end
log.info('Rebalance routes are sent. Schedule next wakeup after '..
'%f seconds', consts.REBALANCER_WORK_INTERVAL)
log.info('Next rebalancer routes were sent: %s. Schedule next ' ..
Nit: this sounds grammatically incorrect :) Let's say `The following rebalancer routes were sent` instead, or you can just leave it as it was so as not to change the existing tests.
fixed
g.replica_2_a:replicaset_uuid())
t.assert(g.replica_1_a:grep_log(rebalancer_routes_msg))
start_bucket_move(g.replica_1_a, g.replica_2_a, moved_bucket_from_2)
start_bucket_move(g.replica_1_a, g.replica_3_a, moved_bucket_from_3)
You're moving the buckets with the rebalancer, so why do you need to move them manually as well?
fixed
This patch adds logging of rebalancer routes. The log now includes the source storage, the number of buckets, and the destination storage to which the buckets will be moved.

Since the rebalancer service changed how it logs the routes that were sent, the `rebalancer/rebalancer.test.lua` and `rebalancer/stress_add_remove_several_rs.test.lua` tests are updated accordingly.

Part of tarantool#212

NO_DOC=bugfix
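A sketch of how routes could be rendered with the three pieces of information named in this commit message (source, bucket count, destination). The helper `format_routes`, the route table shape `{[source] = {[destination] = count}}`, and the output wording are assumptions for illustration and may not match the patch:

```lua
-- Hypothetical helper: render rebalancer routes as readable text.
-- Assumed input shape: {[source_id] = {[destination_id] = bucket_count}}.
local function format_routes(routes)
    local lines = {}
    for src, destinations in pairs(routes) do
        for dst, count in pairs(destinations) do
            table.insert(lines, string.format('%s -> %s: %d buckets',
                                              src, dst, count))
        end
    end
    -- pairs() has no defined order, so sort for a deterministic record.
    table.sort(lines)
    return table.concat(lines, '; ')
end

local text = format_routes({
    rs_1 = {rs_2 = 10, rs_3 = 5},
})
```

A deterministic string matters here because the tests match the log record literally with `grep_log`.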
Before this patch the function `rebalancer_download_states` didn't return information about the replicaset from which the states could not be downloaded. As a result, the log "Some buckets are not active ..." lacked valuable information about the unhealthy replicaset.

Now we return `(replicaset.id, nil)` instead of `nil` when the rebalancer can't download the state from a replicaset, and we add replicaset.id to the "Some buckets are not active ..." log. We also change the `rebalancer/rebalancer.test.lua` test, which expected the old "Some buckets are not active" log without replicaset.id.

Closes tarantool#212

NO_DOC=bugfix
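The return convention described in this commit message can be sketched as follows. Only the failure case `(replicaset.id, nil)` comes from the message; the function `download_states`, the `fetch_state` field, and the success return value `(nil, states)` are assumptions for illustration:

```lua
-- Hypothetical stand-in for the state download. Each replicaset object
-- has a fetch_state() function that returns a state table or nil.
local function download_states(replicasets)
    local states = {}
    for id, replicaset in pairs(replicasets) do
        local state = replicaset.fetch_state()
        if state == nil then
            -- Failure: name the replicaset instead of returning a bare nil.
            return id, nil
        end
        states[id] = state
    end
    -- Assumed success convention: no failed id, plus the collected states.
    return nil, states
end

local failed_id, states = download_states({
    rs_1 = {fetch_state = function() return {active = 10} end},
    rs_2 = {fetch_state = function() return nil end},
})
```

The caller can then include `failed_id` in its log record, so the reader knows which replicaset was unhealthy.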
489b425 to fce8f28
Before this patch "Finish bucket recovery step ..." logs were printed at
the end of recovery even if no buckets were successfully recovered, which
led to unnecessary log entries. This patch fixes the issue by adding an
additional check for the number of recovered buckets.
Closes #212
NO_DOC=bugfix