-
Notifications
You must be signed in to change notification settings - Fork 45
Improve compaction task status updates #241
base: master
Are you sure you want to change the base?
Conversation
src/couch_db_updater.erl
Outdated
|
||
emsort_cb(_Ems, {merge, chain}, {init, Copied, Nodes}) -> | ||
{init, Copied, Nodes + 1}; | ||
emsort_cb(_Ems, row_copy, {init, Copied, Nodes}) when Copied >= 1000 -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic constant 1000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean you'd rather that be a define? I just copied the same shape as merge_docids down below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I mean that it would be better to have a define. Like ?UPDATE_FREQ or ?BATCH_SIZE or something.
Also I need to change my use of DocCount to be TotalChanges. I noticed while testing the optimizations I tried that its currently not correct during compaction retries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good but I have few questions.
src/couch_db_updater.erl
Outdated
]), | ||
0; | ||
|
||
emsort_cb(_Ems, row_copy, Copied) when is_integer(Copied), Copied > 1000 -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic constant 1000.
{init, 0, Nodes}; | ||
emsort_cb(_Ems, row_copy, {init, Copied, Nodes}) -> | ||
{init, Copied + 1, Nodes}; | ||
emsort_cb(Ems, {merge_start, reverse}, {init, Copied, Nodes}) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we can also have {merge_start, forward}
event. Should we handle it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we wouldn't have a progress updates for the second pass. Is it intentional?
Previous the emsort related operations did not update the compaction task status. For large databases this leads to some very long waits while the compaction task stays at 100%. This change adds progress reports to the steps for sorting and copying document ids back into the database file.
1db1337
to
ae00a5a
Compare
Updated to use define's for batch sizes and fixed the changes/doc count mixup. For the {merge_start, forward}, thats only there for completeness. It felt a bit weird to not include that event even though I don't necessarily need it for this work. Updates during the second phase happen here: https://github.com/apache/couchdb-couch/pull/241/files#diff-f6f654ab26b490bab95be6f502c49d89R1376 This is a bit subtle but it seemed like the best I could do without starting to modify the on-disk contents of files which I like to avoid when at all possible. Its a bit funky but without starting to store the total number of rows in emsort the best approach I had was to guess with the input being the total number of changes processed in this run and then in the first phase of the merge sort just count how many rows we have. Once that first pass is over we can calculate the total number of rows that will be copied and then just update progress in the normal fashion. Its a bit awkward but that emsort_cb kind of switches modes once it sees that the first phase has ended and then its purely just listening for row_copy events. |
And I hesitate to store the total number of rows because there'd then be upgrade things to worry about and generally speaking this will just sort itself out while still giving an approximate progress update. |
Previous the emsort related operations did not update the compaction
task status. For large databases this leads to some very long waits
while the compaction task stays at 100%. This change adds progress
reports to the steps for sorting and copying document ids back into the
database file.