-
Notifications
You must be signed in to change notification settings - Fork 32
Support cluster purge #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
chguocloudant
commented
Nov 30, 2016
- add 2 APIs(set_purge_seq, get_purge_seq).
- add a test case to test the new APIs.
|
where is purgeSeq persisted to the index? |
| var reader = DirectoryReader.open(ctx.args.writer, true) | ||
| var updateSeq = getCommittedSeq | ||
| var pendingSeq = updateSeq | ||
| var purgeSeq = 0L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the default value is 0? Similarly to updateSeq it should be read from commitData on disk
| } | ||
| } | ||
|
|
||
| private def commitPurgeSeq(newSeq: Long) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does not set the committing flag and so could allow concurrent attempts to commit, leading to a broken index (one with the wrong values).
This work should instead be done in the preceding commit function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean we can commit purgeSeq and updateSeq with commit function?
We can change commit function with 2 arguments to support it.
private def commit(newSeq: Long, type: String) {
if (type == "updateSeq") {
...
} else {
...
}
- add 2 APIs(set_purge_seq, get_purge_seq). - keep purge_seq in clouseau. - adjust the commit logic to update purge seq. - update the test code. BugzID: 68280
86f9452 to
a8c3c8e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, nicely done. Please don't merge until we have the matching dreyfus work at the same level in case it changes what we need here.
| ('pending_seq, pendingSeq), | ||
| ('committed_seq, getCommittedSeq) | ||
| ('committed_seq, getCommittedSeq), | ||
| ('purge_seq, pendingPurgeSeq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be purgeSeq here not pendingPurgeSeq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will change it later.
| case SetPurgeSeqMsg(newSeq: Long) => | ||
| pendingPurgeSeq = newSeq | ||
| logger.debug("Pending purge sequence is now %d".format(newSeq)) | ||
| commit(pendingSeq, pendingPurgeSeq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rnewson .
I triggered commit when the purgeSeq is updated. Will it cost too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think the var pendingPurgeSeq can be removed if the commit is allowed. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, good question. I'd rather we always commit asynchronously and on the established interval. For users that purge a lot (and there will be a lot of them if this work is extended to allow 'hard' deletes) they'll be committing very frequently and this will hurt performance.
This means the purge seq message could be processed but not actioned, when clouseau crashes before committing, so we'll need to ensure that dreyfus will retry (it should, since it reads the purge seq from the index, but let's be sure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rnewson .
If we commit purgeSeq asynchronously and also the users purge a lot, the dreyfus will retry to delete the index again in clouseau.
For example, if dreyfus gets a purgeSeq(6) from couchdb and gets an IdxPurgeSeq(5) from clouseau, it will purge index from 5 and set purgeSeq to 6.
If clouseau is crashed before committing, dreyfus will get purgeSeq(5) again. And dreyfus will purge index from 5 again.
But I am not sure which is more expensive, delete index or commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can safely assume clouseau crashes are rare from our production experience.
commit is expensive, the most expensive operation there is in Lucene, I think.
So the choice is between an expensive commit for every purge or a less expensive delete on retry for every clouseau crash.
We should also not say "delete index" when we really mean "delete document". :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think I can modify it later.
Let me make sure that we can remove the value pendingPurgeSeq, right? If the dreyfus calls get_purge_seq, clouseau will return purgeSeq directly(now this value maybe is not committed yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how we can remove pending. we need to return the committed purge seq when asked and need to retain the pending value for when we commit.
- adjust the commit logic to update purge seq.
BugzID: 68280
| logger.debug("Pending sequence is now %d".format(newSeq)) | ||
| 'ok | ||
| case SetPurgeSeqMsg(newPurgeSeq: Long) => | ||
| purgeSeq = newPurgeSeq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be pendingPurgeSeq = newPurgeSeq. purgeSeq must only change after it's been committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I will modify it.
- use pendingPurgeSeq in set_purge_seq and get_purge_seq. - update purgeSeq when pengdingPurgeSeq is committed. BugzId: 68280
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 but let's merge once the matching dreyfus PR is ready.
| 'ok | ||
| case SetPurgeSeqMsg(newPurgeSeq: Long) => | ||
| pendingPurgeSeq = newPurgeSeq | ||
| logger.debug("purge sequence is now %d".format(newPurgeSeq)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should say 'Pending purge sequence is now %d'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it. I will modify it later.