Enversion tracks roots by way of an evn:roots revision property, set on each revision during post-commit. At the start of processing, the previous revision's evn:roots are loaded in their entirety. As the commit is processed, the necessary root modifications are made, and then the entire structure is written back out to the new revision's evn:roots revprop.
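To make the cost of the current scheme concrete, here is a minimal sketch of it. It is not Enversion's code: `revprops` is a hypothetical in-memory dict standing in for Subversion revision properties, `process_commit_v1` and `add_trunk` are illustrative names, and JSON is used only as a stand-in serialization for the roots structure.

```python
import json

# Hypothetical stand-in for Subversion revprops: revprops[rev][name] -> str.
revprops = {0: {"evn:roots": json.dumps({})}}


def process_commit_v1(rev, apply_changes):
    """Sketch of the current scheme: full load from rev-1, full write to rev."""
    # Load the entire roots structure from the previous revision.
    roots = json.loads(revprops[rev - 1]["evn:roots"])
    # Apply whatever root modifications this commit implies (may be none).
    apply_changes(roots)
    # Write the entire structure back out, even if nothing changed.
    revprops.setdefault(rev, {})["evn:roots"] = json.dumps(roots, sort_keys=True)


def add_trunk(roots):
    # Illustrative change: a commit that creates a new root.
    roots["/trunk/"] = {"created": 1}


process_commit_v1(1, add_trunk)
process_commit_v1(2, lambda roots: None)  # no root changes, still a full copy
```

Every revision pays a read and a write proportional to the size of the roots structure, and storage grows roughly as (number of revisions) × (roots size), even when most commits touch no roots at all.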
This has worked well to date, but it does not scale to large repositories with many roots, because every revision loads and rewrites the full roots structure. For example, I kicked off an analysis of the Apache asf repo after I released v0.2.22 (which had numerous improvements to root ancestor logic). I had to cancel it 8.3 days later: it had reached revision 283059, but analysis had slowed significantly, with each revision taking ~4-5 seconds and slowly getting slower. `du -hs asf/db/revprops` reported 8.7GB, and `evnadmin show-roots asf | wc -c` indicated about 250KB of roots.
I propose to enhance this behavior as follows: introduce evn:prev_roots and evn:next_roots revprops. When processing a new revision, rather than loading evn:roots from revision-1, Enversion will look at revision-1's evn:prev_roots property, which will be a revision number, and then load the roots from the revision it names instead.
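A minimal sketch of the proposed lookup, assuming the same hypothetical in-memory `revprops` dict and JSON encoding as a stand-in for real revprops; `load_roots` and `load_roots_for_new_commit` are illustrative names, not existing Enversion functions. In this example r1 changed the roots while r2 and r3 did not, so the latter carry only a pointer.

```python
import json

# Hypothetical revprop store: only r1 holds a full evn:roots value;
# r2 and r3 just point back at it via evn:prev_roots.
revprops = {
    1: {"evn:roots": json.dumps({"/trunk/": {"created": 1}}), "evn:prev_roots": "1"},
    2: {"evn:prev_roots": "1"},
    3: {"evn:prev_roots": "1"},
}


def load_roots(rev):
    """Return the roots in effect at `rev` by following its evn:prev_roots link."""
    roots_rev = int(revprops[rev]["evn:prev_roots"])
    return json.loads(revprops[roots_rev]["evn:roots"])


def load_roots_for_new_commit(new_rev):
    """When processing new_rev, start from the roots in effect at new_rev - 1."""
    return load_roots(new_rev - 1)
```

The lookup is a single indirection: one small revprop read to get the revision number, then one read of the full roots, regardless of how many unchanged revisions lie in between.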
If the commit alters the roots, write the new roots out in full to this revision's evn:roots, set evn:next_roots on the revision referenced by evn:prev_roots to this revision's number (i.e. add a forward link), and set this revision's evn:prev_roots to its own revision number.
If the commit does not alter any roots, it "forward copies" the evn:prev_roots value to the new revision instead of the entire evn:roots structure. With this approach, the appropriate roots can be loaded for any revision by looking up the value of evn:roots at the revision pointed to by evn:prev_roots.
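Both cases can be sketched as a single post-commit routine. This is an illustration under stated assumptions, not the implementation: `process_commit_v2` is a hypothetical name, `revprops` is an in-memory dict standing in for real revprops, roots are JSON-encoded for illustration, and r0 is assumed to be seeded with an empty evn:roots and an evn:prev_roots of "0".

```python
import json


def process_commit_v2(revprops, rev, apply_changes):
    """Sketch of the proposed post-commit logic using prev/next links."""
    # The roots in effect before this commit live at the revision named by
    # the previous revision's evn:prev_roots.
    prev_roots_rev = int(revprops[rev - 1]["evn:prev_roots"])
    old = json.loads(revprops[prev_roots_rev]["evn:roots"])
    new = json.loads(json.dumps(old))  # deep copy so we can compare afterwards
    apply_changes(new)
    props = revprops.setdefault(rev, {})
    if new != old:
        # Roots changed: write them out in full, add a forward link from the
        # old roots revision, and point our own evn:prev_roots at ourselves.
        props["evn:roots"] = json.dumps(new, sort_keys=True)
        revprops[prev_roots_rev]["evn:next_roots"] = str(rev)
        props["evn:prev_roots"] = str(rev)
    else:
        # Roots unchanged: forward-copy the pointer only.
        props["evn:prev_roots"] = str(prev_roots_rev)


# Usage: r1 adds a root, r2 changes nothing, r3 adds another root.
revprops = {0: {"evn:roots": json.dumps({}), "evn:prev_roots": "0"}}
process_commit_v2(revprops, 1, lambda r: r.update({"/trunk/": {"created": 1}}))
process_commit_v2(revprops, 2, lambda r: None)
process_commit_v2(revprops, 3, lambda r: r.update({"/branches/1.x/": {"created": 3}}))
```

After these three commits, r2 holds only `evn:prev_roots = "1"`, and the evn:next_roots links form a forward chain r0 → r1 → r3 through the revisions where the roots actually changed.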
The implementation should also support upgrading existing repositories that use evn:roots on every revision: an initial pass should be made to simply record the relevant evn:prev_roots/evn:next_roots entries.
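The first upgrade pass might look like the sketch below. It assumes, for illustration, that every revision 0..head already carries a full evn:roots value and that "roots changed" can be detected by comparing decoded values with the last revision whose roots changed. `upgrade_pass_one` is a hypothetical name, and `revprops` is again an in-memory stand-in with JSON-encoded roots. Nothing is deleted, so hooks that still read evn:roots from revision-1 keep working while this pass runs.

```python
import json


def upgrade_pass_one(revprops, head):
    """Record evn:prev_roots/evn:next_roots without removing any evn:roots."""
    last_change = 0
    revprops[0]["evn:prev_roots"] = "0"
    for rev in range(1, head + 1):
        current = json.loads(revprops[rev]["evn:roots"])
        previous = json.loads(revprops[last_change]["evn:roots"])
        if current != previous:
            # Roots changed here: link forward from the last change point.
            revprops[last_change]["evn:next_roots"] = str(rev)
            last_change = rev
        revprops[rev]["evn:prev_roots"] = str(last_change)


# Usage: roots change at r1 and r3; r2 is an identical copy of r1.
history = [{}, {"/trunk/": 1}, {"/trunk/": 1}, {"/trunk/": 1, "/tags/1.0/": 3}]
r = {rev: {"evn:roots": json.dumps(v)} for rev, v in enumerate(history)}
upgrade_pass_one(r, 3)
```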
Once that pass has completed, the evn:version number (a revprop on r0) will be bumped to 2, which will tell Enversion to use the new prev/next logic for loading roots during the pre- and post-commit hooks. Then, a second pass can be made over the repository: for each revision, if evn:prev_roots does not equal that revision's number, the roots did not change in that revision, so its evn:roots revprop is redundant and can be deleted safely.
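A sketch of the version bump followed by the second pass, under the same assumptions as above (hypothetical `upgrade_pass_two` name, in-memory `revprops` dict, revprop values stored as strings). It deletes evn:roots only where evn:prev_roots points elsewhere, i.e. where the value is a redundant copy.

```python
def upgrade_pass_two(revprops, head):
    """Bump evn:version, then drop evn:roots wherever the roots did not change."""
    # From here on, hooks use the prev/next logic, so the copies are unused.
    revprops[0]["evn:version"] = "2"
    removed = []
    for rev in range(0, head + 1):
        props = revprops[rev]
        if int(props["evn:prev_roots"]) != rev and "evn:roots" in props:
            del props["evn:roots"]
            removed.append(rev)
    return removed


# Usage: state after pass one, where r2 holds a redundant copy of r1's roots.
revprops = {
    0: {"evn:roots": "{}", "evn:prev_roots": "0", "evn:next_roots": "1"},
    1: {"evn:roots": '{"/trunk/": 1}', "evn:prev_roots": "1"},
    2: {"evn:roots": '{"/trunk/": 1}', "evn:prev_roots": "1"},
}
removed = upgrade_pass_two(revprops, 2)
```

Bumping the version before deleting anything is what makes the ordering safe: by the time a copy disappears, no hook is reading it any more.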
Using this two-pass approach will allow repositories to be upgraded without any downtime (i.e. there will be no need to set the repository read-only whilst the passes take place).