@Ostrzyciel
Contributor

GitHub issue resolved: #5291

Third take at solving #5291, after #5446 and #5473. The original idea was to wrap Lucene's Directory in a class that would defer fsyncs and then execute them on a timer. This would have let us keep full transactional support while still getting the performance benefit.

While that solution seemed to work initially, now I'm pretty sure it's impossible to implement in a 100% correct manner that would be reliable in the long term. Lucene does a lot more with files aside from fsyncing them – it also renames and deletes them. If we intervene only in the fsync process, we see inconsistent state, where, for example, we try to fsync files that no longer exist, or never fsync files that were renamed in the meantime.
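To make the failure mode concrete, here is a toy model of the problem. It is only a sketch: `DeferredSyncModel` is a hypothetical class that tracks file names in memory, not Lucene's Directory API, and the file names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (hypothetical, not Lucene's Directory): tracks file names only. */
class DeferredSyncModel {
    private final Set<String> files = new TreeSet<>();       // files that currently exist
    private final Set<String> pendingSync = new TreeSet<>(); // names whose fsync was deferred
    private final Set<String> durable = new TreeSet<>();     // names that were actually fsynced

    void create(String name) {
        files.add(name);
    }

    // Intercepted call: instead of fsyncing now, remember the name for later.
    void sync(String name) {
        pendingSync.add(name);
    }

    // Not intercepted: renames and deletes take effect immediately.
    void rename(String from, String to) {
        if (files.remove(from)) {
            files.add(to);
        }
    }

    void delete(String name) {
        files.remove(name);
    }

    /** Timer callback: fsync everything that was deferred. Returns names that no longer exist. */
    List<String> runDeferredSyncs() {
        List<String> missing = new ArrayList<>();
        for (String name : pendingSync) {
            if (files.contains(name)) {
                durable.add(name);
            } else {
                missing.add(name);
            }
        }
        pendingSync.clear();
        return missing;
    }

    /** Files that exist now but were never fsynced under their current name. */
    Set<String> existingButNotDurable() {
        Set<String> result = new TreeSet<>(files);
        result.removeAll(durable);
        return result;
    }
}
```

If a file is synced and then renamed or deleted before the timer fires, `runDeferredSyncs()` reports the old name as missing, and the renamed file shows up in `existingButNotDurable()`. Tracking renames and deletes in the wrapper would be the obvious next step, but that means re-implementing a growing share of the Directory's bookkeeping.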

Here I took a simpler approach: add an option to disable transaction/rollback support in LuceneSail. With it enabled, a transaction commit only calls flush() on the index (not a full commit), and the real commit is done on a timer. Changes become visible to readers only after that commit.
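The flush-now, commit-later pattern can be sketched roughly as follows. This is not the code in this PR: `TextIndex` and `TimedCommitter` are hypothetical names, and the sketch assumes the underlying index tolerates `flush()` and `commit()` being called from different threads.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the index writer; not the LuceneSail API. */
interface TextIndex {
    void flush() throws Exception;  // cheap: pushes buffered changes out, no fsync
    void commit() throws Exception; // expensive: durable commit, after which readers see the changes
}

class TimedCommitter implements AutoCloseable {
    private final TextIndex index;
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    TimedCommitter(TextIndex index, long interval, TimeUnit unit) {
        this.index = index;
        timer.scheduleWithFixedDelay(this::commitIfDirty, interval, interval, unit);
    }

    /** Called at the end of each transaction instead of a full commit. */
    void onTransactionCommit() throws Exception {
        index.flush();
        dirty.set(true);
    }

    /** Timer callback; commits only if something was flushed since the last commit. */
    synchronized void commitIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            try {
                index.commit();
            } catch (Exception e) {
                dirty.set(true); // leave the flag set so the next tick retries
            }
        }
    }

    /** Stops the timer and performs one last commit for any unflushed state. */
    @Override
    public void close() {
        timer.shutdown();
        commitIfDirty();
    }
}
```

With something like `new TimedCommitter(index, 10, TimeUnit.SECONDS)`, many transactions share a single expensive commit, at the cost of the visibility delay and the loss of rollback.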

This sounds like a pretty big compromise, but for my use case it is totally fine. If we get slight inconsistencies in the text index because of a missed rollback or two, nothing is going to collapse. Similarly, a 10-second delay between an insert and the text being visible in queries is also totally acceptable.

I think that with enough engineering effort you could make this better (e.g., have instant visibility of the changes in readers), but quite honestly... this is good enough.

I ran some end-to-end tests and verified that this indeed does result in a ~10x throughput speedup in insert transactions.


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

Ostrzyciel added a commit to Ostrzyciel/nanopub-query that referenced this pull request Nov 22, 2025
Ostrzyciel added a commit to Ostrzyciel/nanopub-query that referenced this pull request Nov 22, 2025
Ostrzyciel force-pushed the GH-5291-lucene-disable-transactions branch from 2197d07 to bda6d00 on November 22, 2025 17:11
Development

Successfully merging this pull request may close these issues.

LuceneSail fsyncs after each TX, leading to very poor performance
