

@hmottestad
Contributor

@hmottestad hmottestad commented Oct 3, 2025

GitHub issue resolved: #5291

See #5446

@hmottestad
Contributor Author

@Ostrzyciel I branched off from your branch and added some more resilience. I now see that in org.eclipse.rdf4j.sail.lucene.impl.LuceneDelayedFsyncTest there are several cases where we try to sync files that no longer exist. Do you know what's going on?

@Ostrzyciel
Contributor


@hmottestad thank you for the improvements!

No idea about these deleted files... I didn't have this issue with the previous version of the code. But I see that it only happens when we close the index. My working theory is that on close, Lucene cleans up temporary or outdated files that are no longer needed. But I don't understand why we only started seeing this after your changes.

I can see that IndexWriter.close() calls commit itself as well, but I don't see how this could interfere.

I dug deeper into the code of FSDirectory: aside from fsyncing, sync() also processes pending file deletes. FSDirectory also sometimes renames files. Maybe the rename is what's happening here? Something like:

  • Lucene requests sync of file X
  • We schedule this sync for later
  • Lucene moves file X to Y
  • We allow this (there is no code to override this behavior)
  • We run scheduled sync of X
  • We get an error, X is missing
  • Y never gets fsynced (?) (not sure about this one)
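The suspected sequence can be reproduced outside Lucene with plain java.nio. This is a minimal sketch, assuming the deferred sync queue stores file paths; the class DeferredSyncRaceDemo and the file names pending_X and Y are hypothetical, not taken from the PR. Once X is renamed, the scheduled fsync of X fails with NoSuchFileException, and nothing ever schedules a sync of Y.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of the suspected race: a deferred fsync queue that
// remembers file paths, while the file is renamed before the sync runs.
class DeferredSyncRaceDemo {

    /** Fsyncs a single file by path; opening without CREATE fails if it is gone. */
    static void fsync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /** Replays the sequence; returns true if the deferred sync of X failed. */
    static boolean replay(Path dir) throws IOException {
        Deque<Path> scheduled = new ArrayDeque<>();

        Path x = dir.resolve("pending_X");
        Files.write(x, new byte[] {1, 2, 3});

        // Sync of X is requested, but we only schedule it for later.
        scheduled.add(x);

        // X is renamed to Y before the scheduled sync runs.
        Path y = dir.resolve("Y");
        Files.move(x, y);

        // The scheduled sync of X now fails because X no longer exists.
        try {
            fsync(scheduled.poll());
            return false;
        } catch (NoSuchFileException e) {
            // Nothing in this queue ever schedules a sync of Y.
            return true;
        }
    }
}
```

Calling DeferredSyncRaceDemo.replay on an empty temporary directory returns true, and the data ends up in Y without ever having been fsynced by the queue.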

sigh

I'm not sure anymore if this is the right approach.

@hmottestad
Contributor Author

I've changed it so that exceptions are only swallowed when doSync is run from the scheduler. Now, when it runs during close(), it actually throws an exception.
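A minimal sketch of that split, assuming the delayed sync keeps a set of pending paths. DelayedFsyncQueue, doSync(boolean fromScheduler) and backgroundFailures() are hypothetical names for illustration, not the PR's actual code. Background runs record failures and keep going, while the call made from close() tries every file and then rethrows the first failure.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and structure are illustrative, not the PR's code.
class DelayedFsyncQueue {
    // Files whose fsync has been requested but not yet performed.
    private final Set<Path> pending = new LinkedHashSet<>();
    // Files whose background fsync failed (e.g. because they no longer exist).
    private final List<Path> backgroundFailures = new ArrayList<>();

    synchronized void requestSync(Path file) {
        pending.add(file);
    }

    /**
     * Fsyncs every pending file. From the scheduler, failures are recorded and
     * swallowed; otherwise the first failure is rethrown after all files were tried.
     */
    synchronized void doSync(boolean fromScheduler) throws IOException {
        List<Path> batch = new ArrayList<>(pending);
        pending.clear();
        IOException failure = null;
        for (Path file : batch) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
            } catch (IOException e) {
                if (fromScheduler) {
                    backgroundFailures.add(file);
                } else if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }

    /** Final flush on close: any failure reaches the caller. */
    synchronized void close() throws IOException {
        doSync(false);
    }

    synchronized List<Path> backgroundFailures() {
        return new ArrayList<>(backgroundFailures);
    }
}
```

Continuing past a failed file during close() means one missing file does not prevent the remaining files from being fsynced before the error surfaces.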

I don't know what's going on either. I need to understand more about what Lucene's sync method does before I can tell whether it's fine to ignore the missing files.

@hmottestad hmottestad changed the base branch from main to develop October 4, 2025 18:12
@hmottestad
Contributor Author

Some of the missing files are named pending*. Those are temporary files that data is written to, and they are renamed once writing is done. So not fsyncing them could mean that we never fsync the renamed file either.
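If the pending* files are indeed renamed before the deferred sync runs, one possible mitigation (an assumption, not what the PR does) is to let a pending sync follow the file through the rename, so the data is fsynced under its final name. The hypothetical RenameAwareSyncQueue illustrates the idea with plain java.nio; a Lucene Directory wrapper would presumably have to intercept renames in a similar way.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: rename-aware deferred fsync. If a file is renamed while
// its sync is still pending, the pending entry moves to the new name.
class RenameAwareSyncQueue {
    // Files whose fsync has been requested but not yet performed.
    private final Set<Path> pending = new LinkedHashSet<>();

    synchronized void requestSync(Path file) {
        pending.add(file);
    }

    /** Renames the file and carries any pending sync over to the new name. */
    synchronized void rename(Path source, Path dest) throws IOException {
        Files.move(source, dest);
        if (pending.remove(source)) {
            pending.add(dest);
        }
    }

    /** Fsyncs all pending files; on failure the set is kept so a later flush can retry. */
    synchronized List<Path> flush() throws IOException {
        List<Path> batch = new ArrayList<>(pending);
        for (Path file : batch) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
            }
        }
        pending.clear();
        return batch;
    }
}
```

This only makes the file contents durable under the new name; making the rename itself durable also requires fsyncing the parent directory, which this sketch leaves out.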



Development

Successfully merging this pull request may close these issues.

LuceneSail fsyncs after each TX, leading to very poor performance

2 participants