Replica recovery fails after using Solr Encryption Plugin in multi-sharded Solr collection #114
Thanks for this issue. I will try to write a test to reproduce and then fix.
Hey, I came across this issue as well. When leader replicas try to read any of the encrypted index files to the buffer a
Follower replica logs show that the download has failed. My investigation led me to believe that the root cause is the ReplicationHandler trying to read files from the EncryptionDirectory using the "full" length of the file (including the encryption header, footer, etc.), while the DecryptingIndexInput actually expects to read only up to the "logical" length of the file, resulting in a read-beyond-EOF exception. While digging into it, I noticed ReplicationHandler uses the EncryptionDirectory super class
The above patch seems to solve the EOF error: replicas that were stuck in recovery became active, and creation of new replicas is also successful. I was just following a hunch, and I do not deeply understand the issue or the appropriate fix for it. BTW, I am working on a test to reproduce the issue. Many thanks!
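For readers following along, the length mismatch described in this comment can be illustrated with a small standalone sketch. It does not use any Solr, Lucene, or plugin classes: `LogicalInput`, `copy`, `sampleFile`, and the `HEADER`/`FOOTER` sizes are all hypothetical names and values chosen for illustration, and the real header/footer layout is not shown in this thread. The sketch only shows why a copy loop driven by the on-disk ("full") length runs past the end of an input that exposes the shorter "logical" length.

```java
import java.io.EOFException;
import java.io.IOException;

public class LengthMismatchSketch {

    // Hypothetical sizes, assumed for illustration only.
    static final int HEADER = 16;
    static final int FOOTER = 8;

    /** Stand-in for a decrypting input: it exposes only the logical (payload) bytes. */
    static final class LogicalInput {
        private final byte[] physical;
        private int pos = 0;

        LogicalInput(byte[] physical) {
            this.physical = physical;
        }

        /** Length on disk, including header and footer. */
        long physicalLength() {
            return physical.length;
        }

        /** Length visible to readers, excluding header and footer. */
        long logicalLength() {
            return physical.length - HEADER - FOOTER;
        }

        void readBytes(byte[] dst, int offset, int len) throws IOException {
            if (pos + len > logicalLength()) {
                throw new EOFException("read past EOF: pos=" + pos + " len=" + len
                        + " logicalLength=" + logicalLength());
            }
            System.arraycopy(physical, HEADER + pos, dst, offset, len);
            pos += len;
        }
    }

    /** Copies 'length' bytes in small chunks, the way a file-streaming loop might. */
    static byte[] copy(LogicalInput in, long length) throws IOException {
        byte[] out = new byte[(int) length];
        int done = 0;
        while (done < length) {
            int chunk = (int) Math.min(4, length - done);
            in.readBytes(out, done, chunk);
            done += chunk;
        }
        return out;
    }

    /** Builds a fake file: zeroed header, payload bytes 1..payloadLen, zeroed footer. */
    static byte[] sampleFile(int payloadLen) {
        byte[] file = new byte[HEADER + payloadLen + FOOTER];
        for (int i = 0; i < payloadLen; i++) {
            file[HEADER + i] = (byte) (i + 1);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = sampleFile(10);

        // Driving the copy with the logical length succeeds.
        LogicalInput first = new LogicalInput(file);
        byte[] ok = copy(first, first.logicalLength());
        System.out.println("logical copy: " + ok.length + " bytes");

        // Driving the copy with the physical length reads beyond the logical end.
        LogicalInput second = new LogicalInput(file);
        try {
            copy(second, second.physicalLength());
        } catch (EOFException e) {
            System.out.println("physical copy failed: " + e.getMessage());
        }
    }
}
```

Under these assumptions, any fix has to make the length used by the copying side and the length enforced by the reading side agree; which side should change in the actual plugin is not something this sketch can settle.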
Nice investigation @danielsason112, this seems to be a good lead! It reminds me of the file length complexity I had to solve when initially developing the EncryptionDirectory in Lucene, to be compatible with the compound file format and other Lucene file length checks. Thanks
I am using the Solr encryption plugin for data and index encryption. It works fine for single-tenant systems. On a distributed system with two or more tenants, when a collection has two or more replicas in a shard, the follower replica fails to start replication: replica recovery fails, then retries and fails continuously.
I have tested this behaviour in a multi-sharded Solr collection with two replicas per shard.
The Solr log shows this error: org.apache.solr.update.processor.DistributedUpdateProcessor Ignoring commit while not ACTIVE - state: BUFFERING replay: false
The replica type is NRT, and the encryption factory in use is EncryptionDirectoryFactory, which extends MMapDirectoryFactory.