fix IllegalArgumentException("inconsistent range") from ConcurrentSkipListSet #4551
Motivation
Consider a ledger whose LedgerMetadata is as follows:
Versioned(value=LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=3, ackQuorumSize=2, state=CLOSED, length=42, lastEntryId=1, digestType=CRC32C, password=base64:, ensembles={0=[10.167.101.44:3181, 10.145.144.76:3181, 10.145.136.51:3181], 1=[10.170.112.33:3181, 10.170.140.51:3181, 10.170.92.28:3181], 2=[10.171.7.2:3181, 10.172.180.82:3181, 10.172.149.89:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC9kaWNoYXRfcHJvZF9ldmVudF9sb2ctcGFydGl0aW9uLTI=, pulsar/cursor=base64:Y2dfZGljaGF0X3Byb2RfZXZlbnRfbG9n, application=base64:cHVsc2Fy}}, version=13)
Given this metadata, ReplicationWorker will replicate the first and last fragments. Before replicating each fragment, it calls tryReadingFaultyEntries:
boolean tryReadingFaultyEntries(LedgerHandle lh, LedgerFragment ledgerFragment)
After replication of the first fragment fails, that fragment is skipped. At this point, the value of unableToReadEntriesForReplication is <ledgerId=0, entryIdsUnableToRead=<0>>.
When the last fragment is then replicated, tryReadingFaultyEntries throws IllegalArgumentException("inconsistent range"), which in turn causes the ReplicationWorker thread to exit.
The log is as follows:
2025-02-05 20:16:16,041 [ DEBUG ] ReplicationWorker - Founds fragments [Fragment(LedgerID: 449738, FirstEntryID: 0[0], LastKnownEntryID: 0[0], Host: [10.145.136.51:3181, 10.167.101.44:3181, 10.145.144.76:3181], Closed: true), Fragment(LedgerID: 449738, FirstEntryID: 1[1], LastKnownEntryID: 1[1], Host: [10.170.92.28:3181, 10.170.112.33:3181, 10.170.140.51:3181], Closed: true), Fragment(LedgerID: 449738, FirstEntryID: 2[-1], LastKnownEntryID: 1[-1], Host: [10.171.7.2:3181, 10.172.149.89:3181, 10.172.180.82:3181], Closed: true)] for replication from ledger: 449738
The log shows that the last fragment's FirstEntryID (2) is greater than its LastKnownEntryID (1). Using these values as the range bounds causes the subSet function of the ConcurrentSkipListSet class to throw an IllegalArgumentException.
The exception log is as follows:
2025-02-05 16:51:23,786 [ ERROR ] BookieThread - Uncaught exception in thread ReplicationWorker
java.lang.IllegalArgumentException: inconsistent range
    at java.util.concurrent.ConcurrentSkipListMap$SubMap.<init>(ConcurrentSkipListMap.java:2404) ~[?:?]
    at java.util.concurrent.ConcurrentSkipListMap.subMap(ConcurrentSkipListMap.java:1884) ~[?:?]
    at java.util.concurrent.ConcurrentSkipListSet.subSet(ConcurrentSkipListSet.java:416) ~[?:?]
    at org.apache.bookkeeper.replication.ReplicationWorker.tryReadingFaultyEntries(ReplicationWorker.java:316)
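The failure can be reproduced outside BookKeeper with a plain ConcurrentSkipListSet. The following is a minimal sketch, not code from this PR: the class name InconsistentRangeDemo and the helper subSetFailure are hypothetical, and the set contents mirror the entryIdsUnableToRead=<0> state described above.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo class; not part of BookKeeper.
final class InconsistentRangeDemo {

    // Returns the exception message if subSet rejects the bounds,
    // or null if the range view is created successfully.
    static String subSetFailure(ConcurrentSkipListSet<Long> set, long first, long last) {
        try {
            set.subSet(first, true, last, true);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Long> unreadable = new ConcurrentSkipListSet<>();
        unreadable.add(0L); // entry 0 of the first fragment could not be read

        // First fragment: FirstEntryID = 0, LastKnownEntryID = 0 (valid range).
        System.out.println("first fragment: " + subSetFailure(unreadable, 0L, 0L));

        // Last fragment: FirstEntryID = 2, LastKnownEntryID = 1 (lower bound > upper bound).
        System.out.println("last fragment:  " + subSetFailure(unreadable, 2L, 1L));
    }
}
```

The second call fails because the lower bound is greater than the upper bound, regardless of what the set contains.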
Changes
When a fragment's FirstEntryID is greater than its LastKnownEntryID, tryReadingFaultyEntries now returns true immediately instead of calling subSet with an inverted range.
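The shape of this guard can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: the class EmptyFragmentGuard and the method tryReadingSketch are hypothetical, and the "re-read" step of the real method is replaced by a simple check of whether any unreadable entries fall in the fragment's range.

```java
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for the guard described above; not BookKeeper code.
final class EmptyFragmentGuard {

    // Returns true when nothing in [firstEntryId, lastKnownEntryId] is known to be unreadable.
    // Assumption: the real method would try to re-read the entries instead of just checking emptiness.
    static boolean tryReadingSketch(ConcurrentSkipListSet<Long> unreadable,
                                    long firstEntryId, long lastKnownEntryId) {
        // Guard: an inverted range means the fragment holds no entries,
        // so return early rather than passing invalid bounds to subSet.
        if (firstEntryId > lastKnownEntryId) {
            return true;
        }
        NavigableSet<Long> inFragment =
                unreadable.subSet(firstEntryId, true, lastKnownEntryId, true);
        return inFragment.isEmpty();
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Long> unreadable = new ConcurrentSkipListSet<>();
        unreadable.add(0L);
        // Last fragment from the log: FirstEntryID = 2, LastKnownEntryID = 1.
        System.out.println(tryReadingSketch(unreadable, 2L, 1L));
    }
}
```

With the guard in place, the inverted range from the last fragment no longer reaches subSet, so no exception escapes to the worker thread.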