fix IllegalArgumentException("inconsistent range") from ConcurrentSkipListSet #4551

Open: wants to merge 1 commit into base: master

SongOf commented Feb 7, 2025

Motivation

Consider a ledger with the following LedgerMetadata:

Versioned(value=LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=3, ackQuorumSize=2, state=CLOSED, length=42, lastEntryId=1, digestType=CRC32C, password=base64:, ensembles={0=[10.167.101.44:3181, 10.145.144.76:3181, 10.145.136.51:3181], 1=[10.170.112.33:3181, 10.170.140.51:3181, 10.170.92.28:3181], 2=[10.171.7.2:3181, 10.172.180.82:3181, 10.172.149.89:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC9kaWNoYXRfcHJvZF9ldmVudF9sb2ctcGFydGl0aW9uLTI=, pulsar/cursor=base64:Y2dfZGljaGF0X3Byb2RfZXZlbnRfbG9n, application=base64:cHVsc2Fy}}, version=13)

  • The ledger is CLOSED.
  • The last fragment of the ledger starts at firstEntryId 2, but the ledger's lastEntryId is 1, so this fragment contains no entries.
  • All bookies of the first fragment (10.167.101.44:3181, 10.145.144.76:3181, 10.145.136.51:3181) are offline, so reading entry 0 (entryId=0) fails.
  • One bookie of the last fragment, 10.171.7.2:3181, is offline.
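To see why the last fragment's range is inverted, here is a minimal sketch that derives each fragment's [firstEntryId, lastKnownEntryId] from the ensemble start ids (0, 1, 2) and the ledger's lastEntryId (1). The helper fragmentRanges is hypothetical, not part of the BookKeeper API, and it assumes a fragment ends just before the next ensemble starts while the last fragment ends at lastEntryId.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentRanges {
    // Hypothetical helper: computes [firstEntryId, lastKnownEntryId] for each
    // ensemble (fragment) of a closed ledger.
    static List<long[]> fragmentRanges(long[] ensembleStartIds, long lastEntryId) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < ensembleStartIds.length; i++) {
            long first = ensembleStartIds[i];
            // A fragment ends right before the next ensemble starts;
            // the last fragment ends at the ledger's lastEntryId.
            long lastKnown = (i + 1 < ensembleStartIds.length)
                    ? ensembleStartIds[i + 1] - 1
                    : lastEntryId;
            ranges.add(new long[] {first, lastKnown});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Values from the LedgerMetadata above: ensembles start at 0, 1, 2; lastEntryId=1.
        for (long[] r : fragmentRanges(new long[] {0, 1, 2}, 1)) {
            System.out.println("[" + r[0] + ", " + r[1] + "]" + (r[0] > r[1] ? " (empty)" : ""));
        }
        // prints:
        // [0, 0]
        // [1, 1]
        // [2, 1] (empty)
    }
}
```

These ranges match the FirstEntryID/LastKnownEntryID pairs in the ReplicationWorker log quoted below.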

Given this state, ReplicationWorker tries to replicate the first and the last fragment. Before replicating a fragment, it calls tryReadingFaultyEntries:
boolean tryReadingFaultyEntries(LedgerHandle lh, LedgerFragment ledgerFragment)

Replication of the first fragment fails, so that fragment is skipped. At this point unableToReadEntriesForReplication holds <ledgerId=0, entryIdsUnableToRead=<0>>, i.e. entry 0 of the ledger is recorded as unreadable.
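The bookkeeping described here can be sketched as a map from ledger id to a sorted set of unreadable entry ids. The class and method names are hypothetical, and a plain ConcurrentHashMap stands in for whatever container ReplicationWorker actually uses; ledger id 0 follows the value quoted above.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

public class UnreadableEntries {
    // Hypothetical stand-in for unableToReadEntriesForReplication:
    // ledgerId -> sorted set of entry ids that could not be read.
    static final Map<Long, ConcurrentSkipListSet<Long>> UNREADABLE = new ConcurrentHashMap<>();

    // Record an entry that failed to read during replication.
    static void recordUnreadable(long ledgerId, long entryId) {
        UNREADABLE.computeIfAbsent(ledgerId, k -> new ConcurrentSkipListSet<>()).add(entryId);
    }

    // Entries inside the fragment [first, lastKnown] already known to be unreadable.
    static NavigableSet<Long> unreadableIn(long ledgerId, long first, long lastKnown) {
        ConcurrentSkipListSet<Long> set = UNREADABLE.get(ledgerId);
        if (set == null) {
            return new ConcurrentSkipListSet<>(); // nothing recorded for this ledger yet
        }
        return set.subSet(first, true, lastKnown, true);
    }

    public static void main(String[] args) {
        recordUnreadable(0L, 0L);                     // entry 0 of the first fragment failed
        System.out.println(UNREADABLE);               // {0=[0]}
        System.out.println(unreadableIn(0L, 0L, 0L)); // [0]
        System.out.println(unreadableIn(0L, 1L, 1L)); // []
    }
}
```

Once a set exists for the ledger, every later fragment of that ledger goes through subSet with its own bounds.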

When the last fragment is replicated next, tryReadingFaultyEntries throws IllegalArgumentException("inconsistent range"), which in turn causes the ReplicationWorker process to exit.

The ReplicationWorker debug log lists the fragments found for ledger 449738:
2025-02-05 20:16:16,041 [ DEBUG ] ReplicationWorker - Founds fragments [Fragment(LedgerID: 449738, FirstEntryID: 0[0], LastKnownEntryID: 0[0], Host: [10.145.136.51:3181, 10.167.101.44:3181, 10.145.144.76:3181], Closed: true), Fragment(LedgerID: 449738, FirstEntryID: 1[1], LastKnownEntryID: 1[1], Host: [10.170.92.28:3181, 10.170.112.33:3181, 10.170.140.51:3181], Closed: true), Fragment(LedgerID: 449738, FirstEntryID: 2[-1], LastKnownEntryID: 1[-1], Host: [10.171.7.2:3181, 10.172.149.89:3181, 10.172.180.82:3181], Closed: true)] for replication from ledger: 449738

The log shows that the last fragment has FirstEntryID 2 but LastKnownEntryID 1. Using these bounds in ConcurrentSkipListSet.subSet makes it throw IllegalArgumentException, because subSet rejects a lower bound that is greater than the upper bound.
The exception log is as follows:
2025-02-05 16:51:23,786 [ ERROR ] BookieThread - Uncaught exception in thread ReplicationWorker
java.lang.IllegalArgumentException: inconsistent range
    at java.util.concurrent.ConcurrentSkipListMap$SubMap.<init>(ConcurrentSkipListMap.java:2404) ~[?:?]
    at java.util.concurrent.ConcurrentSkipListMap.subMap(ConcurrentSkipListMap.java:1884) ~[?:?]
    at java.util.concurrent.ConcurrentSkipListSet.subSet(ConcurrentSkipListSet.java:416) ~[?:?]
    at org.apache.bookkeeper.replication.ReplicationWorker.tryReadingFaultyEntries(ReplicationWorker.java:316)
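The failure reproduces with the JDK alone. In this minimal sketch the bounds are taken from the last fragment in the log. The range check runs when the subSet view is constructed (the SubMap constructor in the stack trace), so the contents of the set do not matter; it only has to exist and be queried with inverted bounds.

```java
import java.util.concurrent.ConcurrentSkipListSet;

public class InconsistentRangeRepro {
    // Returns the message of the exception thrown for an inverted subSet range.
    static String invertedRangeMessage() {
        ConcurrentSkipListSet<Long> unreadable = new ConcurrentSkipListSet<>();
        unreadable.add(0L); // entry 0 recorded as unreadable (first fragment)

        long firstEntryId = 2L;     // FirstEntryID of the last fragment
        long lastKnownEntryId = 1L; // LastKnownEntryID of the last fragment
        try {
            unreadable.subSet(firstEntryId, true, lastKnownEntryId, true);
            return null; // not reached: the bounds are inverted
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // On OpenJDK this prints: IllegalArgumentException: inconsistent range
        System.out.println("IllegalArgumentException: " + invertedRangeMessage());
    }
}
```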

Changes

When a fragment's FirstEntryID is greater than its LastKnownEntryID (the fragment contains no entries), tryReadingFaultyEntries now returns true immediately instead of calling subSet.
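The guard can be sketched as follows. This is a simplified, hypothetical stand-in for tryReadingFaultyEntries: it takes the unreadable-entry set and the fragment bounds directly, and instead of re-reading faulty entries as the real method does, it only reports whether any known-unreadable entry falls inside the fragment.

```java
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptyFragmentGuard {
    // Hypothetical simplification; the real method takes (LedgerHandle, LedgerFragment).
    static boolean canAttemptReplication(ConcurrentSkipListSet<Long> unreadable,
                                         long firstEntryId, long lastKnownEntryId) {
        if (unreadable == null) {
            return true; // no entries of this ledger have failed before
        }
        if (firstEntryId > lastKnownEntryId) {
            return true; // empty fragment: nothing to check, and subSet would throw
        }
        NavigableSet<Long> faulty = unreadable.subSet(firstEntryId, true, lastKnownEntryId, true);
        // Sketch only: succeed when no known-unreadable entry is in the fragment.
        return faulty.isEmpty();
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Long> unreadable = new ConcurrentSkipListSet<>();
        unreadable.add(0L);
        System.out.println(canAttemptReplication(unreadable, 0L, 0L)); // false: entry 0 is unreadable
        System.out.println(canAttemptReplication(unreadable, 2L, 1L)); // true: empty last fragment
    }
}
```

Checking the bounds before building the view keeps the empty-fragment case from ever reaching subSet.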
