Support Lucene index scrubbing of missing entries #3009

Merged
merged 20 commits into from
Feb 12, 2025

Conversation

jjezra
Contributor

@jjezra jjezra commented Dec 18, 2024

To verify Lucene index consistency, support "Report Only" scrubbing for:
Dangling Lucene index entries: iterate "all entries" (similar to LuceneScanAllEntriesTest) and validate that every pointer leads to an existing record.
Missing Lucene index entries: iterate all records, validate that each primary key is represented in the “primary key to Lucene segment” map, and validate that the Lucene segment exists.

This resolves #3008
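
The "missing entries" check described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: the class, the enum, and the in-memory maps are hypothetical stand-ins, not Record Layer or Lucene APIs, and the real scrubber works against the record store and the index's own data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "report only" missing-entry scrubbing.
// None of these names come from the Record Layer API.
final class MissingEntryScrubSketch {
    enum Reason { NOT_IN_PK_MAP, SEGMENT_MISSING }

    // Walk every record's primary key and report (never repair) the ones whose
    // key is absent from the primary-key-to-segment map, or whose mapped
    // segment no longer exists.
    static Map<String, Reason> scrub(List<String> recordPrimaryKeys,
                                     Map<String, String> primaryKeyToSegment,
                                     Set<String> existingSegments) {
        Map<String, Reason> findings = new LinkedHashMap<>();
        for (String pk : recordPrimaryKeys) {
            String segment = primaryKeyToSegment.get(pk);
            if (segment == null) {
                findings.put(pk, Reason.NOT_IN_PK_MAP);
            } else if (!existingSegments.contains(segment)) {
                findings.put(pk, Reason.SEGMENT_MISSING);
            }
        }
        return findings;
    }
}
```

Because the mode is report only, the sketch returns its findings and leaves both inputs untouched.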

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0769dbc
  • Duration 0:37:04
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 00d9136
  • Duration 0:46:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: b754ff7
  • Duration 0:37:57
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 8e23ece
  • Duration 0:36:28
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: a31ae21
  • Duration 0:36:29
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 1d6bab9
  • Duration 0:49:10
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0f8f0d8
  • Duration 0:46:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: df17ad5
  • Duration 0:52:29
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 1dade63
  • Duration 0:53:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: e66cd65
  • Duration 0:54:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jjezra jjezra requested a review from ScottDugas January 13, 2025 22:17
@jjezra jjezra marked this pull request as ready for review January 13, 2025 22:17
@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0e049d9
  • Duration 0:52:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 4b17834
  • Duration 0:54:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jjezra jjezra requested a review from ScottDugas January 18, 2025 07:39

try (final FDBRecordContext context = openContext()) {
// Write some documents
dataModel.saveRecords(15, 1007, context, 1);
Collaborator

You only save to group 1. This helps to ensure that you have multiple partitions, but means you aren't testing whether it scrubs all the groups.

Contributor Author

@jjezra jjezra Jan 27, 2025

@@ -32,13 +37,25 @@
* the test execution.
*/
public class MockedLuceneIndexMaintainer extends LuceneIndexMaintainer {
Collaborator

This approach is very Lucene-specific. It's sufficient for the test in question, but if you replaced the index maintainer entirely with a NoOp maintainer, the same process could be used for other scrubbing tests without as much work.
Not something needed here, but something to keep in mind as you look into additional scrubbing (other index types, or dangling entries).
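
A rough illustration of the NoOp idea, assuming a toy store rather than the Record Layer: every name below (`Maintainer`, `Store`, `NO_OP`) is hypothetical. Swapping in a maintainer that does nothing lets records be saved without index entries, which gives any missing-entry scrubber something to find regardless of index type.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; not the Record Layer's IndexMaintainer API.
final class NoOpMaintainerSketch {
    interface Maintainer {
        void update(String primaryKey, Set<String> index);
    }

    // The "real" maintainer indexes every saved record.
    static final Maintainer REAL = (pk, index) -> index.add(pk);
    // The no-op maintainer silently skips indexing, simulating a lost entry.
    static final Maintainer NO_OP = (pk, index) -> { };

    static final class Store {
        final Map<String, String> records = new HashMap<>();
        final Set<String> index = new HashSet<>();
        Maintainer maintainer = REAL;

        void save(String pk, String body) {
            records.put(pk, body);
            maintainer.update(pk, index);
        }

        // Primary keys that have a record but no index entry.
        Set<String> missingFromIndex() {
            Set<String> missing = new HashSet<>(records.keySet());
            missing.removeAll(index);
            return missing;
        }
    }
}
```

Nothing in the test setup depends on how the index stores its data, which is what would make the same fixture reusable across index types.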

*/
-public class LuceneIndexScrubbingToolsMissing implements IndexScrubbingTools<FDBStoredRecord<Message>> {
+public class LuceneIndexScrubbingToolsMissing extends ValueIndexScrubbingToolsMissing {
Collaborator

I think I agree with your comment that extending ValueIndexScrubbingToolsMissing feels wrong.
Perhaps a BaseScrubbingToolsMissing would make sense, although I think the only method that actually should be shared is getCursor, so probably a utility class would make more sense.

Leaving as you have it until we have a third Missing implementation also seems reasonable, as it might also align with better abstracting synthetic records in general for use across scrubbing, indexing and IndexMaintenance.

public CompletableFuture<Pair<MissingIndexReason, Tuple>> detectMissingIndexKeys(FDBStoredRecord<Message> rec) {
// return the first missing (if any).
@SuppressWarnings("PMD.CloseResource")
private CompletableFuture<Pair<MissingIndexReason, Tuple>> detectMissingIndexKeys(final FDBRecordStore store, FDBStoredRecord<Message> rec) {
Collaborator

I didn't notice until I went to look at the similarity between this and ValueIndexScrubbingToolsMissing, but you don't check the index filter before saving. I think that means you will have false-negatives if any index entries are filtered.
It's probably worth adding a boolean field to MyParentRecord (maybe something like isHidden), adding a parameter to saveRecords to save filtered-out records, and changing the index definition to filter out anything with that field set to true.
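
To make the concern concrete, here is a minimal sketch of consulting an index filter before treating an absent entry as a problem. It assumes a toy `Doc` type with an `isHidden` flag and an in-memory set of indexed keys; these are hypothetical and do not reflect the actual scrubbing tools or the Record Layer's index-filter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical toy model of a filter-aware missing-entry check.
final class FilteredScrubSketch {
    static final class Doc {
        final String pk;
        final boolean isHidden;

        Doc(String pk, boolean isHidden) {
            this.pk = pk;
            this.isHidden = isHidden;
        }
    }

    // Report primary keys that should be indexed (the filter accepts them)
    // but have no index entry. Records the filter rejects are skipped,
    // because their absence from the index is intentional.
    static List<String> findMissing(List<Doc> records,
                                    Set<String> indexedPks,
                                    Predicate<Doc> indexFilter) {
        List<String> missing = new ArrayList<>();
        for (Doc doc : records) {
            if (!indexFilter.test(doc)) {
                continue; // deliberately unindexed, not a scrubbing finding
            }
            if (!indexedPks.contains(doc.pk)) {
                missing.add(doc.pk);
            }
        }
        return missing;
    }
}
```

With a filter of `d -> !d.isHidden`, a hidden record that has no index entry is not reported; with a filter that accepts everything, the same record would be misreported.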

Contributor Author

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0546e22
  • Duration 0:57:30
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: b201cf4
  • Duration 0:57:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Comment on lines 193 to 195
dataModel.saveRecords(3, 10, context, 1);
dataModel.saveRecords(2, 20, context, 3);
dataModel.saveRecords(5, 20, context, 4);
Collaborator

Does this guarantee that scrubbing scrubs across all partitions?

Contributor Author

@jjezra jjezra Jan 27, 2025

Probably not. I'll add an explicit merge.

Contributor Author

Update:
Added explicit merges.
Added more records.
Reduced the partition high watermark.

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 823d124
  • Duration 0:47:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: c1a9d18
  • Duration 0:36:55
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jjezra
Contributor Author

jjezra commented Jan 28, 2025

Resolved conflicts

@jjezra jjezra requested a review from ScottDugas January 28, 2025 12:51
@jjezra jjezra requested a review from ScottDugas February 10, 2025 21:28
ScottDugas
ScottDugas previously approved these changes Feb 11, 2025
@jjezra jjezra merged commit 5254499 into FoundationDB:main Feb 12, 2025
3 checks passed
@jjezra jjezra deleted the lucene_scrubbing branch February 12, 2025 06:33
@ScottDugas ScottDugas added the enhancement New feature or request label Feb 14, 2025
@ScottDugas ScottDugas changed the title Resolve #3008: Support Lucene index scrubbing Resolve #3008: Support Lucene index scrubbing of missing entries Feb 19, 2025
@ScottDugas ScottDugas changed the title Resolve #3008: Support Lucene index scrubbing of missing entries Support Lucene index scrubbing of missing entries Feb 20, 2025
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Support Lucene index scrubbing of missing entries
3 participants