Skip to content

Conversation

@xiangfu0
Copy link
Contributor

  • Infer dictionary creation whenever indexes that require a dictionary (inverted, FST, range, etc.) are configured, and plumb raw-encoding awareness through forward index creator/reader factories.

  • Extend forward index handler logic to handle raw-forward columns with index add/remove flows, compression updates, and dictionary toggles more robustly.

  • Add integration and unit tests: new raw forward + inverted index integration test, expanded reload coverage, and updated forward index config/type utilities and defaults.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables dictionary inference for indexes that require dictionaries (inverted, FST, range) and enhances raw forward index handling throughout the codebase. It introduces automatic dictionary creation when required by certain index types, plumbs raw-encoding awareness through forward index creator/reader factories, and extends the forward index handler to support raw-forward columns in various operations.

Key changes:

  • Automatic dictionary inference when indexes requiring dictionaries are configured
  • Raw-encoding flag in ForwardIndexConfig to distinguish raw vs. dictionary-encoded forward indexes
  • Enhanced ForwardIndexHandler to support raw forward indexes with dictionary add/remove operations
  • New integration test verifying raw forward index + inverted index functionality

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
githubComplexTypeEvents_offline_table_config.json Adds raw-encoded forward index with inverted index configuration example
ForwardIndexCreator.java Refactors raw value handling into separate addRaw method
IndexType.java Adds requiresDictionary method to indicate dictionary requirement
ForwardIndexUtils.java New utility to identify raw forward index columns from table config
ForwardIndexConfig.java Adds rawEncoding flag and getter/builder methods
FieldIndexConfigsUtil.java Implements automatic dictionary enablement for required indexes
DictionaryIndexConfig.java Adds utility methods to check if dictionary is required by any index
ForwardIndexTypeTest.java Updates tests to include rawEncoding flag in expected configs
ColumnMinMaxValueGenerator.java Passes ForwardIndexConfig when creating readers
InvertedIndexAndDictionaryBasedForwardIndexCreator.java Respects rawEncoding flag when determining dictionary usage
ForwardIndexHandler.java Adds ENABLE_DICTIONARY_FOR_RAW_FORWARD_INDEX operation and handling logic
InvertedIndexType.java Implements requiresDictionary to return true
IFSTIndexType.java Implements requiresDictionary to return true
FstIndexType.java Implements requiresDictionary to return true
ForwardIndexType.java Updates getFileExtension to return multiple possible extensions
ForwardIndexReaderFactory.java Uses rawEncoding flag to determine reader type
ForwardIndexCreatorFactory.java Uses rawEncoding flag to determine creator type
SingleValueVarByteRawIndexCreator.java Overrides add method to delegate to addRaw
SingleValueFixedByteRawIndexCreator.java Overrides add method to delegate to addRaw
MultiValueVarByteRawIndexCreator.java Overrides add method to delegate to addRaw
MultiValueFixedByteRawIndexCreator.java Overrides add method to delegate to addRaw
CLPForwardIndexCreatorV2.java Overrides add method for CLP encoding
CLPForwardIndexCreatorV1.java Overrides add method for CLP encoding
SegmentIndexCreationDriverImpl.java Fixes inverted logic bug (isDisabled → isEnabled)
SegmentColumnarIndexCreator.java Adds warning when dictionary required but explicitly disabled
SameValueForwardIndexCreator.java Overrides add method to delegate to addRaw
RawForwardIndexInvertedIndexTest.java New integration test for raw forward index with inverted index
ForwardIndexHandlerReloadQueriesTest.java Updates test expectations and adds raw forward index test coverage
DataFetcher.java Adds _useDictionary field for proper dictionary usage check

@xiangfu0 xiangfu0 force-pushed the dictionary-auto-creation-when-corresponding-indexes-enabled branch from 880c72e to e042d90 Compare November 25, 2025 03:41
@xiangfu0 xiangfu0 requested a review from Copilot November 25, 2025 03:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.

@xiangfu0 xiangfu0 force-pushed the dictionary-auto-creation-when-corresponding-indexes-enabled branch 2 times, most recently from 74727cd to d202d42 Compare November 25, 2025 12:36
@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 48.00000% with 117 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.25%. Comparing base (1abdbf5) to head (677c60f).

Files with missing lines Patch % Lines
...segment/spi/index/creator/ForwardIndexCreator.java 0.00% 24 Missing ⚠️
...pinot/segment/spi/index/DictionaryIndexConfig.java 0.00% 22 Missing ⚠️
...che/pinot/segment/spi/index/ForwardIndexUtils.java 0.00% 18 Missing ⚠️
...he/pinot/segment/spi/index/ForwardIndexConfig.java 21.42% 11 Missing ⚠️
...pinot/segment/spi/index/FieldIndexConfigsUtil.java 42.85% 6 Missing and 2 partials ⚠️
...ment/creator/impl/SegmentColumnarIndexCreator.java 41.66% 6 Missing and 1 partial ⚠️
...java/org/apache/pinot/core/common/DataFetcher.java 66.66% 4 Missing and 2 partials ⚠️
...ocal/segment/index/loader/ForwardIndexHandler.java 90.47% 2 Missing and 4 partials ⚠️
...local/segment/creator/impl/BaseSegmentCreator.java 25.00% 2 Missing and 1 partial ⚠️
...ent/creator/impl/fwd/CLPForwardIndexCreatorV1.java 0.00% 2 Missing ⚠️
... and 7 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17269      +/-   ##
============================================
- Coverage     63.27%   63.25%   -0.02%     
- Complexity     1474     1490      +16     
============================================
  Files          3139     3140       +1     
  Lines        187189   187346     +157     
  Branches      28652    28687      +35     
============================================
+ Hits         118437   118510      +73     
- Misses        59586    59660      +74     
- Partials       9166     9176      +10     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.21% <48.00%> (-0.02%) ⬇️
java-21 63.23% <48.00%> (+<0.01%) ⬆️
temurin 63.25% <48.00%> (-0.02%) ⬇️
unittests 63.25% <48.00%> (-0.02%) ⬇️
unittests1 55.58% <27.11%> (-0.11%) ⬇️
unittests2 33.87% <36.88%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 force-pushed the dictionary-auto-creation-when-corresponding-indexes-enabled branch 2 times, most recently from c2dfdcc to 4b4bcd7 Compare November 27, 2025 12:26
@xiangfu0 xiangfu0 force-pushed the dictionary-auto-creation-when-corresponding-indexes-enabled branch 3 times, most recently from 5fd7f53 to 9e63630 Compare December 10, 2025 04:48
@xiangfu0 xiangfu0 force-pushed the dictionary-auto-creation-when-corresponding-indexes-enabled branch from 9e63630 to 677c60f Compare December 10, 2025 11:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 7 comments.

Comment on lines +345 to 352
@Test(dataProvider = "bothV1AndV3", expectedExceptions = RuntimeException.class)
public void testEnableTextIndexOnNewColumnRaw(SegmentVersion segmentVersion)
throws Exception {
buildSegment(segmentVersion);
_fieldConfigMap.put(NEWLY_ADDED_STRING_COL_RAW,
new FieldConfig(NEWLY_ADDED_STRING_COL_RAW, FieldConfig.EncodingType.RAW, List.of(FieldConfig.IndexType.TEXT),
null, null));
_fieldConfigMap.put(NEWLY_ADDED_STRING_MV_COL_RAW,
new FieldConfig(NEWLY_ADDED_STRING_MV_COL_RAW, FieldConfig.EncodingType.RAW,
List.of(FieldConfig.IndexType.TEXT), null, null));
checkTextIndexCreation(NEWLY_ADDED_STRING_COL_RAW, 1, 1, _newColumnsSchemaWithText, true, true, true, 4);
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test expects RuntimeException but doesn't verify the exception message or root cause. The test should either validate the specific exception type/message or document why any RuntimeException is acceptable. Additionally, removed test coverage for NEWLY_ADDED_STRING_MV_COL_RAW should be restored or its removal justified.

Copilot uses AI. Check for mistakes.
Comment on lines +587 to +597
try {
checkForwardIndexCreation(COLUMN10_NAME, 3960, 12, _schema, false, false, false, 0, ChunkCompressionType.LZ4,
true, 0, DataType.INT, 100000);
validateIndex(StandardIndexes.range(), COLUMN10_NAME, 3960, 12, false, false, false, 0, true, 0,
ChunkCompressionType.LZ4, false, DataType.INT, 100000);
long newRangeIndexSize = new SegmentMetadataImpl(INDEX_DIR).getColumnMetadataFor(COLUMN10_NAME)
.getIndexSizeFor(StandardIndexes.range());
assertNotEquals(oldRangeIndexSize, newRangeIndexSize);
} catch (RuntimeException e) {
// Forward-index-disabled columns cannot rebuild range index without an existing forward index; accept failure.
}
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty catch block swallows RuntimeException without logging or assertion. This makes test failures silent and hard to debug. Consider either asserting specific exception conditions, logging the caught exception, or restructuring the test to explicitly handle both success and failure paths with clear assertions.

Copilot uses AI. Check for mistakes.
Comment on lines +249 to 265
public List<String> getFileExtension(ColumnMetadata columnMetadata) {
if (columnMetadata.isSingleValue()) {
if (!columnMetadata.hasDictionary()) {
return V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION;
return List.of(V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION);
} else if (columnMetadata.isSorted()) {
return V1Constants.Indexes.SORTED_SV_FORWARD_INDEX_FILE_EXTENSION;
return List.of(V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION,
V1Constants.Indexes.SORTED_SV_FORWARD_INDEX_FILE_EXTENSION);
} else {
return V1Constants.Indexes.UNSORTED_SV_FORWARD_INDEX_FILE_EXTENSION;
return List.of(V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION,
V1Constants.Indexes.UNSORTED_SV_FORWARD_INDEX_FILE_EXTENSION);
}
} else if (!columnMetadata.hasDictionary()) {
return V1Constants.Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION;
return List.of(V1Constants.Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
} else {
return V1Constants.Indexes.UNSORTED_MV_FORWARD_INDEX_FILE_EXTENSION;
return List.of(V1Constants.Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION,
V1Constants.Indexes.UNSORTED_MV_FORWARD_INDEX_FILE_EXTENSION);
}
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for returning file extensions now includes RAW_SV_FORWARD_INDEX_FILE_EXTENSION in all dictionary-enabled single-value cases (lines 254-258), even when sorted or unsorted. This seems inconsistent with the multi-value logic (lines 260-264). Verify that RAW extensions should indeed be included for dictionary-encoded columns, as this appears to contradict the raw-encoding concept.

Copilot uses AI. Check for mistakes.
new ColumnIndexCreators(columnName, fieldSpec, null, List.of(),
new NullValueVectorCreator(outDir, columnName)));
} else {
LOGGER.info("Column: {} is not nullable", columnName);
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added null value vector handling logs both nullable and non-nullable columns at INFO level. This could create excessive logging for tables with many columns. Consider logging only nullable columns, or reducing non-nullable logging to DEBUG level.

Suggested change
LOGGER.info("Column: {} is not nullable", columnName);
LOGGER.debug("Column: {} is not nullable", columnName);

Copilot uses AI. Check for mistakes.
Comment on lines 175 to 183
} else {
disableDictionaryAndCreateRawForwardIndex(column, segmentWriter);
FieldIndexConfigs fieldIndexConfig = _fieldIndexConfigs.get(column);
ForwardIndexConfig forwardIndexConfig = fieldIndexConfig.getConfig(StandardIndexes.forward());
if (forwardIndexConfig.isRawEncoding()) {
disableDictionary(column, segmentWriter, "Disable dictionary when only raw forward index exists");
} else {
disableDictionaryAndCreateRawForwardIndex(column, segmentWriter);
}
}
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested if-else chain starting at line 169 is complex and could benefit from early returns or extraction into separate methods. The logic handling DISABLE_DICTIONARY has three distinct paths (no forward index, raw forward index, and dictionary forward index) that would be clearer with guard clauses.

Suggested change
} else {
disableDictionaryAndCreateRawForwardIndex(column, segmentWriter);
FieldIndexConfigs fieldIndexConfig = _fieldIndexConfigs.get(column);
ForwardIndexConfig forwardIndexConfig = fieldIndexConfig.getConfig(StandardIndexes.forward());
if (forwardIndexConfig.isRawEncoding()) {
disableDictionary(column, segmentWriter, "Disable dictionary when only raw forward index exists");
} else {
disableDictionaryAndCreateRawForwardIndex(column, segmentWriter);
}
}
break;
}
FieldIndexConfigs fieldIndexConfig = _fieldIndexConfigs.get(column);
ForwardIndexConfig forwardIndexConfig = fieldIndexConfig.getConfig(StandardIndexes.forward());
if (forwardIndexConfig.isRawEncoding()) {
disableDictionary(column, segmentWriter, "Disable dictionary when only raw forward index exists");
break;
}
disableDictionaryAndCreateRawForwardIndex(column, segmentWriter);

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +139
assertEquals(response.get("numEntriesScannedInFilter").asLong(), 0L,
"Inverted index should avoid scanning filters even for raw forward index");
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion verifies that numEntriesScannedInFilter is 0, expecting the inverted index to eliminate filter scanning. However, the test doesn't verify that the inverted index was actually used. Consider adding an assertion to check query plan or execution statistics that confirm inverted index usage.

Copilot uses AI. Check for mistakes.
Comment on lines +664 to +677
@Test
public void testRawForwardColumnIndexAddAndRemove()
throws Exception {
String filterValue = escapeSingleQuotes(getMostFrequentColumn5Value());
String filterQuery = String.format("SELECT COUNT(*) FROM %s WHERE column5 = '%s'", RAW_TABLE_NAME, filterValue);

BrokerResponseNative baselineResponse = getBrokerResponse(filterQuery);
assertTrue(baselineResponse.getExceptions() == null || baselineResponse.getExceptions().size() == 0);
ResultTable baselineResultTable = baselineResponse.getResultTable();
assertNotNull(baselineResultTable);
List<Object[]> baselineRows = baselineResultTable.getRows();
long baselineEntriesScanned = baselineResponse.getNumEntriesScannedInFilter();
assertTrue(baselineEntriesScanned > 0);
}
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test method testRawForwardColumnIndexAddAndRemove() establishes a baseline but doesn't perform any index add/remove operations or validate behavior changes. The test appears incomplete and should either be removed or expanded to test the full add/remove cycle it claims to test.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants