Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FallbackSyntheticSourceBlockLoader for text fields #126237

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Apr 3, 2025

This also covers annotated_text because AnnotatedTextFieldType inherits TextFieldType.

@elasticsearchmachine elasticsearchmachine added v9.1.0 needs:triage Requires assignment of a team area label labels Apr 3, 2025
// See TextFieldMapper.SyntheticSourceHelper#usingSyntheticSourceDelegate
// and TextFieldMapper#canUseSyntheticSourceDelegateForLoading().
boolean usingSyntheticSourceDelegate = docValues || store;
boolean canUseSyntheticSourceDelegateForLoading = usingSyntheticSourceDelegate && ignoreAbove == null && (index || store);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why do we require multi field to be indexed here (the actual impl is in TextFieldMapper).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is strange (only doc values / stored should matter). What failure occurs when we don't check for index here?

return KeywordFieldBlockLoaderTests.expectedValue(keywordMultiFieldMapping, value, params, testContext);
}

// Even if multi-field is not eligible for loading it can still be used to produce synthetic source
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole thing is due to the fact that not all syntheticSourceDelegate are eligible to be used by block loader.

// KeywordFieldBlockLoaderTest
// It is here since KeywordFieldBlockLoaderTest does not really need it
if (params.syntheticSource() && testContext.forceFallbackSyntheticSource() == false && usingSyntheticSourceDelegate) {
var nullValue = (String) keywordMultiFieldMapping.get("null_value");
Copy link
Contributor Author

@lkts lkts Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If syntheticSourceDelegate is used than null_value of the multi field will be in the resulting synthetic source. This is good because we'll preserve such the values during reindex. On the other hand this is inconsistent - synthetic source for text behaves differently depending on the presence of multi field and mapping parameters of multi field (text ignores nulls).

if (params.syntheticSource() && testContext.forceFallbackSyntheticSource() == false && usingSyntheticSourceDelegate) {
var nullValue = (String) keywordMultiFieldMapping.get("null_value");

// Due to how TextFieldMapper#blockReaderDisiLookup works this is complicated.
Copy link
Contributor Author

@lkts lkts Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another inconsistency - block loader returns a null_value from multi field instead of null only if text field itself is not indexed. If it is, it returns nothing.

@lkts lkts added >enhancement auto-backport Automatically create backport pull requests when merged :StorageEngine/Mapping The storage related side of mappings v8.19.0 labels Apr 3, 2025
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Apr 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Sasha!

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

// See TextFieldMapper.SyntheticSourceHelper#usingSyntheticSourceDelegate
// and TextFieldMapper#canUseSyntheticSourceDelegateForLoading().
boolean usingSyntheticSourceDelegate = docValues || store;
boolean canUseSyntheticSourceDelegateForLoading = usingSyntheticSourceDelegate && ignoreAbove == null && (index || store);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is strange (only doc values / stored should matter). What failure occurs when we don't check for index here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants