Adding a merge_type parameter to the ingest simulate API #132210

Merged
masseyke merged 16 commits into elastic:main from the ingest-simulate-mapping-merge-reason branch on Aug 11, 2025

Conversation

masseyke
Member

@masseyke masseyke commented Jul 30, 2025

If mapping overrides are given to the ingest simulate API (_ingest/_simulate) in mapping_addition, index_template_substitutions, or component_template_substitutions, they are currently merged with the existing mappings using MapperService.MergeReason.MAPPING_UPDATE during mapping validation. This simulates merging the mappings the way they would be merged into an existing index, rather than the way they would be merged into templates. This can cause problems like #131608, where the mapping overrides cannot be merged in.
This PR adds a new merge_type parameter that allows a user to specify exactly how they would like the mappings merged. The options are index or template. By default, they will be merged using the index strategy to maintain backwards compatibility.
Closes #131608
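
For illustration, a request that uses the new parameter might be built as in the sketch below. It is not taken from the PR: it assumes merge_type is passed as a query-string parameter, and the host, index name, document, and class name are hypothetical. The code only constructs the request with the JDK HTTP client and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SimulateRequestSketch {
    // Hypothetical request body: one document plus a mapping_addition override.
    static final String BODY = "{"
        + "\"docs\":[{\"_index\":\"my-index\",\"_source\":{\"a.b\":\"value\"}}],"
        + "\"mapping_addition\":{\"properties\":{\"a.b\":{\"type\":\"keyword\"}}}"
        + "}";

    // Builds (but does not send) a POST to _ingest/_simulate.
    // Assumption: merge_type travels as a query-string parameter.
    public static HttpRequest build(String baseUrl, String mergeType) {
        URI uri = URI.create(baseUrl + "/_ingest/_simulate?merge_type=" + mergeType);
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(BODY))
            .build();
    }
}
```

Calling `SimulateRequestSketch.build("http://localhost:9200", "template")` would ask for template-style merging; leaving the parameter off keeps the default index behavior described above.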

@masseyke masseyke added the >enhancement, :Data Management/Ingest Node, and v9.2.0 labels Jul 30, 2025
@elasticsearchmachine
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Hi @masseyke, I've updated the changelog YAML for you.

@dakrone
Member

dakrone commented Jul 30, 2025

Drive-by bikeshed :D — how about merge_type?

@masseyke
Member Author

> Drive-by bikeshed :D - how about merge_type?

Think it's still clear that it refers only to mappings though? I didn't want to imply anything about the way settings or pipelines were merged.

@masseyke masseyke marked this pull request as ready for review July 31, 2025 21:03
@elasticsearchmachine elasticsearchmachine added the Team:Data Management label Jul 31, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke masseyke changed the title from "Adding a mapping_merge_reason parameter to the ingest simulate API" to "Adding a merge_type parameter to the ingest simulate API" Aug 1, 2025
@masseyke masseyke requested a review from Copilot August 1, 2025 15:04
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new merge_type parameter to the ingest simulate API to control how mapping overrides are merged with existing mappings. The parameter accepts "index" or "template" values, allowing users to choose between merging mappings as they would be merged into an existing index versus how they would be merged into templates.

Key changes:

  • Added merge_type parameter to SimulateBulkRequest constructor and related API endpoints
  • Updated mapping validation logic to use the appropriate MapperService.MergeReason based on the merge type
  • Added comprehensive test coverage for the new functionality
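
The merge-type selection described in the key changes can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's code: it uses a local stand-in enum rather than the real MapperService.MergeReason, the PR text only names MAPPING_UPDATE, and the INDEX_TEMPLATE constant, the class name, and the resolve method are hypothetical.

```java
public class MergeTypeSketch {
    // Stand-in for MapperService.MergeReason. Only MAPPING_UPDATE is named in
    // the PR description; INDEX_TEMPLATE is an assumption for illustration.
    public enum MergeReason { MAPPING_UPDATE, INDEX_TEMPLATE }

    // Maps the merge_type request parameter to a merge reason.
    // A missing parameter falls back to the "index" strategy, matching the
    // backwards-compatible default described in the PR.
    public static MergeReason resolve(String mergeType) {
        if (mergeType == null) {
            return MergeReason.MAPPING_UPDATE;
        }
        switch (mergeType) {
            case "index":
                return MergeReason.MAPPING_UPDATE;
            case "template":
                return MergeReason.INDEX_TEMPLATE;
            default:
                throw new IllegalArgumentException(
                    "merge_type must be [index] or [template], got [" + mergeType + "]");
        }
    }
}
```

Rejecting unknown values outright is a design choice of this sketch; how the real endpoint reports an invalid merge_type is not stated in the conversation.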

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| SimulateBulkRequest.java | Added mappingMergeType field and updated constructor to accept the new parameter |
| RestSimulateIngestAction.java | Added extraction of merge_type parameter from REST request |
| TransportSimulateBulkAction.java | Updated mapping validation to use the appropriate merge reason based on merge type |
| TransportVersions.java | Added new transport version for the feature |
| Test files | Updated all test constructors to include the new parameter |
| API spec | Added documentation for the new merge_type parameter |
| Integration tests | Added test cases to verify the new functionality works correctly |

@seanzatzdev seanzatzdev left a comment

lgtm

@@ -357,6 +357,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion COMPONENT_TEMPLATE_TRACKING_INFO = def(9_132_0_00);
     public static final TransportVersion TO_CHILD_BLOCK_JOIN_QUERY = def(9_133_0_00);
     public static final TransportVersion ML_INFERENCE_AI21_COMPLETION_ADDED = def(9_134_0_00);
+    public static final TransportVersion SIMULATE_INGEST_MAPPING_MERGE_TYPE = def(9_135_0_00);

If I understand correctly, it seems we need to update this version number because of the change to the API request? https://github.com/elastic/elasticsearch/blob/21e8bac36e84a73a8c3aa9740d4b92333edca7ef/docs/internal/Versioning.md#transport-protocol

Member Author

Yeah we have to increment the transport version any time we change serialization.
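
As a rough illustration of why a serialization change needs a new transport version, the sketch below gates a new field on the version the peer understands. It is a self-contained stand-in using JDK streams, not Elasticsearch's actual stream classes: the class name, the placeholder payload, and the plain integer version comparison are all assumptions; only the constant name and its id come from the diff above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionGatedWireSketch {
    // Same id as the constant added in the diff, modeled here as a plain int.
    public static final int SIMULATE_INGEST_MAPPING_MERGE_TYPE = 9_135_0_00;

    // Writes the request; the new field is sent only to peers that know about it.
    public static byte[] write(String mergeType, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("existing-fields"); // placeholder for pre-existing fields
            if (peerVersion >= SIMULATE_INGEST_MAPPING_MERGE_TYPE) {
                out.writeBoolean(mergeType != null); // presence flag for the optional value
                if (mergeType != null) {
                    out.writeUTF(mergeType);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the request; older peers never sent the field, so it stays null.
    public static String read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF(); // skip the placeholder for pre-existing fields
            if (peerVersion >= SIMULATE_INGEST_MAPPING_MERGE_TYPE) {
                return in.readBoolean() ? in.readUTF() : null;
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the version check, a node on an older version would receive bytes it does not expect, which is the failure mode the version bump is meant to prevent.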

@samxbr
Contributor

samxbr commented Aug 1, 2025

> > Drive-by bikeshed :D - how about merge_type?
>
> Think it's still clear that it refers only to mappings though? I didn't want to imply anything about the way settings or pipelines were merged.

A bit late to the party, how about mapping_merge_type? Sounds more explicit on what is being merged.

Contributor

@samxbr samxbr left a comment

LGTM, just some minor comments.

Comment on lines +2000 to +2006
"mapping_addition": {
  "properties": {
    "a.b": {
      "type": "keyword"
    }
  }
}
Contributor

Nit: Is it worth testing with mapping in index_template_substitutions or component_template_substitutions?

Member Author

Yeah it probably is.

@masseyke
Member Author

masseyke commented Aug 1, 2025

> > > Drive-by bikeshed :D - how about merge_type?
> >
> > Think it's still clear that it refers only to mappings though? I didn't want to imply anything about the way settings or pipelines were merged.
>
> A bit late to the party, how about mapping_merge_type? Sounds more explicit on what is being merged.

@dakrone any strong preference? Otherwise I'll probably just go with what I've currently got (merge_type).

@dakrone
Member

dakrone commented Aug 4, 2025

> any strong preference? Otherwise I'll probably just go with what I've currently got (merge_type).

I don't have a strong preference, I think merge_type will be okay.

@masseyke masseyke merged commit e223df1 into elastic:main Aug 11, 2025
33 checks passed
@masseyke masseyke deleted the ingest-simulate-mapping-merge-reason branch August 11, 2025 17:00
Labels
:Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), >enhancement, Team:Data Management (Meta label for data/management team), v9.2.0
Development

Successfully merging this pull request may close these issues.

Ingest simulate: mapping_addition doesn't work for subobjects
5 participants