ES|QL perf field loading from _source: stop loading the entire _source map #137372

@quackaplop

Today, when we load data from _source (e.g. text fields, or unmapped fields in the future), we go through `SourceValueFetcher::fetchValues`.

That relies on `Source::extractValue`, which parses the entire JSON document and shoves the result into a map via `XContentHelper.convertToMap`.

The stack looks roughly like this:

extractValue:187, XContentMapValues (org.elasticsearch.common.xcontent.support)
extractValue:71, Source (org.elasticsearch.search.lookup)
fetchValues:60, SourceValueFetcher (org.elasticsearch.index.mapper)
read:53, BlockSourceReader (org.elasticsearch.index.mapper)
read:216, ValuesFromSingleReader$RowStrideReaderWork (org.elasticsearch.compute.lucene.read)

All of this means that we materialize the entire map for every document's _source, even when we only need to read a single field, which results in many unnecessary allocations and a lot of garbage for the GC to collect.

This happens on the critical path in many common cases, and we are about to make it worse with unmapped fields, so we should look at refactoring this code. Using the streaming `JsonParser` APIs (seeking to the attribute and reading just that) feels like it should be easy enough to do.
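To illustrate the "seek to the attribute and read just that" idea, here is a minimal, dependency-free sketch. It is not Elasticsearch code and does not use `XContentParser` or Jackson; the class `SourceFieldScanner` and its method `extractRaw` are hypothetical names made up for this example. It assumes a well-formed JSON object held in a `String`, looks only at top-level fields, and assumes the field name contains no escape sequences. A real change would presumably sit on top of the existing streaming parser and work on bytes, but the shape is the same: skip every value that is not wanted without building a map.

```java
import java.util.Optional;

/** Hypothetical sketch: find one top-level field without materializing a map. */
final class SourceFieldScanner {
    private final String json;
    private int pos;

    private SourceFieldScanner(String json) {
        this.json = json;
    }

    /** Returns the raw JSON text of a top-level field, or empty if it is absent. */
    static Optional<String> extractRaw(String json, String field) {
        SourceFieldScanner s = new SourceFieldScanner(json);
        s.skipWhitespace();
        s.expect('{');
        s.skipWhitespace();
        if (s.peek() == '}') {
            return Optional.empty();
        }
        while (true) {
            s.skipWhitespace();
            int keyStart = s.pos;
            s.skipString();
            // Compare the key in place (assumes the key has no escape sequences).
            boolean match = s.pos - keyStart - 2 == field.length()
                && json.regionMatches(keyStart + 1, field, 0, field.length());
            s.skipWhitespace();
            s.expect(':');
            s.skipWhitespace();
            int valueStart = s.pos;
            s.skipValue(); // advances past the value without allocating anything
            if (match) {
                return Optional.of(json.substring(valueStart, s.pos));
            }
            s.skipWhitespace();
            if (s.peek() == ',') {
                s.pos++;
                continue;
            }
            s.expect('}');
            return Optional.empty();
        }
    }

    private char peek() {
        if (pos >= json.length()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return json.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    private void skipWhitespace() {
        while (pos < json.length() && Character.isWhitespace(json.charAt(pos))) {
            pos++;
        }
    }

    private void skipString() {
        expect('"');
        while (peek() != '"') {
            if (peek() == '\\') {
                pos++; // also step over the escaped character
            }
            pos++;
        }
        pos++; // closing quote
    }

    private void skipValue() {
        char c = peek();
        if (c == '"') {
            skipString();
        } else if (c == '{' || c == '[') {
            int depth = 0;
            do {
                c = peek();
                if (c == '"') {
                    skipString(); // brackets inside strings must not affect depth
                    continue;
                }
                if (c == '{' || c == '[') {
                    depth++;
                } else if (c == '}' || c == ']') {
                    depth--;
                }
                pos++;
            } while (depth > 0);
        } else {
            // number, true, false or null: read up to the next delimiter
            while (pos < json.length() && ",}] \t\r\n".indexOf(json.charAt(pos)) < 0) {
                pos++;
            }
        }
    }
}
```

For example, `SourceFieldScanner.extractRaw(source, "message")` walks past every other top-level value, including nested objects and arrays, and the only allocation is the substring for the requested field. It also stops at the first match, so fields that appear early in the document are cheap to read.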

Labels: :Analytics/Compute Engine, :Analytics/ES|QL, :Performance, Team:Analytics, Team:Performance
