Description
Today, when we load data from `_source` (e.g. text fields, or unmapped fields in the future), we go through `SourceValueFetcher::fetchValues`.
That relies on `Source::extractValue`, which parses the entire JSON document and puts the whole result into a map via `XContentHelper.convertToMap`.
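The map-based path described above can be sketched roughly as follows. `MapPathLookup` is a hypothetical simplification, not the actual `XContentMapValues` code: it ignores arrays and keys that themselves contain dots. The walk itself is cheap. The cost is that it can only start after the whole document has been turned into nested Java maps, lists and values.

```java
import java.util.List;
import java.util.Map;

// Hypothetical simplification of a map-based dotted-path lookup.
// The input map stands in for a fully converted _source document.
final class MapPathLookup {
    static Object extractValue(Map<String, Object> map, String dottedPath) {
        Object current = map;
        for (String segment : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path runs through a non-object value
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        // Every field below has been allocated, although only one is needed.
        Map<String, Object> source = Map.of(
                "title", "x",
                "tags", List.of("a", "b"),
                "user", Map.of("name", "kimchy", "age", 42));
        System.out.println(extractValue(source, "user.name")); // kimchy
        System.out.println(extractValue(source, "title.foo")); // null
    }
}
```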
The stack looks roughly like this:
```
extractValue:187, XContentMapValues (org.elasticsearch.common.xcontent.support)
extractValue:71, Source (org.elasticsearch.search.lookup)
fetchValues:60, SourceValueFetcher (org.elasticsearch.index.mapper)
read:53, BlockSourceReader (org.elasticsearch.index.mapper)
read:216, ValuesFromSingleReader$RowStrideReaderWork (org.elasticsearch.compute.lucene.read)
```
All of this means that we materialize the entire map for every `_source`, even when we only need to read a single field. This causes many unnecessary allocations and a lot of garbage for the GC to collect.
Given that this happens on the critical path in many common cases, and that unmapped fields are about to make it worse, we should look at refactoring this code. Using the `JsonParser` streaming APIs (seeking to the attribute and reading just that) feels like it should be easy enough to do.
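The seek-and-read idea can be sketched as follows. This is a minimal, hypothetical `StreamingFieldExtractor` (not Elasticsearch or Jackson code) that works on a plain `String`. It walks a dotted path in one forward pass, skips sibling values without creating objects for them, and stops as soon as the target value has been read. It assumes well-formed JSON, compares keys in their raw escaped form, and does not descend into arrays. A real implementation would more likely drive Jackson's `JsonParser` (e.g. `nextToken()` and `skipChildren()`) instead of scanning characters by hand.

```java
/**
 * Hypothetical sketch: returns the raw JSON text of the value at a dotted
 * path, or null. Values not on the path are skipped, never materialized.
 */
final class StreamingFieldExtractor {
    private final String json;
    private int pos;

    private StreamingFieldExtractor(String json) {
        this.json = json;
    }

    static String extract(String json, String dottedPath) {
        return new StreamingFieldExtractor(json).find(dottedPath.split("\\."), 0);
    }

    // pos is at a value (possibly after whitespace); path[depth] is the next key to seek.
    private String find(String[] path, int depth) {
        skipWs();
        if (depth == path.length) {
            int start = pos;
            skipValue();
            return json.substring(start, pos); // the only value actually read
        }
        if (peek() != '{') {
            return null; // path continues through a non-object value (incl. arrays)
        }
        pos++; // consume '{'
        skipWs();
        if (peek() == '}') {
            return null; // empty object
        }
        while (true) {
            skipWs();
            String key = readString(); // raw form: escapes are not decoded
            skipWs();
            expect(':');
            if (key.equals(path[depth])) {
                return find(path, depth + 1); // the rest of the document is never scanned
            }
            skipValue(); // sibling value: skipped without allocating anything for it
            skipWs();
            char c = json.charAt(pos++);
            if (c == '}') {
                return null; // end of object, key not found
            }
            if (c != ',') {
                throw new IllegalArgumentException("expected ',' or '}' at " + (pos - 1));
            }
        }
    }

    private void skipValue() {
        skipWs();
        char c = peek();
        if (c == '"') {
            readString();
        } else if (c == '{' || c == '[') {
            // Skip a whole container by tracking nesting; strings are consumed
            // separately so brackets inside them are not counted.
            int nesting = 0;
            do {
                c = peek();
                if (c == '"') {
                    readString();
                    continue;
                }
                if (c == '{' || c == '[') nesting++;
                else if (c == '}' || c == ']') nesting--;
                pos++;
            } while (nesting > 0);
        } else {
            // number, true, false or null: advance to the next delimiter
            while (pos < json.length() && ",}] \t\r\n".indexOf(json.charAt(pos)) < 0) pos++;
        }
    }

    private String readString() {
        expect('"');
        int start = pos;
        while (json.charAt(pos) != '"') {
            if (json.charAt(pos) == '\\') pos++; // skip the escaped character
            pos++;
        }
        String raw = json.substring(start, pos);
        pos++; // consume closing quote
        return raw;
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }

    private char peek() { return json.charAt(pos); }

    private void skipWs() {
        while (pos < json.length() && Character.isWhitespace(json.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        String source = "{\"title\":\"x\",\"tags\":[\"a\",\"}\"],\"user\":{\"name\":\"kimchy\",\"age\":42}}";
        System.out.println(extract(source, "user.name")); // "kimchy" (raw JSON, quotes included)
        System.out.println(extract(source, "user.age"));  // 42
        System.out.println(extract(source, "missing"));   // null
    }
}
```

One trade-off to keep in mind: a document read for several fields would be scanned once per field with this shape, so a real design would probably want to collect all requested paths and resolve them in a single pass.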