Adopt doc value skippers in compute engine #128442

@martijnvg

Description

We've started adopting doc value skippers for the index sort fields in logsdb and tsdb. We will also look into enabling doc value skippers for other fields and other index modes.

In the compute engine we will also need to make use of doc value skippers. Examples of optimizations:

  • When filters and/or sorts get pushed down to LuceneTopNSourceOperator (and others?), sorting and filtering need to make use of the doc value skippers. This should happen automatically as Lucene queries and field comparators gain support for doc value skippers.

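  • Avoid loading values when performing min/max aggregations on a field if the skipper indicates that a block cannot beat the current least competitive value (its max is not higher than the running max, or its min is not lower than the running min).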

  • Use the doc value skipper's count statistic to perform the count aggregate function if possible.

  • Keep track of sums in doc value skippers and use the skipper's sum to perform the sum and avg aggregate functions if possible.

  • Reading fields (e.g., TSID or dimension keyword fields) can be performed using the doc value skipper, without accessing the actual stored values. If any level has a constant value (minValue == maxValue), we can fill positions with that constant. If the entire range is covered by a constant level, we can return a constant block immediately.

  • If the timestamp field is used only in BUCKET, we should pass the bucket range when reading the timestamp and perform the evaluation there. If the min/max timestamp falls within a single bucket, we do not need to read any values or execute evaluation for the bucket; instead, we can use the value from the bucket directly.
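The last two ideas (constant blocks and single-bucket timestamps) can be sketched roughly as follows. This is only an illustration: `SkipBlock`, `constantValue` and `singleBucket` are hypothetical names modelling one skipper entry, not the Lucene or compute engine API. The density check (every doc in the range has a value) is an added assumption, and buckets are assumed to be fixed-width.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical model of skipper entries; not the Lucene DocValuesSkipper API.
final class SkipperSketch {
    /** Summary of the values stored for doc IDs [minDoc, maxDoc], inclusive. */
    record SkipBlock(int minDoc, int maxDoc, long minValue, long maxValue, int docCount) {
        // Assumption: a constant can only be filled in if every doc has a value.
        boolean isDense() { return docCount == maxDoc - minDoc + 1; }
        boolean covers(int firstDoc, int lastDoc) { return minDoc <= firstDoc && lastDoc <= maxDoc; }
    }

    /** Returns the constant value for [firstDoc, lastDoc] if a single block proves it. */
    static OptionalLong constantValue(List<SkipBlock> blocks, int firstDoc, int lastDoc) {
        for (SkipBlock b : blocks) {
            if (b.covers(firstDoc, lastDoc) && b.isDense() && b.minValue() == b.maxValue()) {
                return OptionalLong.of(b.minValue());
            }
        }
        return OptionalLong.empty(); // caller falls back to reading doc values
    }

    /** Returns the bucket start if all values of the block fall into one fixed-width bucket. */
    static OptionalLong singleBucket(SkipBlock b, long bucketWidth) {
        long lo = Math.floorDiv(b.minValue(), bucketWidth);
        long hi = Math.floorDiv(b.maxValue(), bucketWidth);
        return lo == hi ? OptionalLong.of(lo * bucketWidth) : OptionalLong.empty();
    }
}
```

In both helpers an empty result means the skipper cannot answer on its own, and the reader has to load the values as it does today.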

Even for fields that don't align with index sorting, it makes sense to enable doc value skippers. We can apply them differently: for example, for such fields we could store only a segment-level skipper entry, or store more or fewer skipper entries per skipper level.

The compute engine would need a feedback mechanism between ValuesReader and the aggregate functions (used in the hash aggregation operator), so that value reading can be informed of the current minimum competitive values and skip skipper blocks that cannot match them.
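A minimal sketch of such a feedback loop, for an ungrouped max aggregation only. All names here (`SkipBlock`, `MaxAggregator`, `readInto`) are hypothetical and do not correspond to existing compute engine classes. A real implementation would have to track competitive values per group in the hash aggregation operator.

```java
import java.util.List;

// Hypothetical sketch: the aggregator tells the reader which blocks are still competitive.
final class CompetitiveMaxSketch {
    /** One skipper block: its min/max summary plus the values it would cost to load. */
    record SkipBlock(long minValue, long maxValue, long[] values) {}

    /** Running max that exposes its current bound to the value reader. */
    static final class MaxAggregator {
        private long max = Long.MIN_VALUE;
        private boolean seen = false;

        void add(long v) {
            if (!seen || v > max) {
                max = v;
                seen = true;
            }
        }

        /** A block can only change the result if its max exceeds the running max. */
        boolean isCompetitive(SkipBlock b) {
            return !seen || b.maxValue() > max;
        }

        long result() { return max; }
    }

    /** Loads only competitive blocks; returns how many blocks were actually read. */
    static int readInto(List<SkipBlock> blocks, MaxAggregator agg) {
        int loaded = 0;
        for (SkipBlock b : blocks) {
            if (!agg.isCompetitive(b)) {
                continue; // skipped using the skipper summary alone
            }
            loaded++;
            for (long v : b.values()) {
                agg.add(v);
            }
        }
        return loaded;
    }
}
```

The same shape would apply to min with the comparison reversed. How much gets skipped depends on block order, so this presumably pays off most when values correlate with the index sort.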

This effort depends on #127263
