Dynamic splitting by interval for range queries #6458

afhassan · 2024-12-24T18:43:11Z

What this PR does:
Cortex supports only using a static interval to split range queries. This PR adds two new configs that dynamically adjust the split interval to a multiple of the configured split_queries_by_interval depending on the given query.

New configs:
1 - max_shards_per_query
Accepts an int value for the total number of shards for a query. The split interval is increased into a multiple of split_queries_by_interval to ensure that the total number of shards remains below the configured value. This takes into account vertical sharding if it is configured.

Examples:
split_queries_by_interval = 24h
max_shards_per_query = 30

A 30 day range query is split to 30 shards using 24h interval.
A 40 day range query is split to 20 shards using 48h interval.
A 100 day range query is split to 25 shard using 96h interval.

2 - max_fetched_data_duration_per_query
Accepts a duration for the total duration of data fetched by all shards of a query. Certain queries can fetch a long duration of data per each shard when executing. This configuration uses a multiple of split_queries_by_interval to reduce the number of shards so that the total duration of data fetched remains below the configured value.

Examples:
split_queries_by_interval = 24h
max_fetched_data_duration_per_query = 2400h // 100 days

A query up with 30 day range is split into 30 shards using 24h interval
Each shard fetches 1 day of data for a total of 30 days
A query up[10d] with 30 day range is split into 6 shards using 120h interval.
Each shard fetches [5 + 10] days of data for a total of 90 days.
If the query was split into 30 shards using 24h default interval.
Each shard would fetch [1 + 10] days of data for a total of 330 days.

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Ahmed Hassan <[email protected]>

harry671003 · 2024-12-27T03:46:46Z

pkg/querier/tripperware/queryrange/query_range_middlewares.go

-		staticIntervalFn := func(_ tripperware.Request) time.Duration { return cfg.SplitQueriesByInterval }
-		queryRangeMiddleware = append(queryRangeMiddleware, tripperware.InstrumentMiddleware("split_by_interval", metrics), SplitByIntervalMiddleware(staticIntervalFn, limits, prometheusCodec, registerer))
+		intervalFn := func(_ tripperware.Request) time.Duration { return cfg.SplitQueriesByInterval }
+		if cfg.SplitQueriesByIntervalMaxSplits != 0 {


Shouldn't the limit be applied to both range splits and vertical spits?

cortex/pkg/querier/tripperware/shard_by.go

Line 40 in 8a46d20

func (s shardBy) Do(ctx context.Context, r Request) (Response, error) {

Technically this sets a limit for the total range and vertical splits for a given query. The number of vertical shards is static, so the max number of of splits for a given query becomes split_queries_by_interval_max_splits x query_vertical_shard_size. Because of this adding a separate limit for vertical sharding when the number of vertical shards is a static config would be redundant because we limit it already.

Signed-off-by: Ahmed Hassan <[email protected]>

pkg/querier/tripperware/queryrange/split_by_interval.go

yeya24 · 2024-12-31T19:55:30Z

Instead of changing split interval using max number of split queries, can we try to combine it with estimated data to fetch?

For example, a query up[30d] is very expensive to split to 30 splits as each split query still fetches 30 day of data so 30 splits ended up fetching 900 days of data.

Instead of having a limit of total splits should we use total days of data to fetch?

afhassan · 2024-12-31T23:59:47Z

Instead of changing split interval using max number of split queries, can we try to combine it with estimated data to fetch?

For example, a query up[30d] is very expensive to split to 30 splits as each split query still fetches 30 day of data so 30 splits ended up fetching 900 days of data.

Instead of having a limit of total splits should we use total days of data to fetch?

That's a good idea - I can add a new limit for total hours of data fetched and adjust the interval to not exceed it.

We can still keep max number of splits since it gives more flexibility to limit the number of shards for queries with long day range even if they don't fetch a lot of days of data like the example you mentioned

Signed-off-by: Ahmed Hassan <[email protected]>

docs/configuration/config-file-reference.md

pkg/querier/querier.go

pkg/querier/tripperware/queryrange/query_range_middlewares.go

pkg/querier/tripperware/queryrange/split_by_interval.go

afhassan · 2025-01-22T20:20:12Z

pkg/querier/tripperware/queryrange/split_by_interval.go

 	if err != nil {
 		return nil, httpgrpc.Errorf(http.StatusBadRequest, err.Error())
 	}
 	s.splitByCounter.Add(float64(len(reqs)))

+	stats := querier_stats.FromContext(ctx)


Yes it is only used to log the interval size used for splitting the query.

pkg/querier/tripperware/queryrange/split_by_interval.go

pkg/util/time.go

Signed-off-by: Ahmed Hassan <[email protected]>

yeya24 · 2025-01-20T09:38:57Z

I get the idea. But my main concern for such dynamic split interval + max splits by interval is that results cache will have very bad hit ratio as our current results cache key is tied to your split interval.

A 30 day range query is split to 30 queries using 24h interval
A 40 day range query is split to 20 queries using 48h interval

The first 30 day range query uses 24h interval so 24h will be part of our results cache key.
Now you run another 40 day range query with 48h interval, the results cache of the first 30 days will be missing as you are using 48h in your results cache key now.

Making vertical shard size dynamic seems more friendly to results cache because vertical shard size is not part of the results cache key. However, not all queries can be vertically sharded.

harry671003 · 2025-01-20T15:46:10Z

The first 30 day range query uses 24h interval so 24h will be part of our results cache key.
Now you run another 40 day range query with 48h interval, the results cache of the first 30 days will be missing as you are using 48h in your results cache key now.

Isn't this true today with Grafana modifying the step interval? For example the 30d query will have a step of 900s vs a 40d query will have a step of 1200s. Since the step is also in the cache key, this will already invalidate the cache.

I agree with you on changing the vertical shard size first. Could we mark this feature experimental and iterate on it?

yeya24 · 2025-01-21T18:35:07Z

Yeah let's mark it experimental in https://cortexmetrics.io/docs/configuration/v1guarantees/#experimental-features

Signed-off-by: Ahmed Hassan <[email protected]>

docs/configuration/config-file-reference.md

pkg/querier/tripperware/queryrange/dynamic_query_splits.go

harry671003 · 2025-01-23T16:57:43Z

pkg/querier/tripperware/queryrange/split_by_interval.go

+
+// calculates the total duration of data the query will have to fetch from storage as a multiple of baseInterval.
+// also returns the total time range fetched by the original query start and end times
+func durationFetchedByQuery(expr parser.Expr, req tripperware.Request, queryStoreAfter, lookbackDelta time.Duration, baseInterval time.Duration, now time.Time) (durationFetchedCount int, originalRangeCount int, lookbackDeltaCount int) {


Nit: Follow the convention for adding comments.

https://tip.golang.org/doc/comment

I rewrote the comment to be more clear. The convention seems to be general guidelines not a specific format but I tried to follow it as much as possible.

pkg/querier/tripperware/queryrange/split_by_interval.go

harry671003 · 2025-01-23T17:07:46Z

pkg/querier/tripperware/queryrange/split_by_interval_test.go

@@ -408,3 +413,189 @@ func Test_evaluateAtModifier(t *testing.T) {
 		})
 	}
 }
+
+func TestDynamicIntervalFn(t *testing.T) {


Could you also add separate tests for durationFetchedByQuery()?

Also, could you add more tests for split_by_interval with dynamic splits enabled? Maybe in TestSplitByDay()

I added tests for both and also added tests for getIntervalFromMaxSplits()

harry671003 · 2025-01-23T17:11:59Z

pkg/querier/tripperware/queryrange/split_by_interval.go

+	durationFetchedCount = 0
+	originalRangeCount = 0
+	lookbackDeltaCount = 0
+	baseIntervalMillis := util.DurationMilliseconds(baseInterval)


Is there a reason why the func returns as multiples of baseInterval rather than just a duration?
Since the function is called durationFetchedByQuery, I'd expect it to return a duration

The calculations done with these are all based on integers and rounding down is important for the result. I refactored the function to be more readable and changed its name. Let me know if you think we should make more changes to it.

pkg/querier/tripperware/queryrange/split_by_interval.go

Signed-off-by: Ahmed Hassan <[email protected]>

afhassan · 2025-01-28T00:43:03Z

Thanks @harry671003 for the all the feedback!
I think the PR is in a good place for a final review now

harry671003

Thanks for addressing the comments. LGTM

Signed-off-by: Ahmed Hassan <[email protected]>

docs/configuration/v1-guarantees.md

Signed-off-by: Ahmed Hassan <[email protected]>

yeya24 · 2025-01-28T23:20:40Z

pkg/querier/tripperware/queryrange/split_by_interval.go

+			n++
+		}
+	}
+	return n * baseInterval


This code is a bit confusing to me. I added a new test case in Test_getIntervalFromMaxSplits below. Since the total time range is only 23h so I expect it to be split by 1 day. But the result was split by 2 days. Please take a look

{ name: "23h with 10 max splits, expected to split by 1 day", baseSplitInterval: day, req: &tripperware.PrometheusRequest{ Start: 12 * 3600 * seconds, End: 35 * 3600 * seconds, Step: 5 * 60 * seconds, Query: "foo", }, maxSplits: 10, expectedInterval: 1 * day, },

You are right, this part should only run when maxSplits == 1.
The condition I had was not right. I changed it and added the test case for when query range < interval

if maxSplits == 1 { // No splitting, interval should be long enough to result in 1 split only nextSplitStart := nextIntervalBoundary(r.GetStart(), r.GetStep(), n*baseInterval) + r.GetStep() if nextSplitStart < r.GetEnd() { queryRangeWithoutFirstSplit := time.Duration((r.GetEnd() - nextSplitStart) * int64(time.Millisecond)) n += (queryRangeWithoutFirstSplit + baseInterval - 1) / baseInterval } }

yeya24 · 2025-01-28T23:21:46Z

pkg/querier/tripperware/queryrange/split_by_interval.go

+			extraIntervalsPerSplit = 1 // avoid division by 0
+		}
+
+		// Next analyze the query using the next split start time to find the additional duration fetched by lookbackDelta for other subsequent splits


I wonder if we can change some of those code below into dedicated functions. It is very hard to review now due to the complexity

I refactored it into smaller helper functions and changed some variable names to try and make it more readable

Signed-off-by: Ahmed Hassan <[email protected]>

yeya24 · 2025-02-08T05:45:37Z

pkg/querier/tripperware/queryrange/split_by_interval.go

 func getIntervalFromMaxSplits(r tripperware.Request, baseInterval time.Duration, maxSplitsInt int) time.Duration {
 	maxSplits := time.Duration(maxSplitsInt)
 	queryRange := time.Duration((r.GetEnd() - r.GetStart()) * int64(time.Millisecond))
-
-	// Calculate the multiple n of interval needed to shard query to <= maxSplits
+	// Calculate the multiple n of interval needed to shard query into <= maxSplits
 	n := (queryRange + baseInterval*maxSplits - 1) / (baseInterval * maxSplits)


It is not straight forward to understand this. Can you add some comments? Same for L227

I updated the comments. Below is how I would explain this function.

queryRange is divided by maxSplits to get the interval. The reason we divide by baseInterval as well is to get the multiple (n) that can be rounded up. This ensures the returned interval is a multiple of the base interval.

Adding baseInterval*maxSplits - 1 is to round up the calculated (n).

The loop is to handle cases where the first split is shorter than other splits. This happens because queries are split only at multiples of interval. So if the split interval is 8 days, the start time of the next interval can only be at timestamps of days 0, 8, 16, 24... So for a start time at day 6 the first shard will be 2 days long only. This is not accounted for in the calculation above which assumes the full query range is split into intervals of size 8.

The loop exits if, after removing the first split and recalculating (n), it is not larger than the one calculated before. If it is larger then the loop will keep incrementing n and recalculating until the correct one is found. The loop terminates when (n) interval is twice the query range because that interval is guaranteed to shard the query into 1 shard only.

This is an example 37 day range query from day 6 to 43 that should be split into a maximum of 5 splits.

For rounding up, the formula is

ceil(a/b)= (a + (b−1)) / b

Integer division in Go always rounds down. The idea of adding (b - 1) is that it guarantees that any small remainder (even 1 nanosecond) would round the result up one whole number. If there is no remainder at all then (b - 1) would get rounded down anyway.

This is where I originally found this method rust-lang/rfcs#2844

But I do agree that it is confusing with two variables in the denominator, so I replaced it with a ceilDiv(a, b) helper function.

Signed-off-by: Ahmed Hassan <[email protected]>

yeya24 · 2025-02-11T06:30:24Z

pkg/querier/tripperware/queryrange/split_by_interval.go

+	// First analyze the query using original start-end time. Duration fetched by lookbackDelta here only reflects the start time of first split
+	durationFetchedByRange, durationFetchedBySelectors, durationFetchedByLookbackDeltaFirstSplit := analyzeDurationFetchedByQueryExpr(expr, queryStart, queryEnd, baseInterval, lookbackDelta)
+
+	fixedDurationFetched += durationFetchedByRange        // Duration fetched by the query range is constant regardless of how many splits the query has


All split here means horizontal splits?

Yes, I try to refer to splits as horizontal splits only, while shards is the final total of splits x vertical shards.

Following up on this, if you were referring to perSplitDurationFetched, then it is actually for every shard too. We can calculate the total duration fetched after splitting while ignoring vertical sharding completely, and at the end the total duration fetched we have can be multiplied by vertical shards to get the final duration fetched.

Using the same logic, we start by dividing maxFetchedDataDurationPerQuery / time.Duration(queryVerticalShardSize), and the rest of the calculation can ignore vertical shard size.

Sorry for the confusion earlier.

yeya24

Looks good. But we are missing changelog here.

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added the size/S label Dec 24, 2024

harry671003 reviewed Dec 27, 2024

View reviewed changes

Change dynamic interval sharding to take into account vertical sharding

Loading
Loading status checks…

6106978

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added size/M and removed size/S labels Dec 31, 2024

afhassan commented Dec 31, 2024

View reviewed changes

pkg/querier/tripperware/queryrange/split_by_interval.go Outdated Show resolved Hide resolved

add dynamic sharding based on total days of data fetched for query

Loading
Loading status checks…

01c121c

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added size/L and removed size/M labels Jan 16, 2025

harry671003 reviewed Jan 17, 2025

View reviewed changes

add unit tests for dynamicIntervalFn

Loading
Loading status checks…

b15dde6

Signed-off-by: Ahmed Hassan <[email protected]>

allow using any base interval size for dynamicIntervalFn

Loading
Loading status checks…

a1e7a3a

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added size/XL and removed size/L labels Jan 22, 2025

add dynamic query splits to experimental features

Loading
Loading status checks…

08ec706

Signed-off-by: Ahmed Hassan <[email protected]>

afhassan marked this pull request as ready for review January 23, 2025 00:38

dosubot bot added the component/querier label Jan 23, 2025

rename dynamicIntervalFn unit tests

Loading
Loading status checks…

7349cfa

Signed-off-by: Ahmed Hassan <[email protected]>

harry671003 reviewed Jan 24, 2025

View reviewed changes

afhassan added 2 commits January 24, 2025 13:32

refactor dynamicIntervalFn to be more readable

Loading
Loading status checks…

3e95b45

Signed-off-by: Ahmed Hassan <[email protected]>

add unit tests for getIntervalFromMaxSplits

Loading
Loading status checks…

cfc5078

Signed-off-by: Ahmed Hassan <[email protected]>

afhassan changed the title ~~Add limit for max range query splits by interval~~ Add dynamic splitting by interval for range queries Jan 27, 2025

afhassan changed the title ~~Add dynamic splitting by interval for range queries~~ Dynamic splitting by interval for range queries Jan 27, 2025

add unit tests for analyzeDurationFetchedByQuery

Loading
Loading status checks…

83efc5c

Signed-off-by: Ahmed Hassan <[email protected]>

harry671003 approved these changes Jan 28, 2025

View reviewed changes

afhassan added 2 commits January 27, 2025 19:21

fix formatting

Loading
Loading status checks…

427f5b2

Signed-off-by: Ahmed Hassan <[email protected]>

update docs

Loading
Loading status checks…

c931aa6

Signed-off-by: Ahmed Hassan <[email protected]>

harry671003 reviewed Jan 28, 2025

View reviewed changes

docs/configuration/v1-guarantees.md Show resolved Hide resolved

update experimental features

Loading
Loading status checks…

c1ae047

Signed-off-by: Ahmed Hassan <[email protected]>

yeya24 reviewed Jan 28, 2025

View reviewed changes

afhassan added 3 commits January 31, 2025 09:59

fix dynamic splitting when query range is shorter than base interval

Loading
Loading status checks…

ba90bf0

Signed-off-by: Ahmed Hassan <[email protected]>

refactor dynamic query splitting into smaller helper functions

Loading
Loading status checks…

8385814

Signed-off-by: Ahmed Hassan <[email protected]>

use duration instead of int for dynamic query splitting calculation

Loading
Loading status checks…

9225d23

Signed-off-by: Ahmed Hassan <[email protected]>

yeya24 reviewed Feb 8, 2025

View reviewed changes

afhassan added 4 commits February 8, 2025 23:32

add comments for getIntervalFromMaxSplits

Loading
Loading status checks…

bea70ad

Signed-off-by: Ahmed Hassan <[email protected]>

add ceilDiv helper function

Loading
Loading status checks…

df492d8

Signed-off-by: Ahmed Hassan <[email protected]>

add default max splits by duration fetched

ce661ed

Signed-off-by: Ahmed Hassan <[email protected]>

add new unit tests for helper functions

Loading
Loading status checks…

701d393

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added size/XXL and removed size/XL labels Feb 11, 2025

yeya24 reviewed Feb 11, 2025

View reviewed changes

yeya24 approved these changes Feb 14, 2025

View reviewed changes

afhassan and others added 2 commits February 13, 2025 18:45

add changelog

Loading
Loading status checks…

2f17a99

Signed-off-by: Ahmed Hassan <[email protected]>

Merge branch 'master' into master

Loading
Loading status checks…

bb1c15b

Signed-off-by: Ahmed Hassan <[email protected]>

afhassan force-pushed the master branch from 18eff36 to bb1c15b Compare February 14, 2025 03:26

yeya24 merged commit ad49b2e into cortexproject:master Feb 14, 2025
17 checks passed

afhassan mentioned this pull request Feb 14, 2025

Add dynamic interval cache splitter #6592

Merged

3 tasks

Dynamic splitting by interval for range queries #6458

Dynamic splitting by interval for range queries #6458

Conversation

afhassan commented Dec 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

afhassan Dec 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

yeya24 commented Dec 31, 2024

Uh oh!

afhassan commented Dec 31, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yeya24 commented Jan 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

harry671003 commented Jan 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yeya24 commented Jan 21, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

afhassan commented Jan 28, 2025

Uh oh!

harry671003 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yeya24 Jan 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

afhassan commented Dec 24, 2024 •

edited

Loading

afhassan Dec 30, 2024 •

edited

Loading

yeya24 commented Jan 20, 2025 •

edited

Loading

harry671003 commented Jan 20, 2025 •

edited

Loading

yeya24 Jan 28, 2025 •

edited

Loading

afhassan Feb 9, 2025 •

edited

Loading