
[PLAT-440] roll up routing in metrics view#9180

Open
pjain1 wants to merge 19 commits into main from rollup_mv

Conversation


@pjain1 pjain1 commented Apr 3, 2026

  • Add rollup table config to metrics view proto and YAML parser
  • Implement query routing: eligible rollups are selected based on grain derivability, dimension/measure coverage, timezone match, time range alignment, and time coverage
  • Prefer coarsest grain among eligible rollups; break ties by smallest data range
  • For no-time-range queries ("all data"), verify the rollup covers the base table's full range rather than skipping coverage checks

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@pjain1 pjain1 closed this Apr 3, 2026
@pjain1 pjain1 reopened this Apr 3, 2026
@pjain1 pjain1 changed the title roll up routing in metrics view [PLAT-440] roll up routing in metrics view Apr 5, 2026
@pjain1 pjain1 requested a review from begelundmuller April 6, 2026 05:32
@pjain1 pjain1 requested a review from begelundmuller April 13, 2026 16:00
Comment on lines +309 to +310
// Time grain of the rollup. If unspecified, defaults to the base metrics view's smallest_time_grain during validation.
TimeGrain time_grain = 5;
Contributor

What is the reasoning for this default? I'm wondering if it would be safer to make it required. I'm worried about a few problems:

  • If it is misconfigured, you will get broken/empty results (it fails silently instead of failing with a clear error)
  • If not explicitly set, smallest_time_grain defaults to seconds, which is unlikely to be used for rollups
  • smallest_time_grain is supposed to indicate the granularity of the raw table; but won't the rollup tables usually have a different granularity than the raw table?
  • We can add this default later if needed, but we cannot easily make it required later. So might be safer to make it required now.

Member Author

@pjain1 pjain1 Apr 14, 2026


It's required and does not default to anything; the comment is out of date. See this in parse_metrics_view.go:

  if rollup.TimeGrain == "" {
  	return fmt.Errorf(`rollup[%d]: "time_grain" is required`, i)
  }

Will fix the comment.

priority int
rt *runtime.Runtime
instanceID string
metricsViewName string
Contributor

The reason we haven't passed the name in before is because executor is supposed to be isolated from the catalog/resolver calls. This has two advantages:

  • It keeps logic clean/flattened, avoiding e.g. resolvers calling an executor that calls a resolver
  • It enables the executor to safely be used for specs that are not valid in the catalog, which currently the metrics view reconciler uses to validate new specs

I see this is being used to call a resolver for time ranges. I understand that in practice it works since it avoids circular code paths, but given how complex this package is, I think we should try to stick to the above principles about the package being isolated to avoid circular code.

We had a similar problem earlier for effectively resolving time expressions. At that time, we introduced the BindQuery function, which solved the problem in a clean way:

  • If the query is not bound, the executor directly fetches the timestamps (which ensures correctness, but doesn't have any caching)
  • The caller can optionally bind the query using cached timestamps to speed things up. See an example here – it works out quite cleanly in practice:
    e, err := executor.New(ctx, rt, instanceID, q.MetricsViewName, mv.ValidSpec, mv.Streaming, security, opts.Priority, userAttrs)

Could we do something similar here? Or maybe just extend BindQuery to bind a full set of timestamps for all the base/rollup tables?
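
For illustration, the general shape of that pattern (hypothetical types; not Rill's actual BindQuery signature) is: the executor fetches timestamps itself unless the caller has bound cached ones, which keeps the executor isolated from resolvers.

```go
package main

import "fmt"

// query is a hypothetical trimmed-down query; the caller may bind cached
// timestamps onto it before execution.
type query struct {
	boundMin, boundMax int64 // cached timestamps, only valid if bound
	bound              bool
}

// Bind attaches externally cached timestamps (the fast path).
func (q *query) Bind(min, max int64) {
	q.boundMin, q.boundMax = min, max
	q.bound = true
}

type executor struct {
	fetches int // counts direct timestamp queries against the OLAP engine
}

// timeRange uses bound timestamps when available; otherwise it falls back to
// fetching them directly, which is always correct but uncached.
func (e *executor) timeRange(q *query) (int64, int64) {
	if q.bound {
		return q.boundMin, q.boundMax
	}
	e.fetches++ // slow but correct path
	return 0, 100
}

func main() {
	e := &executor{}
	q := &query{}
	q.Bind(10, 90)
	min, max := e.timeRange(q)
	fmt.Println(min, max, e.fetches) // bound query: no direct fetch needed
}
```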

Member Author

OK, let me have a look. I thought that since we are already passing the mv spec, passing its name should be fine too, but I get your point.

whereDims := collectWhereDimensions(qry.Where)

// Determine whether the query has a non-zero time range
hasTimeRange := qry.TimeRange != nil && (!qry.TimeRange.Start.IsZero() || !qry.TimeRange.End.IsZero())
Contributor

There are other fields to check. Consider using qry.TimeRange.IsZero()

Member Author

At this point the query time range will already be resolved, since e.RewriteQueryTimeRanges is called before this, so the other fields will be unset and only start and end will be set. We actually need it resolved, because the query start and end are required for the further checks; this explicitly checks only those and falls back to the base table unless a rollup table has the same coverage as the base.

But if you want, I can change it to use qry.TimeRange.IsZero() and add a comment that this rewrite should only be called after RewriteQueryTimeRanges, or add explicit checks for it.

Comment on lines +153 to +156
eligible, reason, err := rollupEligible(rollup, qry, queryGrain, whereDims, e.metricsView.TimeDimension, e.metricsView.FirstDayOfWeek)
if err != nil {
candidateSpan.End()
return nil, err
Contributor

If an error happens during a span, I think you'd expect it to be added to the span. Otherwise it looks like the error came after the span, which is confusing.

This applies in many of the cases below

Member Author

OK, will fix.

Comment on lines +192 to +202
if hasTimeRange {
// Clamp query range to the base table's actual data range.
// This ensures a rollup isn't rejected when the query extends beyond both the base table and rollup.
effectiveStart := qry.TimeRange.Start
if !effectiveStart.IsZero() && baseMin.After(effectiveStart) {
effectiveStart = baseMin
}
effectiveEnd := qry.TimeRange.End
if !effectiveEnd.IsZero() && baseMax.Before(effectiveEnd) {
effectiveEnd = baseMax
}
Contributor

What if the qry contains a Rill time expression? Would have expected it to resolve the time range to a fixed start/end using the normal code paths for that.

Member Author

Yes, this rewrite is called after e.RewriteQueryTimeRanges, so only start and end will be set. See the related comment above.

Comment on lines +384 to +387
// Clear rollups to prevent recursion
synth.Rollups = nil

return synth
Contributor

Shouldn't it also consume the rollup's dimension/measure selectors to filter out the dimensions/measures that are not in the rollup? Otherwise, the synthetic spec contains invalid dimensions/measures that would fail if queried.

Member Author

Those things are already checked in the rollupEligible method, which is called well before this, so only rollups satisfying the selection criteria (which include dimension/measure checks) will be present here; that keeps it simple when creating this spec. What do you think?

args["database_schema"] = databaseSchema
}

res, _, err := e.rt.Resolve(ctx, &runtime.ResolveOptions{
Contributor

My earlier comment applies to the resolver call here.

I'm not sure, but I was wondering if we should solve this problem by refactoring Executor.Timestamps to query the min/max/watermark timestamps for the base and all the rollup tables in the metrics view, and return them in one go? It would keep things simple and integrate nicely with the existing caching facilitated by BindQuery (external to the package) and e.timestamps (internal to the package). It would also enable us to return time ranges for different grains to the UI, which it may need in the future if we want to optimize rollup usage better.

The downside would be it runs some queries eagerly, which might seem like it could have performance implications, but I wonder if it's okay given a) the output will be cached, b) several of the time ranges are anyway needed for the subsequent rollup checks.

Just an idea, let me know what you think.

Member Author

Yeah, there were a couple of things:

  1. If a rollup fails the basic eligibility test, then there is no need to fetch timestamps for it.
  2. Also, the executor will have the user's security policy, so the timestamps may not reflect the actual data boundaries, which are needed for the further eligibility checks.
  3. Directly calling e.timestamps only populates the executor-level cache, which is request-specific and cannot be used for subsequent requests; that's why creating a resolver allows cached timestamps to be reused until cache_timestamps_ttl_seconds.

const defaultTimestampsCacheTTL = 5 * time.Minute

func init() {
runtime.RegisterResolverInitializer("metrics_timestamps", newMetricsTimestampsResolver)
Contributor

We already have a metrics_time_range resolver, which sounds almost the same as this. It would be nice if we could build on that. (This relates to my comment above also.)

Member Author

I have put a comment on this resolver explaining how it's different from metrics_time_range. Let me paste it here:

// This has two main differences from the metrics_time_range resolver:
//  1. It bypasses row-level security because timestamps reflect physical data boundaries, not user-visible data.
//     Access is restricted to internal callers and admins.
//  2. It implements cache_timestamps_ttl_seconds based caching for efficiency purposes.

So essentially, no security-based row policies are used here, since this is used for checking rollup coverage and eligibility; second, it adds an additional cache that we use to prevent flooding the OLAP engine with timestamp queries, especially for streaming models, which have caching disabled by default. Its cache key also relies on the metrics_cache_key resolver, so it can utilize cache_key_ttl and cache_key_sql, but it adds additional logic on top to avoid querying more often than cache_timestamps_ttl_seconds allows.

For these reasons I created a separate resolver, since adding all this logic to metrics_time_range would complicate it. What do you think?
