[codex] Add query progress reporting by xiangfu0 · Pull Request #18649 · apache/pinot

xiangfu0 · 2026-06-02T04:32:19Z

Summary

Adds query progress reporting for long-running Pinot queries across the broker, controller, server, V1 execution, V2 execution, query console, and Pinot CLI.

The progress model reports processed work units over total work units. V1 uses server segment progress; V2 estimates work from multi-stage operators and stage execution progress. The controller exposes progress by clientQueryId, the query console polls it while a query is running, and the CLI renders adaptive progress: SSE/simple responses stay one compact line, while MSE responses can render aggregate progress plus labeled component rows.
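
The processed-over-total model described above can be illustrated with a minimal sketch. `ProgressSnapshot` and its fields are hypothetical stand-ins chosen for this example, not the PR's actual `QueryProgressStats` API; the sketch only assumes that aggregation sums numerators and denominators and that percent is derived from the two counters.

```java
import java.util.List;

/** Hypothetical stand-in for a progress snapshot; not the PR's QueryProgressStats. */
final class ProgressSnapshot {
  final long processed;
  final long total;

  ProgressSnapshot(long processed, long total) {
    this.processed = processed;
    this.total = total;
  }

  /** Percent complete in [0, 100]; reports 0 while the total is not yet known. */
  double percent() {
    if (total <= 0) {
      return 0.0;
    }
    long clamped = Math.max(0, Math.min(processed, total));
    return 100.0 * clamped / total;
  }

  /** Sums numerators and denominators, so components with more work weigh more. */
  static ProgressSnapshot aggregate(List<ProgressSnapshot> parts) {
    long processed = 0;
    long total = 0;
    for (ProgressSnapshot part : parts) {
      processed += part.processed;
      total += part.total;
    }
    return new ProgressSnapshot(processed, total);
  }
}
```

Summing the raw counters rather than averaging per-component percentages keeps a small, finished component from pulling the aggregate ahead of a large, unfinished one.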

User impact

Query console now shows numeric query progress while a query is in RUNNING state.
MSE progress responses can include labeled detail rows, and both Query Console and Pinot CLI render stacked progress bars when those details are present.
pinot-cli supports --progress-interval-ms and config key progress-interval-ms.
CLI progress is disabled with --progress-interval-ms=0 and is only rendered for interactive terminals, so redirected output/logs stay clean.
README includes usage notes and V1/V2 quickstart sample queries.
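
As a rough illustration of the CLI behavior listed above, the sketch below shows one way a compact progress line could be built and gated. `CliProgressLine`, `shouldRender`, and `render` are hypothetical names for this example and are not taken from `PinotCli.java`; the caller is assumed to decide interactivity itself, for instance with `System.console() != null`.

```java
/** Hypothetical helper; names and layout are assumptions, not the PR's PinotCli code. */
final class CliProgressLine {
  /** Progress is drawn only when polling is enabled and output goes to a terminal. */
  static boolean shouldRender(long progressIntervalMs, boolean interactive) {
    return progressIntervalMs > 0 && interactive;
  }

  /** Builds one compact line: a fixed-width bar, a percentage, and the raw counters. */
  static String render(long processed, long total, int width) {
    long safeTotal = Math.max(total, 1);
    long clamped = Math.max(0, Math.min(processed, safeTotal));
    int filled = (int) (width * clamped / safeTotal);
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < width; i++) {
      sb.append(i < filled ? '=' : ' ');
    }
    long percent = 100 * clamped / safeTotal;
    return sb.append("] ").append(percent).append("% (")
        .append(processed).append('/').append(total).append(')').toString();
  }
}
```

A caller would typically print `'\r' + CliProgressLine.render(...)` so that each tick overwrites the previous line, and skip printing entirely when `shouldRender` returns false, which keeps redirected output free of progress text.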

Screenshot

Query console progress while a V2 quickstart query is running:

Notes

The CLI injects a generated clientQueryId as a quoted query option so progress polling can correlate the client request with running query state.

For MSE, progress keeps missing or non-responsive workers as labeled unknown rows instead of dropping them from the aggregate denominator. That prevents a partially reported query from falsely showing 100% complete.

Validation

./mvnw -pl pinot-controller,pinot-clients/pinot-cli -am -DskipTests -DskipITs -Dmaven.javadoc.skip=true compile
./mvnw -pl pinot-broker -am -DskipTests -DskipITs -Dmaven.javadoc.skip=true compile
./mvnw -pl pinot-spi -Dtest=QueryProgressStatsTest test
./mvnw -pl pinot-query-runtime -am -Dtest=OpChainSchedulerServiceTest -Dsurefire.failIfNoSpecifiedTests=false test
./mvnw -pl pinot-core -am -Dtest=InstanceRequestHandlerTest -Dsurefire.failIfNoSpecifiedTests=false test
./mvnw -pl pinot-clients/pinot-cli -DskipTests -DskipITs -Dmaven.javadoc.skip=true package
spotless:apply, license:format, license:check, and checkstyle:check on affected modules
git diff --check
Local quickstart smoke test with query console and Pinot CLI progress query

Copilot

Pull request overview

Adds end-to-end query progress reporting for long-running Pinot queries, exposing a unified progress model (processed work units / total work units) across SSE (segment-based) and MSE (operator/stage-based) execution paths, and surfacing it via REST/gRPC, Query Console UI, and Pinot CLI.

Changes:

Introduces QueryProgressStats in pinot-spi, plus progress counters in QueryExecutionContext.
Implements progress tracking and retrieval across servers/brokers/controller (including new REST endpoints and a new gRPC Progress RPC for MSE).
Adds polling + rendering in Query Console and an interactive CLI progress line / progress bar.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
pinot-spi/src/test/java/org/apache/pinot/spi/query/QueryProgressStatsTest.java	Adds unit tests for percent calculation, aggregation, JSON round-trip, and execution context accumulation.
pinot-spi/src/main/java/org/apache/pinot/spi/query/QueryProgressStats.java	New progress stats model with JSON support, aggregation, and derived percent.
pinot-spi/src/main/java/org/apache/pinot/spi/query/QueryExecutionContext.java	Adds atomic progress counters and APIs to mutate/read progress.
pinot-server/src/main/java/org/apache/pinot/server/api/resources/QueryResource.java	Adds server REST endpoint to fetch per-query progress and aggregate OFFLINE/REALTIME.
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/executor/OpChainSchedulerServiceTest.java	Adds test coverage for progress tracking on completed op-chains.
pinot-query-runtime/src/main/java/org/apache/pinot/query/service/server/QueryServer.java	Adds gRPC `Progress` RPC handler for MSE worker progress.
pinot-query-runtime/src/main/java/org/apache/pinot/query/service/dispatch/QueryDispatcher.java	Adds broker-side dispatch logic to query MSE workers for progress and aggregate responses.
pinot-query-runtime/src/main/java/org/apache/pinot/query/service/dispatch/DispatchClient.java	Adds client call implementation for the new gRPC `progress` RPC.
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java	Exposes execution-context tracking and progress retrieval via `OpChainSchedulerService`.
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java	Plumbs a shared `QueryExecutionContext` into leaf-stage `ServerQueryRequest`s for progress attribution.
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/OpChainSchedulerService.java	Tracks execution contexts and increments processed work units on op-chain completion/failure.
pinot-core/src/test/java/org/apache/pinot/core/transport/InstanceRequestHandlerTest.java	Updates tests for renamed/cached execution-context retrieval API.
pinot-core/src/main/java/org/apache/pinot/core/transport/InstanceRequestHandler.java	Uses cached execution context and exposes server-side progress stats lookup.
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java	Uses cached execution context when opening `QueryThreadContext`.
pinot-core/src/main/java/org/apache/pinot/core/query/request/ServerQueryRequest.java	Adds execution-context caching + setter to support shared context plumbing.
pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java	Adds total segment accounting to drive SSE progress denominators.
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/SortedGroupByCombineOperator.java	Marks segments as processed during combine execution to advance progress.
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/SequentialSortedGroupByCombineOperator.java	Marks segments as processed for sequential sorted group-by combine.
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/MinMaxValueBasedSelectionOrderByCombineOperator.java	Marks segments as processed (including skipped segments) for progress accuracy.
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByCombineOperator.java	Marks processed segments during group-by combine.
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseSingleBlockCombineOperator.java	Marks segments as processed when producing results blocks.
pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java	Adds shared helper to increment processed-segment progress via thread context.
pinot-controller/src/main/resources/app/requests/index.ts	Adds Query Console API call for controller clientQueryId progress endpoint.
pinot-controller/src/main/resources/app/pages/Query.tsx	Adds clientQueryId injection, progress polling, and progress UI (numbers + bar).
pinot-controller/src/main/resources/app/Models.ts	Adds `QueryProgressStats` type to UI model definitions.
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotRunningQueryResource.java	Adds controller REST endpoint to fetch progress by `clientQueryId` by polling brokers.
pinot-common/src/main/proto/worker.proto	Adds gRPC `Progress` RPC and request/response messages for MSE worker progress.
pinot-clients/pinot-cli/src/main/java/org/apache/pinot/cli/PinotCli.java	Adds CLI progress polling/rendering, config + flag, and clientQueryId injection.
pinot-clients/pinot-cli/README.md	Documents CLI/query-console progress behavior and usage examples.
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/MultiStageBrokerRequestHandler.java	Tracks MSE execution contexts and aggregates broker+server progress for MSE queries.
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BrokerRequestHandlerDelegate.java	Routes broker progress requests to MSE handler first, then SSE handler.
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BrokerRequestHandler.java	Extends broker handler interface with `getQueryProgressStats(...)`.
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseSingleStageBrokerRequestHandler.java	Implements SSE progress retrieval by polling servers’ new progress endpoint.
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java	Adds default `getQueryProgressStats(...)` method stub + precondition for clientQueryId mapping.
pinot-broker/src/main/java/org/apache/pinot/broker/api/resources/PinotClientRequest.java	Adds broker REST endpoint to fetch progress by internal requestId or clientQueryId.

codecov-commenter · 2026-06-02T07:16:58Z

Codecov Report

❌ Patch coverage is 37.26415% with 266 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.44%. Comparing base (edfbf69) to head (5cb38b9).
⚠️ Report is 31 commits behind head on master.

Files with missing lines	Patch %	Lines
.../pinot/query/service/dispatch/QueryDispatcher.java	0.00%	53 Missing ⚠️
...oller/api/resources/PinotRunningQueryResource.java	0.00%	47 Missing ⚠️
...sthandler/BaseSingleStageBrokerRequestHandler.java	0.00%	38 Missing ⚠️
...pinot/broker/api/resources/PinotClientRequest.java	0.00%	27 Missing ⚠️
...ache/pinot/server/api/resources/QueryResource.java	0.00%	26 Missing ⚠️
...apache/pinot/query/service/server/QueryServer.java	16.66%	25 Missing ⚠️
...requesthandler/MultiStageBrokerRequestHandler.java	4.00%	24 Missing ⚠️
...r/requesthandler/BrokerRequestHandlerDelegate.java	0.00%	6 Missing ⚠️
...e/pinot/core/transport/InstanceRequestHandler.java	85.18%	2 Missing and 2 partials ⚠️
...e/pinot/query/service/dispatch/DispatchClient.java	0.00%	4 Missing ⚠️
... and 5 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18649      +/-   ##
============================================
+ Coverage     64.39%   64.44%   +0.04%     
  Complexity     1291     1291              
============================================
  Files          3364     3372       +8     
  Lines        207935   208973    +1038     
  Branches      32467    32638     +171     
============================================
+ Hits         133906   134675     +769     
- Misses        63255    63499     +244     
- Partials      10774    10799      +25

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (ø)`
java-21	`64.44% <37.26%> (+0.04%)`	⬆️
temurin	`64.44% <37.26%> (+0.04%)`	⬆️
unittests	`64.44% <37.26%> (+0.04%)`	⬆️
unittests1	`56.92% <61.90%> (+0.11%)`	⬆️
unittests2	`37.06% <7.07%> (-0.07%)`	⬇️

xiangfu0

Found one high-signal issue; see inline comment.

xiangfu0

Found 1 high-signal issue; see inline comment.

gortiz · 2026-06-04T09:44:43Z

I'll take a look today, but remember I've also been working on #18458, which should help produce more precise reports for MSE.

gortiz · 2026-06-04T16:33:03Z

This is a really useful feature — having progress while a query runs is something users ask for constantly. A few thoughts on how it interacts with #18458 (SubmitWithStream bidi stats), which I think is worth addressing before merge since the two PRs modify some of the same infrastructure.

Merge conflict in OpChainSchedulerService

Both PRs add fields and lifecycle logic to OpChainSchedulerService. Concretely:

This PR adds:

_executionContextCache (Guava Cache<Long, QueryExecutionContext>, time-based eviction)
_completedProgressStatsCache (Guava Cache<Long, QueryProgressStats>, time-based eviction)
trackExecutionContext() / retainCompletedProgressStatsIfFinished() in the FutureCallback

#18458 adds:

_executionContextByRequest (ConcurrentMap<Long, QueryExecutionContext>, ref-counted)
_activeOpChainsByRequest (ConcurrentMap<Long, AtomicInteger>, reference counter)
decrementActiveOpChains() in the same FutureCallback
OpChainCompletionListener — a per-request callback that fires on op-chain completion with the full MultiStageQueryStats payload

The _executionContextByRequest in #18458 is a strictly better version of _executionContextCache here: it keeps the context alive for exactly as long as op-chains are running (ref-counted) rather than until a timer fires. A Guava time-based cache can either evict a live context prematurely (returning null for a running query) or retain a completed context longer than needed. The ref-counted map avoids both failure modes.
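
The difference between timer-based eviction and reference counting can be sketched as follows. This is a simplified illustration, not the code from either PR: `RefCountedContexts` is a hypothetical class that folds the context and its counter into a single map entry, whereas the description above mentions two separate maps.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical registry: a context lives exactly as long as op-chains reference it. */
final class RefCountedContexts<C> {
  private static final class Entry<C> {
    final C context;
    final int activeOpChains;

    Entry(C context, int activeOpChains) {
      this.context = context;
      this.activeOpChains = activeOpChains;
    }
  }

  private final ConcurrentMap<Long, Entry<C>> byRequest = new ConcurrentHashMap<>();

  /** Registers one more running op-chain, creating the entry on first use. */
  void acquire(long requestId, C context) {
    byRequest.compute(requestId, (id, e) ->
        e == null ? new Entry<>(context, 1) : new Entry<>(e.context, e.activeOpChains + 1));
  }

  /** Releases one op-chain; returning null from the remapping removes the entry. */
  void release(long requestId) {
    byRequest.computeIfPresent(requestId, (id, e) ->
        e.activeOpChains <= 1 ? null : new Entry<>(e.context, e.activeOpChains - 1));
  }

  /** Returns the live context, or null once every op-chain has finished. */
  C get(long requestId) {
    Entry<C> e = byRequest.get(requestId);
    return e == null ? null : e.context;
  }
}
```

Because `compute` and `computeIfPresent` run atomically per key, an entry cannot be evicted while an op-chain still holds a reference, and it is dropped as soon as the last one releases it.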

I'd suggest deferring this PR until after #18458 merges, then replacing _executionContextCache with _executionContextByRequest and building the progress lifecycle on top of OpChainCompletionListener — which brings me to the next point.

OpChainCompletionListener enables better MSE progress

The current MSE progress model counts op-chains as work units. That means progress only moves when an op-chain finishes. In a typical pipeline, leaf scan op-chains finish early while join and aggregation op-chains run for the full query duration:

time ───────────────────────────────────────────────────────────────►
  leaf-0:  ████░░░░░░░░░░░░░░░░░░░░░░░   finishes at ~30% of wall-clock
  leaf-1:  ██████░░░░░░░░░░░░░░░░░░░░░   finishes at ~40%
  join-0:  ░░░░░░████████████████████░   runs for nearly the whole query
  join-1:  ░░░░░░████████████████████░   runs for nearly the whole query
  agg-0:   ░░░░░░░░░░░░░░░░░░█████████   runs near the end

With op-chain counting: progress reads 2/5 = 40% for most of the query, then 5/5 = 100% in rapid succession. The bar sits still for the vast majority of the query duration.

OpChainCompletionListener (from #18458) fires with the actual MultiStageQueryStats — including rows scanned, CPU time, rows emitted. This opens up a much better model:

// At query start: use leaf segment count as the denominator (exact, known upfront)
ctx.addTotalWorkUnits(totalLeafSegments);

// In OpChainCompletionListener (fires per op-chain, with stats):
if (isLeafStage(opChainId)) {
    long rowsScanned = stats.get(LeafOperator.StatKey.NUM_DOCS_SCANNED);
    ctx.addProcessedWorkUnits(rowsScanned);
}
// Non-leaf op-chains don't contribute — they're bounded by what the leaves produce

This makes progress increase smoothly as leaf segments are scanned, which is both more accurate and more informative. It also naturally fixes the double-counting issue where addTotalSegmentsToProcess (in ServerQueryExecutorV1Impl) calls addTotalWorkUnits on the same context that QueryServer.submitInternal already called addTotalWorkUnits(opChainCount) on.

Rows-per-second as the primary signal (optional)

A related idea worth considering: rather than a percentage (which requires a reliable denominator), expose rowsPerSecond alongside processedRows. This is useful even when the total is unknown:

Scanning... 42.3M rows  |  1.2M rows/s  |  ~35s remaining

This is the model that both Trino and ClickHouse have converged on:

Trino CLI (StatusPrinter.java) computes rows/s and bytes/s from each polling response and displays them at every tick:
0:13 [6.45M rows, 560MB] [473K rows/s, 41.1MB/s] [=========>> ] 20%
The REST API does not have dedicated throughput fields — rates are derived client-side from processedRows / elapsedTimeMillis. No server changes were needed to add this.
ClickHouse HTTP interface streams Progress packets (read_rows, read_bytes, elapsed_ns) as the query executes, and clickhouse-client computes and displays:
Progress: 5.3M rows, 2.4GB (234K rows/s., 234MB/s.)
The server provides the raw counters; the rate is computed at the display layer.

Both approaches show that rows/s is valuable even without a perfect denominator. When a percentage is available (Trino has progressPercentage, ClickHouse has total_rows_to_read), it appears alongside the throughput; when not, the throughput alone is shown.

For Pinot, the simplest path is the Trino approach: add rowsProcessed and elapsedMs to QueryProgressStats, then compute rowsPerSecond in the CLI/UI from successive responses. No server changes needed for V1. When totalWorkUnits is known, ETA follows from (total - processed) / rowsPerSecond. When it isn't, rows/s alone tells the user whether the query is making progress and at what speed — arguably more actionable than a percentage built on a plan-cardinality estimate that may be off by an order of magnitude.
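
A minimal sketch of the client-side calculation described above, assuming each poll response carries a cumulative row count and an elapsed time in milliseconds. `ThroughputEstimator` and its method names are hypothetical and exist only for this example.

```java
/** Hypothetical client-side helper for deriving throughput and ETA from two polls. */
final class ThroughputEstimator {
  /** Rows per second between two samples; 0 when no time has elapsed. */
  static double rowsPerSecond(long prevRows, long prevElapsedMs, long rows, long elapsedMs) {
    long deltaMs = elapsedMs - prevElapsedMs;
    if (deltaMs <= 0) {
      return 0.0;
    }
    return (rows - prevRows) * 1000.0 / deltaMs;
  }

  /** Remaining seconds as (total - processed) / rate; -1 when it cannot be estimated. */
  static double etaSeconds(long totalRows, long processedRows, double rowsPerSecond) {
    if (totalRows <= 0 || rowsPerSecond <= 0) {
      return -1.0;
    }
    return Math.max(0, totalRows - processedRows) / rowsPerSecond;
  }
}
```

When the total is unknown, a display layer can still show the rate on its own and omit the ETA, which matches the fallback described above.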

gortiz · 2026-06-04T16:33:38Z

The polling chain this PR introduces is functional but has a cost-multiplier property that becomes significant at scale. Flagging it here as a design discussion point rather than a blocker, since fixing it is a larger change that can be done in a follow-up.

The problem with polling

When a client calls GET /clientQuery/{id}/progress, the chain is:

client → controller (fan-out to all brokers)
       → broker     (fan-out HTTP to all servers, SSE; or gRPC Progress RPC, MSE)
       → servers    (Guava cache lookup, return JSON)

For a 30-second query with a 1-second poll interval and 3 servers:

30 client ticks
  → 30 × N controller→broker calls (N = brokers; fan-out to find the right one)
  → 30 × 3 broker→server calls
  = 90+ network calls whose only content is a tiny JSON payload

At 100 concurrent queries of this shape, that is ~9,000 extra broker→server calls per 30-second window (about 18,000 calls/minute), before counting the controller→broker fan-out. Each tick also requires the server to have live progress state accessible at any time (the Guava caches in OpChainSchedulerService and InstanceRequestHandler). The caches are sized and timed to stay alive long enough to answer the next poll, which introduces the eviction races noted in other comments.

A push alternative

The natural fix is to flip the direction: client opens one persistent connection, broker pushes events as they arrive.

Client                  Broker                       Servers
  │                       │                             │
  │── GET /query/X/stream ►│                             │
  │  (SSE, stays open)    │                             │
  │                       │◄─ OpChainComplete (gRPC) ───│  ← already flowing via #18458
  │◄── data: {rows:12M} ──│                             │
  │                       │◄─ OpChainComplete (gRPC) ───│
  │◄── data: {rows:34M} ──│                             │
  │                       │◄─ OpChainComplete (gRPC) ───│
  │◄── data: {rows:100M} ─│                             │
  │◄── event: complete ───│  (stream closes)            │

Cost for the same 30-second query:

1 client connection (open once, reused throughout)
0 new controller→broker calls
0 new broker→server calls  ← #18458's SubmitWithStream already delivers this data

The broker SSE endpoint just fans out events it already holds in StreamingQuerySession. No additional Guava caches on servers. No eviction races. No controller fan-out per tick.

Why this is feasible with #18458 in place

#18458 introduces a long-lived gRPC bidi channel (SubmitWithStream) between broker and servers that stays open for the query duration. The broker's StreamingQuerySession already accumulates per-op-chain stats as they complete. The missing piece is an outbound channel from broker to client. SSE provides exactly that with standard JAX-RS (SseEventSink):

// In StreamingQuerySession, when OpChainComplete arrives (already called by #18458):
public void onOpChainComplete(...) {
    mergeStats(...);                     // existing #18458 logic
    broadcastProgressSnapshot();         // new: push to SSE subscribers
}

// New broker endpoint:
@GET @Path("query/{id}/progress/stream") @Produces(SERVER_SENT_EVENTS)
public void streamProgress(@PathParam("id") long queryId, @Context SseEventSink sink) {
    _queryDispatcher.subscribeProgressStream(queryId, sink);
}

Connection cleanup is handled automatically: when the SSE connection drops, sink.isClosed() returns true and the subscriber is removed on the next push attempt. When the query completes, the broker sends a final event with complete: true and closes the stream.
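
The cleanup-on-push behavior can be sketched without a JAX-RS dependency. In this sketch `ProgressSink` is a hypothetical stand-in for `SseEventSink` (the real interface sends `OutboundSseEvent` objects rather than strings), and `ProgressBroadcaster` is an assumed helper, not part of #18458.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for an SSE sink; only the two calls the sketch needs. */
interface ProgressSink {
  boolean isClosed();

  void send(String event);
}

/** Hypothetical broadcaster that prunes dropped subscribers on each push. */
final class ProgressBroadcaster {
  private final List<ProgressSink> subscribers = new CopyOnWriteArrayList<>();

  void subscribe(ProgressSink sink) {
    subscribers.add(sink);
  }

  /** Drops closed sinks, then pushes the event; returns how many sinks received it. */
  int broadcast(String event) {
    subscribers.removeIf(ProgressSink::isClosed);
    int delivered = 0;
    for (ProgressSink sink : subscribers) {
      sink.send(event);
      delivered++;
    }
    return delivered;
  }

  int subscriberCount() {
    return subscribers.size();
  }
}
```

Pruning inside `broadcast` means no background reaper is needed: a disconnected client costs at most one stale list entry until the next progress event arrives.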

What this PR should do now

This is a non-trivial change that I wouldn't block the current PR on. But it would be worth:

Keeping the polling endpoint as-is (it's correct and useful for languages/clients that can't hold persistent connections)
Adding a GET /query/{id}/progress/stream SSE endpoint alongside it in a follow-up
Having the CLI and Query Console prefer SSE when available

The main thing to avoid is designing the server-side state (Guava caches, eviction timings) in a way that makes it hard to remove when the push path lands. The ref-counted _executionContextByRequest from #18458 is already the right shape for that.

xiangfu0 · 2026-06-04T23:43:00Z

Addressed the adaptive progress display path in the latest push.

SSE/simple progress responses still render as one compact CLI/UI row.
MSE progress can now carry labeled detail rows, so CLI and Query Console render a top-level aggregate plus component bars when details are present.
Missing or non-responsive MSE workers are retained as unknown rows, so the aggregate no longer shrinks the denominator or falsely reaches 100%.

I kept the rows/s metric and broker-pushed progress stream as follow-up scope. The current shape keeps the aggregate fields backward compatible while allowing richer MSE status when the response includes details.

xiangfu0

Found one correctness issue; see inline comment.

xiangfu0 · 2026-06-05T12:13:29Z

+      }
+    }
+    if (!serverProgressStats.isEmpty()) {
+      return QueryProgressStats.aggregate(serverProgressStats);


This still returns a partial aggregate over only the SSE servers that replied with 200. If one targeted server times out or is temporarily unreachable, its unfinished work disappears from the denominator and the broker can report inflated progress, including 100%, even though the query is still blocked on that server. The MSE path now avoids that by treating missing servers as unknown progress; SSE needs the same treatment here (or retained last-known totals) instead of returning a partial aggregate.
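
One way to read the "retained last-known totals" option is sketched below. `LastKnownProgress` is a hypothetical helper, not code from this PR: it assumes the broker knows the full set of targeted servers and records each successful reply, so a server that fails a later poll keeps contributing its last reported counters instead of vanishing from the denominator.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical aggregator that keeps each server's last reported counters. */
final class LastKnownProgress {
  // Value layout: {processed, total}, as last reported by that server.
  private final Map<String, long[]> lastByServer = new HashMap<>();

  /** Records a successful progress reply from one server. */
  void record(String server, long processed, long total) {
    lastByServer.put(server, new long[] {processed, total});
  }

  /**
   * Aggregates over every targeted server, reusing stale values for servers that
   * missed the latest poll. Returns null (unknown) if any targeted server has
   * never reported, because its share of the denominator cannot be guessed.
   */
  long[] aggregate(Collection<String> targetedServers) {
    long processed = 0;
    long total = 0;
    for (String server : targetedServers) {
      long[] last = lastByServer.get(server);
      if (last == null) {
        return null;
      }
      processed += last[0];
      total += last[1];
    }
    return new long[] {processed, total};
  }
}
```

With two servers that each report 10 segments, a timeout on one of them leaves the total at 20, so the aggregate cannot reach 100% until that server actually reports completion.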

xiangfu0 requested review from Jackie-Jiang and Copilot June 2, 2026 04:39

Copilot started reviewing on behalf of xiangfu0 June 2, 2026 04:39 View session

Copilot AI reviewed Jun 2, 2026

xiangfu0 force-pushed the codex/query-progress branch 3 times, most recently from 6cb4cc3 to d601f43 Compare June 2, 2026 06:25

xiangfu0 force-pushed the codex/query-progress branch from d601f43 to 1940943 Compare June 2, 2026 09:17

xiangfu0 marked this pull request as ready for review June 2, 2026 10:48

xiangfu0 commented Jun 2, 2026

Comment thread ...rc/main/java/org/apache/pinot/broker/requesthandler/BaseSingleStageBrokerRequestHandler.java

xiangfu0 force-pushed the codex/query-progress branch 2 times, most recently from 5dda15c to 44ae161 Compare June 2, 2026 20:22

xiangfu0 commented Jun 3, 2026

Comment thread pinot-query-runtime/src/main/java/org/apache/pinot/query/service/dispatch/QueryDispatcher.java Outdated

gortiz self-requested a review June 3, 2026 15:22

gortiz reviewed Jun 4, 2026

Comment thread pinot-spi/src/main/java/org/apache/pinot/spi/query/QueryProgressStats.java

Add query progress reporting

5cb38b9

xiangfu0 force-pushed the codex/query-progress branch from 44ae161 to 5cb38b9 Compare June 4, 2026 23:39

xiangfu0 commented Jun 5, 2026
