feat: Separate Runtime Statistics Collection from UI Updates by kunwp1 · Pull Request #4205 · apache/texera

kunwp1 · 2026-02-10T20:56:59Z

What changes were proposed in this PR?

This PR introduces a new configuration parameter, runtime-statistics-interval, to control how often runtime statistics are persisted, independently of the UI update frequency (status-update-interval). Previously, a single parameter, status-update-interval, controlled both UI updates and runtime statistics persistence. As a result, frequent UI updates (e.g., every 500ms) also caused excessive statistics writes to storage. This change allows the two to be controlled independently:

- status-update-interval: controls how often the frontend UI refreshes (default: 500ms)
- runtime-statistics-interval: controls how often statistics are persisted to storage (default: 2000ms)
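As a rough illustration of the effect, the sketch below counts timer ticks over a fixed run length under the default values. StatsIntervals and IntervalsDemo are hypothetical names, not Texera code, and the counts assume ticks land exactly at multiples of each interval with no scheduling jitter.

```scala
// Hypothetical holder for the two intervals described above (values in milliseconds).
final case class StatsIntervals(statusUpdateMs: Long = 500L, runtimeStatisticsMs: Long = 2000L) {
  require(statusUpdateMs > 0 && runtimeStatisticsMs > 0, "intervals must be positive")

  // Idealized tick counts: one tick at every full interval elapsed within the run.
  def uiRefreshes(runMs: Long): Long = runMs / statusUpdateMs
  def persistenceWrites(runMs: Long): Long = runMs / runtimeStatisticsMs
}

object IntervalsDemo {
  val defaults: StatsIntervals = StatsIntervals()
  // For a 60-second execution under the default values.
  val uiCount: Long = defaults.uiRefreshes(60000L)
  val writeCount: Long = defaults.persistenceWrites(60000L)
}
```

Under these assumptions, a 60-second execution refreshes the UI about 120 times but persists statistics only about 30 times, whereas previously each of the 120 status updates also triggered a write.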

Changes

- Added the runtime-statistics-interval parameter (default: 2000ms) to application.conf
- Protobuf: added a StatisticsUpdateTarget enum (UI_ONLY, PERSISTENCE_ONLY, BOTH_UI_AND_PERSISTENCE) to QueryStatisticsRequest
- Added a RuntimeStatisticsPersist event for persistence-only updates; ExecutionStatsUpdate now handles UI-only updates
- Added a separate timer for runtime statistics collection that runs independently of the UI update timer
Query Handling
- Timer-triggered queries specify their target: UI-only or persistence-only
- Event-triggered queries (port/worker completion, pause, resume) send both UI and persistence updates to preserve the original behavior
- QueryWorkerStatisticsHandler routes each result to the appropriate event based on StatisticsUpdateTarget
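The routing described above can be sketched in Scala as follows. The case objects only mirror the protobuf enum values and event names from this PR; the Trigger type, the Map[String, Long] payload, and the function names are hypothetical simplifications, not the actual generated classes or handler code.

```scala
object StatsRoutingSketch {
  // Mirrors the StatisticsUpdateTarget enum values added to QueryStatisticsRequest.
  sealed trait StatisticsUpdateTarget
  case object UiOnly extends StatisticsUpdateTarget
  case object PersistenceOnly extends StatisticsUpdateTarget
  case object BothUiAndPersistence extends StatisticsUpdateTarget

  // Simplified event shapes; the real events carry per-operator metrics.
  sealed trait ClientEvent
  final case class ExecutionStatsUpdate(stats: Map[String, Long]) extends ClientEvent
  final case class RuntimeStatisticsPersist(stats: Map[String, Long]) extends ClientEvent

  // Hypothetical sources of a statistics query.
  sealed trait Trigger
  case object UiTimer extends Trigger
  case object PersistenceTimer extends Trigger
  case object PortOrWorkerCompleted extends Trigger
  case object PauseOrResume extends Trigger

  // Timer-triggered queries pick one target; event-triggered queries keep the old behavior.
  def targetFor(trigger: Trigger): StatisticsUpdateTarget = trigger match {
    case UiTimer                               => UiOnly
    case PersistenceTimer                      => PersistenceOnly
    case PortOrWorkerCompleted | PauseOrResume => BothUiAndPersistence
  }

  // Emits the event(s) corresponding to the requested target.
  def eventsFor(target: StatisticsUpdateTarget, stats: Map[String, Long]): Seq[ClientEvent] =
    target match {
      case UiOnly               => Seq(ExecutionStatsUpdate(stats))
      case PersistenceOnly      => Seq(RuntimeStatisticsPersist(stats))
      case BothUiAndPersistence => Seq(ExecutionStatsUpdate(stats), RuntimeStatisticsPersist(stats))
    }
}
```

Keeping the target on the request, rather than inferring it in the handler, lets the same query path serve both timers and the event-triggered cases.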

Any related issues, documentation, discussions?

Closes #4204

How was this PR tested?

Tested with the following workflow and dataset by varying the runtime-statistics-interval parameter and checking that the size of the persisted runtime statistics decreases as the value increases.
Iris Dataset Analysis.json
Iris.csv

Was this PR authored or co-authored using generative AI tooling?

No.

Xiao-zhen-Liu

LGTM; I left minor comments and some questions. I tested it and can confirm that the size of the persisted runtime stats changes when this new parameter is adjusted.

Xiao-zhen-Liu · 2026-02-11T18:18:11Z

common/config/src/main/resources/application.conf

    status-update-interval = 500
    status-update-interval = ${?CONSTANTS_STATUS_UPDATE_INTERVAL}
+
+    runtime-statistics-interval = 2000


It should be more explicit that this config is about persistence. Please include that in the name.

Xiao-zhen-Liu · 2026-02-11T18:31:19Z

...xera/amber/engine/architecture/controller/promisehandlers/QueryWorkerStatisticsHandler.scala

          Future.collect(futures).flatMap(_ => processLayers(rest))
      }

    // Start processing all layers and update the frontend after completion


Please also update this comment, since it no longer updates only the frontend.

Xiao-zhen-Liu · 2026-02-11T19:31:37Z

...in/scala/org/apache/texera/amber/engine/architecture/controller/ControllerTimerService.scala

          ControlInvocation(
            METHOD_CONTROLLER_INITIATE_QUERY_STATISTICS,
-            QueryStatisticsRequest(Seq.empty),
+            QueryStatisticsRequest(Seq.empty, updateTarget),


Since there are now two separate timers, each sending its own QueryStatisticsRequest, more requests will be sent than before. What are the implications of this? For example, will QueryStatistics be sent to each worker more frequently? It would be good to address this in the PR description.
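One way to estimate the extra load is to count timer-driven QueryStatisticsRequests over a window, assuming both timers start together and fire exactly on schedule. RequestCountSketch is a hypothetical helper; real timers drift, so the coincidence count is idealized.

```scala
object RequestCountSketch {
  // Idealized timer ticks at k * intervalMs for k = 1 .. windowMs / intervalMs.
  def ticks(intervalMs: Long, windowMs: Long): Seq[Long] =
    (1L to windowMs / intervalMs).map(_ * intervalMs)

  final case class Counts(before: Int, after: Int, coincident: Int)

  def compare(uiMs: Long, persistMs: Long, windowMs: Long): Counts = {
    val ui = ticks(uiMs, windowMs)
    val persist = ticks(persistMs, windowMs)
    Counts(
      before = ui.size,                 // one timer drove both UI and persistence
      after = ui.size + persist.size,   // two timers, each sending its own request
      coincident = ui.toSet.intersect(persist.toSet).size // ticks firing at the same instant
    )
  }
}
```

With the defaults over one minute, this gives 120 requests before the change and 150 after (a 25% increase, ignoring any skipped requests), and all 30 persistence ticks land on the same instant as a UI tick.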

Xiao-zhen-Liu · 2026-02-11T19:35:12Z

...xera/amber/engine/architecture/controller/promisehandlers/QueryWorkerStatisticsHandler.scala

  ): Future[EmptyReturn] = {
    // Avoid issuing concurrent full-graph statistics queries.
    // If a global query is already in progress, skip this request.
    if (globalQueryStatsOngoing && msg.filterByWorkers.isEmpty) {


With the default values of 500ms and 2000ms, every RuntimeStatisticsPersist request may coincide with an ExecutionStatsUpdate request. Would all the RuntimeStatisticsPersist requests then be discarded?

I wanted the runtime-statistics-interval to be independent of the existing status-update-interval, including the case where runtime-statistics-interval is smaller than status-update-interval. But if we assume that RuntimeStatisticsPersist always coincides with ExecutionStatsUpdate, the implementation would be much simpler and we could discard RuntimeStatisticsPersist.

You can decide which assumptions to make. I just wonder how this part behaves if the requests are sent independently.
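To make the question concrete, here is a small deterministic simulation of a skip guard like the one at the top of the handler (timer requests use an empty worker filter, so they count as full-graph queries). It assumes every accepted query stays in flight for a fixed queryMs and that, when two ticks fire at the same instant, the UI tick is processed first. In the actual actor-based controller the arrival order of simultaneous timer messages is not guaranteed, so this is one possible outcome, not the definitive behavior. All names are hypothetical.

```scala
object SkipGuardSketch {
  sealed trait Target
  case object UiOnly extends Target
  case object PersistenceOnly extends Target

  final case class Tick(atMs: Long, target: Target)

  // Ticks from two idealized timers; UI ticks are listed first so that the
  // stable sort below handles a UI tick before a persistence tick at the same instant.
  def schedule(uiMs: Long, persistMs: Long, windowMs: Long): Seq[Tick] = {
    val ui = (1L to windowMs / uiMs).map(k => Tick(k * uiMs, UiOnly))
    val persist = (1L to windowMs / persistMs).map(k => Tick(k * persistMs, PersistenceOnly))
    ui ++ persist
  }

  // Returns the ticks that a "skip if a full-graph query is ongoing" guard would drop,
  // assuming each accepted query stays in flight for exactly queryMs.
  def dropped(ticks: Seq[Tick], queryMs: Long): Seq[Tick] = {
    var busyUntil = Long.MinValue
    val skipped = Seq.newBuilder[Tick]
    ticks.sortBy(_.atMs).foreach { t =>
      if (t.atMs < busyUntil) {
        skipped += t // a query is still in progress: this request is skipped
      } else {
        busyUntil = t.atMs + queryMs // accept the request and mark the controller busy
      }
    }
    skipped.result()
  }
}
```

Under these assumptions, with 500ms/2000ms and any positive query duration shorter than 500ms, all 30 persistence ticks in a minute are dropped while no UI tick is. If the persistence tick were processed first instead, the coinciding UI tick would be dropped.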

kunwp1 added 2 commits February 10, 2026 12:45

Add new param

4dab632

Update application.conf

7c27789

kunwp1 requested a review from Xiao-zhen-Liu February 10, 2026 20:56

kunwp1 self-assigned this Feb 10, 2026

kunwp1 and others added 2 commits February 10, 2026 12:57

Merge branch 'main' into chris-introduce-new-interval-param

3226966

Remove unnecessary changes

c35f055

github-actions bot added engine common labels Feb 10, 2026

chenlica requested a review from aglinxinyuan February 11, 2026 14:59

Merge branch 'main' into chris-introduce-new-interval-param

723392f

Xiao-zhen-Liu approved these changes Feb 11, 2026

kunwp1 wants to merge 5 commits into apache:main from kunwp1:chris-introduce-new-interval-param