Conversation
Thanks for creating this PR, @jtuglu1! The patch seems much simpler now.
kfaraz left a comment:

Leaving a partial review, will try to finish going through the rest of the changes today.
Finished going through the bulk of the changes.
On the whole, the patch looks good. I have these major suggestions:
- For the time being, it would be cleaner to use `workerStateLock` consistently whenever accessing the `workers` map. We can try to improve this later.
- Avoid use of `.forEach()` and use `.compute()` instead, preferably encasing it in an `addOrUpdate` method similar to `TaskQueue`.
- Do not perform any heavy operation like metadata store access, metric emission, listener notification, etc. inside the `.compute()` lambda.
- Avoid throwing exceptions inside the lambda if they are just going to be caught back in the same method/loop. Instead, log an error and continue with the loop.
- Remove the priority scheduling changes for now.
- Reduce debug logging.
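The `addOrUpdate` wrapper suggested above might look roughly like this sketch (hypothetical names, not Druid's actual code): the map mutation happens inside the `compute()` lambda, while heavy side effects such as listener notification are deferred until after the lambda returns and the per-bin lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the suggested addOrUpdate pattern: mutate the map
// entry inside compute(), but collect side effects (notifications, metrics)
// and perform them after the lambda returns, outside the bin lock.
class TaskBook
{
  private final ConcurrentHashMap<String, String> tasks = new ConcurrentHashMap<>();
  final List<String> pendingNotifications = new ArrayList<>();

  void addOrUpdate(String taskId, String newStatus)
  {
    final String[] previous = new String[1];
    tasks.compute(taskId, (id, oldStatus) -> {
      previous[0] = oldStatus;
      return newStatus;          // cheap, in-memory update only
    });
    // Heavy work (listener notification, metric emission, metadata access)
    // happens here, after the per-bin lock has been released.
    if (!newStatus.equals(previous[0])) {
      pendingNotifications.add(taskId + ":" + newStatus);
    }
  }

  String get(String taskId)
  {
    return tasks.get(taskId);
  }
}
```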
I will move this out from 36.0.0 for now - it doesn't seem like something which should block the release.
@gianm any thoughts here?

I will try to take a look. It may take some time to get to it, since the changes look quite extensive. Have you run this on a real production at-scale cluster yet (something with hundreds or thousands of tasks running simultaneously, ideally)? If so, that's always helpful to know.

Yes, no observed issues. We run with close to 10k tasks at peak per cluster.
```diff
  // CAUTION: This method calls RemoteTaskRunnerWorkItem.setResult(..) which results in TaskQueue.notifyStatus() being called
- // because that is attached by TaskQueue to task result future. So, this method must not be called with "statusLock"
+ // because that is attached by TaskQueue to task result future. So, this method must not be called with "workerStatusLock"
```
Should this refer to `workerStateLock`?
```java
(key, taskEntry) -> {
  if (taskEntry == null) {
    // Try to find information about it in the TaskStorage
    Optional<TaskStatus> knownStatusInStorage = taskStorage.getStatus(taskId);
```
This is going to need to do a metadata call while holding a (partial) lock on `tasks`. I see the old code did it under `statusLock`, and also there's a resolved conversation about keeping this here. It's fine to keep it here, I suppose, but please include a comment about how this does a metadata call and may cause contention on `tasks`.

> I see the old code did it under `statusLock`, and also there's a resolved conversation about keeping this here. It's fine to keep it here, I suppose, but please include a comment about how this does a metadata call and may cause contention on `tasks`.

Yes, I can add a comment. The key point is that this will only lock a (hopefully small, depending on how `ConcurrentHashMap` determines the range size) subset of the task keys, allowing other tasks to continue their work.
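As a rough illustration of the contention being discussed (hypothetical names, not Druid's actual code), a `compute()` that falls back to storage might look like this; the comment marks where the per-bin lock is held during the lookup.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: falling back to a storage lookup inside compute()
// when the in-memory entry is missing. None of these names are Druid's
// actual APIs; 'storage' stands in for TaskStorage.
class TaskStatusCache
{
  private final ConcurrentHashMap<String, String> tasks = new ConcurrentHashMap<>();
  private final Map<String, String> storage;

  TaskStatusCache(Map<String, String> storage)
  {
    this.storage = storage;
  }

  String resolve(String taskId)
  {
    return tasks.compute(taskId, (key, taskEntry) -> {
      if (taskEntry == null) {
        // CAUTION: this storage call runs while compute() holds the lock on
        // this key's bin, so a slow metadata read can stall other updates
        // that hash to the same bin (but not the whole map).
        Optional<String> known = Optional.ofNullable(storage.get(taskId));
        return known.orElse("UNKNOWN");
      }
      return taskEntry;
    });
  }
}
```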
```java
synchronized (workerStateLock) {
  workerToAssign = findWorkerToRunTask(taskItem.getTask());
  ...
  if (workerToAssign == null) {
```
It looks like this code will park and wait if the next task from `pendingTasks` can't be assigned. But there are situations where task A can't be assigned, but another, later task B can be assigned. For example, if there is 1 free slot, and task A has `requiredCapacity: 2` while task B has `requiredCapacity: 1`. Another example: if strong worker affinity is configured, and none of the affinity workers for task A are available, but affinity workers for task B are available.

The old logic would potentially iterate the entire `pendingTaskIds` looking for an assignable task, essentially allowing tasks to skip the line in case they required different capacity or different affinity workers. Please update the new logic to handle this case.
> The old logic would potentially iterate the entire `pendingTaskIds` looking for an assignable task, essentially allowing tasks to skip the line in case they required different capacity or different affinity workers. Please update the new logic to handle this case.

Yes, I thought about this. The older logic was a bit cumbersome and hard to read, and was overly conservative (slow) in its locking behavior; would you be opposed to simply rescheduling this task? I was thinking of extending this to some sort of priority/backoff queue to address this problem.
What do you mean by "rescheduling this task"?

The thing I'm worried about is that we need line-skipping behavior. Especially with strong worker affinity, it's important for tasks to be able to skip the line. For example: a typical config would have a set of affinity workers for batch tasks and a set for realtime tasks. When the batch affinity workers are full, we want to continue to assign realtime tasks to the realtime affinity workers.

So, if the solution does give tasks the ability to skip the line, it should be OK.
> What do you mean by "rescheduling this task"?

Send the task to the back of the queue and effectively just filter through the queue until you can find a task that's runnable, or do a timed wait backoff if none are found after a full iteration. This preserves FIFO ordering while still not causing HOL blocking.
I think we can address FIFO behavior in a follow-up. That is, prioritizing tasks in the queue based on `Task::getPriority()`, for example.
> Send the task to the back of the queue (effectively just filter through the queue until you can find a task that's runnable). This preserves FIFO ordering while still not causing HOL-blocking.

Sure, that's fine. But be careful to avoid a spin loop of rescheduling if no task is currently schedulable.
```java
@Override
public void shutdown(String taskId, String reason)
{
  if (!lifecycleLock.awaitStarted(1, TimeUnit.SECONDS)) {
```
Description
Clone of #18729, but merged into the current runner per @kfaraz's request.

I've seen the giant lock in `HttpRemoteTaskRunner` cause severe performance degradation under heavy load (200-500 ms per acquisition with 1000s of active tasks can slow down the `startPendingTasks` loop in `TaskQueue`). This leads to scheduling delays, which leads to more lag, which auto-scales more tasks, and so on. The runner also has a few (un)documented races abundant in the code. This overhead also slows down query tasks under load (e.g. MSQE and others) which utilize the scheduler for execution.

I'm attempting a rewrite of this class to optimize for throughput and safety.
Apart from the performance improvements/bug fixes, this will also include some new features:
I would ultimately like to make this the default `HttpRemoteTaskRunner` and have it run in all tests/production clusters, etc., as I think that would help catch more bugs/issues.

Performance Testing
Test results thus far have shown a ~100-300 ms speedup per task runner operation (`add()`, etc.). Over 1000s of tasks, this amounts to minutes of delay saved.

Release note
Speed up throughput and improve thread safety of HttpRemoteTaskRunner
This PR has: