gateway: reduce potential lock contention in gateway forwarder #6741
Conversation
Force-pushed from d8f1eb7 to 686d666
tonistiigi left a comment:
I think this should be done with a generics-based utility.
@tonistiigi any ideas on a good name for the part you want me to split out? I've been looking at splitting it out, but I'd be removing almost the entirety of the gateway forwarder. Maybe we can defer splitting this out until it's needed somewhere else?
@jsternberg Something like…
I think it would still be much cleaner with this separation, but if you want, you can leave the generic mechanism private for now instead of adding a public pkg for it (although I think the session registration is probably a very similar mechanism that we could look at in a follow-up).
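For illustration, here is a minimal sketch of the kind of generics-based registration utility being discussed. All names (Registrar, entry, Register, Get, New) are hypothetical, not the identifiers the PR actually adds, and this version waits on a plain context, whereas the discussion below lands on a time.After-based timeout inside the forwarder:

```go
package registrar

import (
	"context"
	"sync"
)

// entry pairs a registered value with a channel that is closed once
// the value arrives, so waiters for this key block on it directly.
type entry[V any] struct {
	value V
	done  chan struct{}
}

// Registrar maps keys to values and lets callers wait for a specific
// key to be registered, without a broadcast that wakes every waiter.
type Registrar[K comparable, V any] struct {
	mu      sync.Mutex
	entries map[K]*entry[V]
}

func New[K comparable, V any]() *Registrar[K, V] {
	return &Registrar[K, V]{entries: make(map[K]*entry[V])}
}

// lookup returns the entry for key, creating a placeholder if none
// exists yet. The global lock is held only for the map access.
func (r *Registrar[K, V]) lookup(key K) *entry[V] {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.entries[key]
	if !ok {
		e = &entry[V]{done: make(chan struct{})}
		r.entries[key] = e
	}
	return e
}

// Register stores value under key and wakes only the waiters for that
// key by closing its channel. Assumes each key is registered at most
// once.
func (r *Registrar[K, V]) Register(key K, value V) {
	e := r.lookup(key)
	e.value = value
	close(e.done)
}

// Get blocks until key has been registered or ctx is done.
func (r *Registrar[K, V]) Get(ctx context.Context, key K) (V, error) {
	e := r.lookup(key)
	select {
	case <-e.done:
		return e.value, nil
	case <-ctx.Done():
		var zero V
		return zero, ctx.Err()
	}
}
```

Because each waiter blocks on its own key's channel, registering one build wakes only the goroutines interested in that build, and the global lock is held only long enough to touch the map.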
Force-pushed from 686d666 to 4b9488b
Thanks for the clarification. I've broken out the logic into its own package.
select {
case <-reg.notifyCh:
	return
case <-time.After(3 * time.Second):
Shouldn't this just be passed via Get(ctx) with context.WithTimeout()?
I don't think this is generally needed anymore. I think the time.After() ends up being easier to track and also automatically cleans itself up. I also chose to make it so the timer only gets started if the Get call is the reason the registration was created; if Register happens first, no timer gets created. The outer section (the part that doesn't run in a goroutine) only considers the passed-in context, just in case the grpc call got canceled, while the timeout is contained in the spawned goroutine and only starts after the registrar is created and is waiting. This also prevents the timer from counting down while the goroutine is still waiting on a busy global lock.
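As a sketch of the waiting behavior described in this comment, assuming hypothetical names (registration, notifyCh, wait, created) rather than the PR's actual identifiers:

```go
package forward

import (
	"context"
	"errors"
	"time"
)

// registration is hypothetical: notifyCh is closed by Register when
// the build becomes available.
type registration struct {
	notifyCh chan struct{}
}

// wait blocks until the registration is notified. created reports
// whether this Get call was the one that created the registration;
// only then is a timeout armed, matching the behavior described above.
func wait(ctx context.Context, reg *registration, created bool) error {
	errCh := make(chan error, 1)
	go func() {
		if !created {
			// Register ran first, so the notification is already
			// pending; no timer is started.
			<-reg.notifyCh
			errCh <- nil
			return
		}
		// The timer starts only once the registration exists and this
		// goroutine is actually waiting, so time spent blocked on a
		// busy global lock cannot eat into the 3-second budget.
		select {
		case <-reg.notifyCh:
			errCh <- nil
		case <-time.After(3 * time.Second):
			errCh <- errors.New("timed out waiting for build to register")
		}
	}()
	// The outer select honors only the caller's context, in case the
	// grpc call itself is canceled.
	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}
```

The error channel is buffered so the spawned goroutine can always deliver its result and exit, even when the caller's context wins the outer select.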
There's a large potential for a lock contention issue in the gateway
forwarder's logic. The previous iteration would keep a global mapping
of build ids and, when a forwarder for a build id didn't exist, the
forwarder would wait 3 seconds for the build to register.

The issue with lock contention comes after this. Instead of having a
notification channel signaling that a specific build was ready, the
forwarder would wake up all waiting goroutines each time any build was
registered. Since each of those waiters took a read lock to check
whether its build was present, and registering subsequent builds took
a write lock, it was very easy to end up in a lock contention scenario
when starting many builds at the same time, and then easy to hit the
3-second timeout, especially when the machine itself was under load.

This changes the notification mechanism so the notification happens
per build. Looking up a build id creates a forwarder registrar with a
channel that can be polled for completion of the registration. A
forwarder is then woken by the Go runtime only when its specific build
id is ready, rather than by a broadcast on a sync condition.
Potentially alleviates how often #5171 will happen.

Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
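To make the failure mode concrete, here is a sketch of the broadcast-style pattern the commit message describes as the previous behavior. The names (broadcastRegistry, register, get) are hypothetical, and this is an illustration of the pattern, not the old code itself:

```go
package forward

import "sync"

type forwarder struct{}

// broadcastRegistry demonstrates the contended design: a read lock to
// check the map, a write lock to register, and a condition variable
// whose Broadcast wakes every waiter on every registration.
type broadcastRegistry struct {
	mu     sync.RWMutex
	cond   *sync.Cond // waits while holding the read lock
	builds map[string]*forwarder
}

func newBroadcastRegistry() *broadcastRegistry {
	r := &broadcastRegistry{builds: make(map[string]*forwarder)}
	r.cond = sync.NewCond(r.mu.RLocker())
	return r
}

func (r *broadcastRegistry) register(id string, f *forwarder) {
	r.mu.Lock() // write lock contends with every woken reader
	r.builds[id] = f
	r.mu.Unlock()
	r.cond.Broadcast() // wakes ALL waiters, not just the one for id
}

func (r *broadcastRegistry) get(id string) *forwarder {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for r.builds[id] == nil {
		// Every Broadcast wakes this goroutine, which must re-acquire
		// the read lock just to re-check the map, contending with all
		// other waiters and with new registrations.
		r.cond.Wait()
	}
	return r.builds[id]
}
```

With N concurrent builds, each registration wakes all remaining waiters, each of which re-takes the read lock while new registrations queue for the write lock, so the number of wakeups grows roughly quadratically and the 3-second budget is quickly consumed on a loaded machine.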