perf: Streamline s3 backend by jtuglu1 · Pull Request #19394 · apache/druid

jtuglu1 · 2026-04-30T21:42:54Z

Description

S3 has provided strong read-after-write consistency since 2020. The current S3 backend architecture assumes the earlier consistency model and therefore makes some redundant calls that are both slow and costly.

Something else to look into in a separate PR is parallelizing the download of a single file using byte-range requests. Currently we use a single TCP connection, which isn't the fastest option and is subject to S3's single-connection bandwidth limits.
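As a rough illustration of the byte-range idea (not part of this PR), the sketch below splits an object of known size into inclusive HTTP `Range` header values and fetches the parts concurrently. `RangedDownloadSketch`, `splitRanges`, and the `fetchRange` callback are hypothetical names; a real implementation would issue ranged GetObject requests and stream to disk rather than buffer in memory.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class RangedDownloadSketch {
  /** Splits [0, size) into inclusive ranges formatted as HTTP Range header values. */
  static List<String> splitRanges(long size, long partSize) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    List<String> ranges = new ArrayList<>();
    for (long start = 0; start < size; start += partSize) {
      long end = Math.min(start + partSize, size) - 1; // Range ends are inclusive
      ranges.add("bytes=" + start + "-" + end);
    }
    return ranges;
  }

  /**
   * Fetches every range concurrently, then concatenates the parts in order.
   * fetchRange is a hypothetical callback standing in for a ranged GET.
   */
  static byte[] download(long size, long partSize, Function<String, byte[]> fetchRange) {
    List<CompletableFuture<byte[]>> parts = new ArrayList<>();
    for (String range : splitRanges(size, partSize)) {
      parts.add(CompletableFuture.supplyAsync(() -> fetchRange.apply(range)));
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (CompletableFuture<byte[]> part : parts) {
      byte[] chunk = part.join(); // joining in list order preserves byte order
      out.write(chunk, 0, chunk.length);
    }
    return out.toByteArray();
  }
}
```

The object size would have to be known up front, for example from a metadata request, and the part size would need tuning against per-request overhead.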

High-level list of changes

Removed isObjectInBucket guards before zip and gzip downloads (and the now-dead private method). The 404 from GetObject propagates as a SegmentLoadingException.
Replaced the doesObjectExist + listObjectsV2 two-call sequence with a single getObjectMetadata request in S3DataSegmentMover.
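The first change can be pictured with a minimal sketch: skip the existence check, issue the download directly, and translate a 404 into a loading failure. The types here (`ObjectStore`, `StoreException`, `LoadingException`) are hypothetical stand-ins for the S3 client, its service exception, and `SegmentLoadingException`; this is not the PR's actual code.

```java
import java.util.Map;

final class DirectDownloadSketch {
  /** Hypothetical stand-in for an S3 service error that carries an HTTP status. */
  static final class StoreException extends RuntimeException {
    final int statusCode;

    StoreException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }
  }

  /** Hypothetical stand-in for SegmentLoadingException. */
  static final class LoadingException extends Exception {
    LoadingException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Hypothetical minimal client: one call that either returns bytes or throws. */
  interface ObjectStore {
    byte[] getObject(String bucket, String key);
  }

  /** One request instead of "exists?" followed by "get": the 404 is the existence check. */
  static byte[] download(ObjectStore store, String bucket, String key) throws LoadingException {
    try {
      return store.getObject(bucket, key);
    }
    catch (StoreException e) {
      if (e.statusCode == 404) {
        throw new LoadingException("Object not found: s3://" + bucket + "/" + key, e);
      }
      throw e; // other failures (403, 5xx, ...) are left for the caller to handle
    }
  }

  /** In-memory store for experimentation, keyed by "bucket/key". */
  static ObjectStore inMemory(Map<String, byte[]> objects) {
    return (bucket, key) -> {
      byte[] data = objects.get(bucket + "/" + key);
      if (data == null) {
        throw new StoreException(404, "NoSuchKey");
      }
      return data;
    };
  }
}
```

With strong read-after-write consistency, a 404 on the download itself is as reliable as a separate existence check, so the extra round trip buys nothing.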


FrankChen021

Severity	Findings
P0	0
P1	0
P2	1
P3	0
Total	1

Reviewed 8 of 8 changed files.

This is an automated review by Codex GPT-5.5

FrankChen021 · 2026-05-12T14:00:18Z

+      sourceMetadata = s3Client.getObjectMetadata(s3Bucket, s3Path);
+    }
+    catch (S3Exception e) {
+      if (e.statusCode() == 404 && "NoSuchKey".equals(S3Utils.getS3ErrorCode(e))) {


[P2] Treat any HEAD 404 as a missing source

The idempotent already-moved path now only runs when headObject fails with status 404 and error code NoSuchKey. S3 HEAD failures for absent objects can surface as a generic 404/NotFound response, and other Druid S3 code handles missing HEAD results by status alone. In that case this code rethrows before checking whether the target object already exists, so a retry or competing move where the source was deleted after a successful copy can incorrectly fail instead of returning the moved segment. Please key this fallback off statusCode() == 404 rather than requiring NoSuchKey.
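The requested behavior could look roughly like the sketch below, which keys the already-moved fallback on the HTTP status alone. All names (`MoveFallbackSketch`, `ObjectStore`, `StoreException`, `Outcome`) are hypothetical, and the copy-then-delete move is an assumption made for illustration rather than the actual `S3DataSegmentMover` logic.

```java
import java.util.HashMap;
import java.util.Map;

final class MoveFallbackSketch {
  /** Hypothetical service error; errorCode may be null for a bodiless HEAD 404. */
  static final class StoreException extends RuntimeException {
    final int statusCode;
    final String errorCode;

    StoreException(int statusCode, String errorCode) {
      super("status " + statusCode + ", code " + errorCode);
      this.statusCode = statusCode;
      this.errorCode = errorCode;
    }
  }

  /** Hypothetical minimal client. */
  interface ObjectStore {
    long headObject(String key); // returns the object size, throws StoreException if absent
    void copy(String from, String to);
    void delete(String key);
  }

  enum Outcome { MOVED, ALREADY_MOVED }

  static Outcome move(ObjectStore store, String source, String target) {
    try {
      store.headObject(source);
    }
    catch (StoreException e) {
      // Key off the status only: a HEAD 404 may not carry a NoSuchKey error code.
      if (e.statusCode != 404) {
        throw e;
      }
      try {
        store.headObject(target);
        return Outcome.ALREADY_MOVED; // source gone, target present: treat as a completed move
      }
      catch (StoreException t) {
        throw t.statusCode == 404 ? e : t; // neither exists: surface the original failure
      }
    }
    store.copy(source, target);
    store.delete(source);
    return Outcome.MOVED;
  }

  /** In-memory store whose HEAD 404 deliberately carries no error code. */
  static final class InMemoryStore implements ObjectStore {
    final Map<String, byte[]> objects = new HashMap<>();

    @Override
    public long headObject(String key) {
      byte[] data = objects.get(key);
      if (data == null) {
        throw new StoreException(404, null);
      }
      return data.length;
    }

    @Override
    public void copy(String from, String to) {
      byte[] data = objects.get(from);
      if (data == null) {
        throw new StoreException(404, "NoSuchKey");
      }
      objects.put(to, data);
    }

    @Override
    public void delete(String key) {
      objects.remove(key);
    }
  }
}
```

In this sketch a retried move whose source was already deleted returns `ALREADY_MOVED` even though the 404 has no `NoSuchKey` code, which is the case the stricter check would reject.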

jtuglu1 force-pushed the optimize-s3-operations branch 3 times, most recently from 180aa0f to 4986d81 Compare May 1, 2026 16:52

jtuglu1 requested a review from clintropolis May 11, 2026 17:44

jtuglu1 marked this pull request as ready for review May 11, 2026 17:44

jtuglu1 requested review from gianm May 11, 2026 17:47

perf: Streamline s3 backend

50af918

jtuglu1 force-pushed the optimize-s3-operations branch from 4986d81 to 50af918 Compare May 11, 2026 18:07

FrankChen021 reviewed May 12, 2026

jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:optimize-s3-operations
