feat: coordinator-side reconciliation for partial-load rules #19461

Open
clintropolis wants to merge 8 commits into apache:master from clintropolis:partial-load-rule-coordinator-stuff-pt-2
Conversation

Member

@clintropolis clintropolis commented May 13, 2026

Description

Follow-up to #19409, this PR enhances the coordinator-side logic so that partial replication is wired up and managed, taking the partial load size and rule fingerprints into consideration during assignment and balancing. Much of the new code mirrors existing functionality in StrategicSegmentAssigner, but is kept separate to minimize the chance of impacting existing code paths.

changes:
* adds selective inventory wrapping with `DataSegmentAndLoadProfile` which extends `DataSegment`; full-load slots in `DruidDataSource` / `ImmutableDruidDataSource` stay bare, only partial-loaded slots allocate a wrapper
* `PartialLoadProfile` records routed through a weak interner so identical profiles across replicas share by reference (the `wrappedLoadSpec` map is the bulk of per-replica heap).
* `SegmentChangeRequestLoad.forAnnouncement(segment)` detects a `partialProjection` wrapper on the segment's load spec and stamps the wrapper's fingerprint plus full segment size as `loadedBytes` — the coordinator reads this as a full-fallback profile, so partial-load rules work end-to-end (modulo no real partial loading yet) without thrashing
* `BatchDataSegmentAnnouncer` calls new method `forAnnouncement` on all three announcement paths to create the proper announcement depending on partial load profile
* fingerprint-aware partial reconciler in `StrategicSegmentAssigner`; classifies replicas (matching/stale, loaded/in-flight) via new `PartialSegmentStatusInTier`, applies the load-then-drop swap (stale stays serving until matching has actually loaded), cancels stale in-flight loads and re-targets the freed slots, allows additive reload on stale-loaded servers to mitigate the no-spare-server stuck state
* `ImmutableDruidServer.getPartialLoadProfile`, `DruidServer.addDataSegment(segment, profile)` with `loadedBytes`-aware `currSize` accounting, `HttpServerInventoryView` threads announcement-side profile through
* in-flight profile tracking on `ServerHolder.startOperation(action, segment, profile)` and `SegmentHolder.getProfile()` so the reconciler can read in-flight fingerprints.
* new coordinator stats - `PARTIAL_ASSIGNED`, `PARTIAL_STALE_DROPPED`, `PARTIAL_STALE_CANCELLED`
* removed `SegmentLoadingCapabilities.supportsPartialLoad` (now unused — per-replica `isFullFallback` is the better signal); coordinator always issues partial-load requests, historical handles via full-fallback until real partial loading lands
* tests: `StrategicSegmentAssignerPartialTest` (reconciler scenarios), `DruidServerPartialLoadTest` (size accounting), `PartialLoadProfileTest` (interning), `SegmentChangeRequestLoadTest` (forAnnouncement), updates across affected fixtures
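The weak interning described above can be sketched as follows. This is a hypothetical, minimal stand-in (the `PartialLoadProfile` and `WeakInterner` here are simplified illustrations, not the PR's actual classes; Druid would more likely use Guava's `Interners.newWeakInterner()`): equal profiles across replicas collapse to one canonical instance, and once no replica references a profile, the `WeakHashMap` entry becomes collectible.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Simplified stand-in for the real profile record: equality is by fingerprint.
final class PartialLoadProfile
{
  final String fingerprint;

  PartialLoadProfile(String fingerprint)
  {
    this.fingerprint = fingerprint;
  }

  @Override
  public boolean equals(Object o)
  {
    return o instanceof PartialLoadProfile
           && ((PartialLoadProfile) o).fingerprint.equals(fingerprint);
  }

  @Override
  public int hashCode()
  {
    return fingerprint.hashCode();
  }
}

final class WeakInterner<T>
{
  // Keys are weakly held by WeakHashMap; the value is a WeakReference too,
  // so the pool never keeps a strong reference that would defeat collection.
  private final WeakHashMap<T, WeakReference<T>> pool = new WeakHashMap<>();

  synchronized T intern(T candidate)
  {
    WeakReference<T> existing = pool.get(candidate);
    T canonical = existing == null ? null : existing.get();
    if (canonical != null) {
      return canonical; // equal profile already pooled: share by reference
    }
    pool.put(candidate, new WeakReference<>(candidate));
    return candidate;
  }
}

public class InternDemo
{
  public static void main(String[] args)
  {
    WeakInterner<PartialLoadProfile> interner = new WeakInterner<>();
    PartialLoadProfile a = interner.intern(new PartialLoadProfile("fp-1"));
    PartialLoadProfile b = interner.intern(new PartialLoadProfile("fp-1"));
    // Equal profiles from two replicas now share one instance.
    System.out.println(a == b); // prints "true"
  }
}
```

Since the `wrappedLoadSpec` map is the bulk of per-replica heap, sharing by reference means N stale-plus-matching replicas of the same profile cost one map instead of N.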
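The load-then-drop swap in the reconciler can be sketched roughly as below. All names here (`ReplicaState`, `reconcile`, the action strings) are illustrative, not the PR's actual API; the sketch only shows the ordering invariant: a stale-loaded replica keeps serving until enough matching replicas have actually loaded, while stale in-flight loads are cancelled immediately because they were never serving queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classification of one segment's replicas within a tier.
enum ReplicaState { MATCHING_LOADED, MATCHING_IN_FLIGHT, STALE_LOADED, STALE_IN_FLIGHT }

public class ReconcilerSketch
{
  /**
   * Load-then-drop: stale serving replicas are only dropped once the matching
   * replica count is satisfied by actually-loaded copies; stale in-flight
   * loads are cancelled to free their slots for re-targeting.
   */
  static List<String> reconcile(List<ReplicaState> replicas, int targetReplicas)
  {
    List<String> actions = new ArrayList<>();
    long matchingLoaded =
        replicas.stream().filter(r -> r == ReplicaState.MATCHING_LOADED).count();
    long matchingInFlight =
        replicas.stream().filter(r -> r == ReplicaState.MATCHING_IN_FLIGHT).count();

    for (ReplicaState r : replicas) {
      if (r == ReplicaState.STALE_IN_FLIGHT) {
        actions.add("CANCEL_STALE_LOAD"); // never served queries; cancel now
      } else if (r == ReplicaState.STALE_LOADED && matchingLoaded >= targetReplicas) {
        actions.add("DROP_STALE");        // safe: matching replicas are serving
      }
    }

    long assignable = targetReplicas - matchingLoaded - matchingInFlight;
    for (long i = 0; i < assignable; i++) {
      actions.add("ASSIGN_MATCHING_LOAD");
    }
    return actions;
  }

  public static void main(String[] args)
  {
    // One stale replica serving, target of one matching replica:
    // assign a fresh load, but do NOT drop the stale copy yet.
    System.out.println(reconcile(List.of(ReplicaState.STALE_LOADED), 1));
    // prints "[ASSIGN_MATCHING_LOAD]"
  }
}
```

The additive-reload fallback for saturated tiers (reloading onto the stale-loaded server itself) would layer on top of this, as discussed in the review thread below.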
Contributor

@capistrant capistrant left a comment

mostly minor comments on first pass. Still have to review DruidServerPartialLoadTest and StrategicSegmentAssignerPartialTest in more detail

- Comment thread on `server/src/main/java/org/apache/druid/server/coordinator/ServerHolder.java` (outdated)
- Comment thread on `server/src/main/java/org/apache/druid/client/HttpServerInventoryView.java` (outdated)
Member

@FrankChen021 FrankChen021 left a comment

| Severity | Findings |
| --- | --- |
| P0 | 0 |
| P1 | 1 |
| P2 | 1 |
| P3 | 0 |
| Total | 2 |

Reviewed 28 of 28 changed files.

This is an automated review by Codex GPT-5.5

```java
);
destinations.addAll(status.getEligibleForFreshLoad());
destinations.addAll(canceledStaleServers);
destinations.addAll(status.getEligibleForAdditiveReload());
```
Member

**[P1] Additive reloads cannot update stale loaded replicas**

The new fallback queues partial loads back onto stale-loaded servers, but same-id loads do not produce a fresh inventory profile: `BatchDataSegmentAnnouncer` skips `announceSegment` when the segment is already announced, and `HttpServerInventoryView` ignores `addSegment` for an id already present. In a saturated tier with only stale replicas, the load can report success while the coordinator never sees the new fingerprint, so the replica remains stale and is requeued every run. Either make same-server reloads update the announcement/inventory profile, or avoid this additive path until the historical/inventory lifecycle can actually replace the profile.

Contributor

^ seems like a legitimate concern

Member Author

This part will come as an enhancement when partial loads are actually wired up on the historical: we will check the fingerprint of the updated request against the existing loaded fingerprint and load the missing parts. There are no partial loads right now, so we always fall back to full loads, which I think should never be stale.

Member Author

@clintropolis clintropolis May 14, 2026

Oh I see, yeah, I guess there will be a temporary problem. I think I can put a fix in until the actual functionality is there.

```java
private final boolean supportsPartialLoad;

@JsonCreator
public SegmentLoadingCapabilities(
```
Member

**[P2] Keep a capability gate for older historicals**

After removing `supportsPartialLoad`, the coordinator has no way to distinguish historicals that can deserialize `partialProjection` load specs from older nodes during a rolling upgrade. `replicateSegmentPartially` now sends wrapped load specs to every eligible server, so an older historical will fail the load instead of doing the intended full fallback. Please preserve a capability bit that defaults false for old/missing endpoints and degrade those destinations to regular full loads until support is advertised.

Contributor

I believe the PR description calls out why this is being removed with a proper justification.

Member Author

To elaborate a bit more on why I removed `supportsPartialLoad`: with it existing and the coordinator considering it for assignment, it creates hot spots whenever you roll between versions. For example, if only a single server in a tier supports partial loads, then all partial loads are assigned to that single server. This seemed like more problems than benefits.

With regard to older versions of historicals not understanding the new partial load specs, without the flag I plan to document this as new functionality that should not be used until an operator is certain there is no need to roll back, the same as we do with certain other new features (like segment format changes).

Contributor

@capistrant capistrant left a comment

latest changes look good to me too. good catch
