feat: coordinator-side reconciliation for partial-load rules #19461
clintropolis wants to merge 8 commits into
Conversation
capistrant
left a comment
Mostly minor comments on the first pass. Still have to review DruidServerPartialLoadTest and StrategicSegmentAssignerPartialTest in more detail.
FrankChen021
left a comment
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 1 |
| P2 | 1 |
| P3 | 0 |
| Total | 2 |
Reviewed 28 of 28 changed files.
This is an automated review by Codex GPT-5.5
```java
);
destinations.addAll(status.getEligibleForFreshLoad());
destinations.addAll(canceledStaleServers);
destinations.addAll(status.getEligibleForAdditiveReload());
```
[P1] Additive reloads cannot update stale-loaded replicas
The new fallback queues partial loads back onto stale-loaded servers, but same-id loads do not produce a fresh inventory profile: BatchDataSegmentAnnouncer skips announceSegment when the segment is already announced, and HttpServerInventoryView ignores addSegment for an id already present. In a saturated tier with only stale replicas, the load can report success while the coordinator never sees the new fingerprint, so the replica remains stale and is requeued every run. Either make same-server reloads update the announcement/inventory profile, or avoid this additive path until the historical/inventory lifecycle can actually replace the profile.
^ seems like a legitimate concern
This part will come as an enhancement when partial loads are actually wired up on the historical: we will check the fingerprint of the updated request against the existing loaded fingerprint and load the missing parts. There are no partial loads right now, so we always fall back to full loads, which I think should never be stale.
Oh I see, yeah, I guess there will be a temporary problem. I think I can put a fix in until the actual functionality is there.
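If a fix does go in before real partial loading lands, one possible shape (purely a sketch, not the PR's actual change) is to make the announcer treat a changed profile on an already-announced segment id as a re-announcement instead of a duplicate, so HttpServerInventoryView sees the fresh fingerprint. The class and method names below are simplified stand-ins for `BatchDataSegmentAnnouncer`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the announcer's dedup logic; the real
// BatchDataSegmentAnnouncer keys announcements per segment id.
final class ProfileAwareAnnouncer
{
  private final Map<String, String> announcedFingerprints = new ConcurrentHashMap<>();

  // Assumes a non-null fingerprint; bare (non-partial) segments would take
  // the pre-existing announcement path instead.
  void announceSegment(String segmentId, String fingerprint)
  {
    String previous = announcedFingerprints.put(segmentId, fingerprint);
    if (previous == null) {
      publishAnnouncement(segmentId, fingerprint);
    } else if (!previous.equals(fingerprint)) {
      // Same id, new profile: re-announce instead of skipping, so the
      // inventory view picks up the fresh fingerprint and the coordinator
      // stops classifying the replica as stale.
      unpublishAnnouncement(segmentId);
      publishAnnouncement(segmentId, fingerprint);
    }
    // Same id, same fingerprint: genuine duplicate, skip as before.
  }

  private void publishAnnouncement(String segmentId, String fingerprint) { /* elided */ }

  private void unpublishAnnouncement(String segmentId) { /* elided */ }
}
```

The key property is that a same-id load with a different fingerprint becomes observable to the coordinator, which is exactly what the P1 scenario above lacks.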
```java
private final boolean supportsPartialLoad;

@JsonCreator
public SegmentLoadingCapabilities(
```
[P2] Keep a capability gate for older historicals
After removing supportsPartialLoad, the coordinator has no way to distinguish historicals that can deserialize partialProjection load specs from older nodes during a rolling upgrade. replicateSegmentPartially now sends wrapped load specs to every eligible server, so an older historical will fail the load instead of doing the intended full fallback. Please preserve a capability bit that defaults false for old/missing endpoints and degrade those destinations to regular full loads until support is advertised.
I believe the PR description calls out why this is being removed with a proper justification.
To elaborate a bit more on why I removed supportsPartialLoad: with the flag in place and the coordinator considering it for assignment, it creates hot spots whenever you roll between versions; if only a single server in a tier supports partial loads, then all partial loads get assigned to that single server. That seemed like more problems than benefits.
With regards to older versions of historicals not understanding the new partial load specs: without the flag, I plan to just document this as new functionality that should not be used until an operator is certain there is no need to roll back, the same as we do with certain other new features (like the segment format stuff).
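For illustration, the hot-spot effect is easy to see in a toy capability filter (names here are hypothetical, not the removed Druid code): in a half-rolled tier, gating assignment on the capability funnels every partial load onto the upgraded minority:

```java
import java.util.List;
import java.util.stream.Collectors;

final class CapabilityGateExample
{
  record Server(String name, boolean supportsPartialLoad) {}

  public static void main(String[] args)
  {
    List<Server> tier = List.of(
        new Server("historical-1", true),   // upgraded
        new Server("historical-2", false),  // not yet rolled
        new Server("historical-3", false)   // not yet rolled
    );

    // With a capability gate, every partial-load assignment lands on the
    // single upgraded server until the whole tier has rolled.
    List<Server> eligible = tier.stream()
                                .filter(Server::supportsPartialLoad)
                                .collect(Collectors.toList());

    System.out.println("eligible for partial load: " + eligible);
    // -> only historical-1: a hot spot for the duration of the roll.
  }
}
```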
capistrant
left a comment
Latest changes look good to me too. Good catch.
Description
Follow-up to #19409, this PR enhances the coordinator side so that partial replication is wired up and managed, taking the partial load size and rule fingerprints into consideration when doing assignment and balancing. A lot of the new code is similar to existing functionality of `StrategicSegmentAssigner`, but kept separate to minimize the chance of impacting existing code paths.

Changes:

- adds selective inventory wrapping with `DataSegmentAndLoadProfile`, which extends `DataSegment`; full-load slots in `DruidDataSource` / `ImmutableDruidDataSource` stay bare, and only partial-loaded slots allocate a wrapper
- `PartialLoadProfile` records are routed through a weak interner so identical profiles across replicas share by reference (the `wrappedLoadSpec` map is the bulk of per-replica heap); see the interner sketch after this list
- `SegmentChangeRequestLoad.forAnnouncement(segment)` detects a `partialProjection` wrapper on the segment's load spec and stamps the wrapper's fingerprint plus the full segment size as `loadedBytes`; the coordinator reads this as a full-fallback profile, so partial-load rules work end-to-end (modulo no real partial loading yet) without thrashing (see the second sketch below)
- `BatchDataSegmentAnnouncer` calls the new `forAnnouncement` method on all three announcement paths to create the proper announcement depending on the partial load profile
- fingerprint-aware partial reconciler in `StrategicSegmentAssigner`: classifies replicas (matching/stale, loaded/in-flight) via the new `PartialSegmentStatusInTier`, applies the load-then-drop swap (a stale replica stays serving until a matching one has actually loaded), cancels stale in-flight loads and re-targets the freed slots, and allows additive reload on stale-loaded servers to mitigate the no-spare-server stuck state (see the reconciler sketch below)
- `ImmutableDruidServer.getPartialLoadProfile` and `DruidServer.addDataSegment(segment, profile)` with `loadedBytes`-aware `currSize` accounting; `HttpServerInventoryView` threads the announcement-side profile through
- in-flight profile tracking on `ServerHolder.startOperation(action, segment, profile)` and `SegmentHolder.getProfile()` so the reconciler can read in-flight fingerprints
- new coordinator stats: `PARTIAL_ASSIGNED`, `PARTIAL_STALE_DROPPED`, `PARTIAL_STALE_CANCELLED`
- removed `SegmentLoadingCapabilities.supportsPartialLoad` (now unused; per-replica `isFullFallback` is the better signal); the coordinator always issues partial-load requests, and the historical handles them via full fallback until real partial loading lands
- tests: `StrategicSegmentAssignerPartialTest` (reconciler scenarios), `DruidServerPartialLoadTest` (size accounting), `PartialLoadProfileTest` (interning), `SegmentChangeRequestLoadTest` (forAnnouncement), plus updates across affected fixtures
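A minimal sketch of the interning pattern the `PartialLoadProfile` bullet describes, using Guava's weak interner. The field set here is a simplified stand-in for the real class:

```java
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

import java.util.Map;
import java.util.Objects;

// Simplified stand-in: the real PartialLoadProfile carries more state, but
// fingerprint + wrappedLoadSpec are the parts that matter for sharing.
public final class PartialLoadProfile
{
  // Weak references: once no replica points at a profile, it can be collected.
  private static final Interner<PartialLoadProfile> INTERNER = Interners.newWeakInterner();

  private final String fingerprint;
  private final Map<String, Object> wrappedLoadSpec;

  private PartialLoadProfile(String fingerprint, Map<String, Object> wrappedLoadSpec)
  {
    this.fingerprint = fingerprint;
    this.wrappedLoadSpec = wrappedLoadSpec;
  }

  // All construction funnels through the interner, so identical profiles
  // across replicas collapse to a single instance on the heap.
  public static PartialLoadProfile of(String fingerprint, Map<String, Object> wrappedLoadSpec)
  {
    return INTERNER.intern(new PartialLoadProfile(fingerprint, wrappedLoadSpec));
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (!(o instanceof PartialLoadProfile)) {
      return false;
    }
    PartialLoadProfile that = (PartialLoadProfile) o;
    return fingerprint.equals(that.fingerprint) && wrappedLoadSpec.equals(that.wrappedLoadSpec);
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(fingerprint, wrappedLoadSpec);
  }
}
```

Interning by reference also keeps the reconciler's "same profile?" checks cheap, since identical profiles compare with `==`.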
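A sketch of the full-fallback stamping in `forAnnouncement`. The `partialProjection` load-spec key comes from this PR, but the announcement shape and accessors below are simplified stand-ins, not the real `SegmentChangeRequestLoad` API:

```java
import java.util.Map;

// Minimal stand-ins so the sketch compiles; the real DataSegment and
// SegmentChangeRequestLoad are much richer.
interface DataSegment
{
  Map<String, Object> getLoadSpec();

  long getSize();
}

interface PartialLoadWrapper
{
  String getFingerprint();
}

final class SegmentAnnouncement
{
  final DataSegment segment;
  final String fingerprint; // null for bare (non-partial) segments
  final long loadedBytes;

  SegmentAnnouncement(DataSegment segment, String fingerprint, long loadedBytes)
  {
    this.segment = segment;
    this.fingerprint = fingerprint;
    this.loadedBytes = loadedBytes;
  }
}

final class Announcements
{
  static SegmentAnnouncement forAnnouncement(DataSegment segment)
  {
    Map<String, Object> loadSpec = segment.getLoadSpec();
    Object partial = loadSpec == null ? null : loadSpec.get("partialProjection");

    if (partial instanceof PartialLoadWrapper) {
      // No real partial loading yet: stamp the wrapper's fingerprint but the
      // FULL segment size as loadedBytes, so the coordinator reads the replica
      // as a full fallback and partial rules converge instead of thrashing.
      return new SegmentAnnouncement(segment, ((PartialLoadWrapper) partial).getFingerprint(), segment.getSize());
    }
    // Bare segments announce with no profile, exactly as before.
    return new SegmentAnnouncement(segment, null, segment.getSize());
  }
}
```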
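And a sketch of the reconciler's classification and load-then-drop swap. The types and method names are illustrative (the real logic lives in `StrategicSegmentAssigner` / `PartialSegmentStatusInTier`); what it shows is the invariant from the bullet: stale replicas keep serving until a matching replica has actually loaded:

```java
import java.util.ArrayList;
import java.util.List;

final class PartialReconciler
{
  // Illustrative replica classification: matching vs. stale fingerprint,
  // crossed with loaded vs. in-flight.
  static final class TierStatus
  {
    final List<String> matchingLoaded = new ArrayList<>();
    final List<String> matchingInFlight = new ArrayList<>();
    final List<String> staleLoaded = new ArrayList<>();
    final List<String> staleInFlight = new ArrayList<>();
  }

  static void reconcile(TierStatus status, int targetReplicas, List<String> spareServers)
  {
    // 1. Stale in-flight loads are wasted work: cancel them and re-target
    //    the freed slots at the current fingerprint.
    List<String> destinations = new ArrayList<>(status.staleInFlight);
    cancelLoads(status.staleInFlight);
    destinations.addAll(spareServers);

    // 2. Queue matching loads until the rule's replica target is covered.
    int satisfied = status.matchingLoaded.size() + status.matchingInFlight.size();
    for (String server : destinations) {
      if (satisfied >= targetReplicas) {
        break;
      }
      queueLoad(server);
      satisfied++;
    }

    // 3. Load-then-drop: only drop stale replicas that are covered by a
    //    replica that has ACTUALLY loaded (in-flight doesn't count), so the
    //    segment never goes unserved during the swap.
    int droppable = Math.min(status.staleLoaded.size(), status.matchingLoaded.size());
    for (int i = 0; i < droppable; i++) {
      queueDrop(status.staleLoaded.get(i));
    }

    // 4. With no spare servers, an additive reload on a stale-loaded server
    //    is the escape hatch from the stuck state (see the P1 thread above
    //    for its current caveat).
  }

  static void cancelLoads(List<String> servers) { /* no-op in this sketch */ }

  static void queueLoad(String server) { /* no-op in this sketch */ }

  static void queueDrop(String server) { /* no-op in this sketch */ }
}
```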