fix: adjust diskNormalized strategy to scale cost exponentially with disk utilization#19422
fix: adjust diskNormalized strategy to scale cost exponentially with disk utilization#19422jtuglu1 wants to merge 1 commit into
Conversation
b3521f9 to
9064f7e
Compare
…sed on disk utilization
9064f7e to
fe0d0d3
Compare
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 1 |
| P2 | 1 |
| P3 | 0 |
| Total | 2 |
This is an automated review by Codex GPT-5
| double normalizedCost = cost * usageRatio; | ||
| final double usageRatio = (double) server.getSizeUsed() / maxSize; | ||
| final double headroom = Math.max(EPSILON, 1.0 - usageRatio); | ||
| double normalizedCost = cost / headroom; |
There was a problem hiding this comment.
[P1] Existing threshold test now fails
Changing normalization to cost / headroom makes the existing testThresholdBlocksMarginalMove scenario choose DEST: source is roughly 38K / 0.20 * 0.95 = 180.5K, while dest is 40K / 0.26 = 153.8K. The test still asserts null, so the server test suite should fail unless the threshold scenario or algorithm is adjusted.
| * A {@link BalancerStrategy} which normalizes the cost of placing a segment on a | ||
| * server as calculated by {@link CostBalancerStrategy} by multiplying it by the | ||
| * server's disk usage ratio. | ||
| * server as calculated by {@link CostBalancerStrategy} by dividing by the |
There was a problem hiding this comment.
[P2] Public docs still describe the old formula
The implementation and Javadoc now divide by available headroom, but docs/design/coordinator.md and docs/configuration/index.md still say diskNormalized multiplies cost by diskUsed / maxSize. That leaves user-facing behavior documentation incorrect for this config option.
gianm
left a comment
There was a problem hiding this comment.
Have you tried this out in a real cluster? I wonder if we should have a property to restore the old behavior in case something undesirable happens.
I also struggle with the naming. The term "normalized" makes me think that costs will be scaled somehow into some bounded range. The adjusted cost function in the PR is doing something different from normalization, it's blowing up the cost as servers fill up.
I wonder if this could all be cleaner by having the cost adjustments be implemented as options to the cost balancer rather than entirely distinct balancer strategies. What are your thoughts on this?
Btw, the docs would need to be updated with this formula change, since the "multiplies the cost of placing a segment on a server by the server's disk usage ratio (diskUsed / maxSize)" is no longer what happens.
Description
The existing linear penalization factor is still ineffective in large skew scenarios where the CostBalancerStrategy's cost forces a move/load (even with the utilization-based penalty). This switches the penalty to scale exponentially with the disk utilization, ensuring that near-full historicals are penalized.
Release note
Adjust diskNormalized strategy to scale cost exponentially with disk utilization
This PR has: