@bumarcell

Summary

Add an opt-in --prioritize-fullness flag to the undo-upmaps command so that it takes OSD utilization (fullness) into account. When the flag is set, the tool selects PGs to undo using a composite score that combines OSD fullness and backfill load, with fullness as the primary factor.

Motivation

After swap-bucket operations or in clusters with uneven disk utilization, it's often desirable to prioritize data movement based on OSD fullness rather than just backfill load. This allows operators to preferentially drain fuller OSDs or fill emptier ones, helping to rebalance storage utilization more effectively.

Key Features

  • OSD utilization tracking: Fetches and caches OSD df data to determine utilization percentage
  • Composite scoring: score = (backfillWeight * backfillScore) + (fullnessWeight * fullnessScore)
  • Configurable weights: Default weights (backfillWeight=1, fullnessWeight=10) make a 1% fullness difference equivalent to ~1 backfill reservation slot
  • Context-aware selection:
    • With --target flag: prioritizes removing PGs from fuller OSDs
    • Without --target: prioritizes moving PGs to emptier OSDs
  • Graceful fallback: Falls back to backfill-only scoring if OSD df data is unavailable
  • Pool compatibility: Works with both replicated and EC pools
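
The composite score and its fallback can be illustrated with a minimal sketch. It is not the code from this PR: the type osdState, the function compositeScore, and the convention that a negative utilization means "unknown" are all hypothetical. The description does not state the scale of fullnessScore, so the sketch assumes a fraction in [0, 1] and a lower-is-better score.

```go
package main

import "fmt"

// Weights mirroring the defaults named in the description.
const (
	backfillWeight = 1.0
	fullnessWeight = 10.0
)

// osdState is an illustrative stand-in, not the tool's osdBackfillState.
type osdState struct {
	backfills   int     // backfill reservations currently involving this OSD
	utilization float64 // assumed fraction in [0, 1]; negative means "unknown"
}

// compositeScore returns a lower-is-better score for an OSD.
// With preferFuller set, the fullness term is inverted so that fuller
// OSDs receive lower (better) scores.
func compositeScore(s osdState, preferFuller bool) float64 {
	score := backfillWeight * float64(s.backfills)
	if s.utilization < 0 {
		// Fallback: no df data for this OSD, so score on backfill load only.
		return score
	}
	fullness := s.utilization
	if preferFuller {
		fullness = 1 - fullness
	}
	return score + fullnessWeight*fullness
}

func main() {
	full := osdState{backfills: 1, utilization: 0.75}
	empty := osdState{backfills: 1, utilization: 0.25}
	unknown := osdState{backfills: 1, utilization: -1}

	// Moving data to emptier OSDs: the emptier OSD scores lower.
	fmt.Println(compositeScore(full, false), compositeScore(empty, false))
	// Draining fuller OSDs: the fuller OSD scores lower.
	fmt.Println(compositeScore(full, true), compositeScore(empty, true))
	// Missing df data: only the backfill term remains.
	fmt.Println(compositeScore(unknown, false))
}
```

When two OSDs have equal fullness, the fullness terms cancel and the backfill term alone decides the order, which corresponds to the tiebreaker behavior described in the tests.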

Implementation Details

  • Added osdBackfillState.utilization field to track OSD fullness
  • Added osdDf() function in ceph.go with caching support
  • Modified remapLeastBusyPg() to accept scoreSource and preferFuller parameters for flexible scoring
  • Added comprehensive test coverage:
    • Basic fullness scoring functionality
    • Fullness disabled (baseline behavior preserved)
    • Tiebreaker logic when OSDs have equal fullness
    • Graceful fallback when OSD df data unavailable
    • Both target and source modes
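
A cached df lookup of the kind described above might look like the following sketch. It is an assumption-laden illustration, not the osdDf() function added in ceph.go: the dfCache type, its fetch field, and the utilization method are hypothetical names, and the JSON shape (a top-level "nodes" array whose entries carry "id" and "utilization") is assumed to match the output of ceph osd df in JSON format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// osdDfNode holds the two fields this sketch reads from each df entry.
type osdDfNode struct {
	ID          int     `json:"id"`
	Utilization float64 `json:"utilization"`
}

// osdDfOutput is the assumed top-level shape of the df JSON.
type osdDfOutput struct {
	Nodes []osdDfNode `json:"nodes"`
}

// dfCache fetches and parses df data at most once (hypothetical helper).
type dfCache struct {
	once  sync.Once
	fetch func() ([]byte, error) // injected so the sketch runs without a cluster
	util  map[int]float64
	err   error
}

// utilization returns OSD id -> utilization, fetching on first use only.
// A returned error is the caller's cue to fall back to backfill-only scoring.
func (c *dfCache) utilization() (map[int]float64, error) {
	c.once.Do(func() {
		raw, err := c.fetch()
		if err != nil {
			c.err = err
			return
		}
		var out osdDfOutput
		if err := json.Unmarshal(raw, &out); err != nil {
			c.err = err
			return
		}
		c.util = make(map[int]float64, len(out.Nodes))
		for _, n := range out.Nodes {
			c.util[n.ID] = n.Utilization
		}
	})
	return c.util, c.err
}

func main() {
	calls := 0
	c := &dfCache{fetch: func() ([]byte, error) {
		calls++
		// Canned data standing in for real command output.
		return []byte(`{"nodes":[{"id":0,"utilization":71.5},{"id":1,"utilization":42.25}]}`), nil
	}}
	u, err := c.utilization()
	_, _ = c.utilization() // second call is served from the cache
	fmt.Println(u, err, "fetches:", calls)
}
```

Injecting the fetch function keeps the cache testable with canned JSON, and the "data unavailable" case can be exercised by returning an error from it.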

Example Usage

After a swap-bucket operation, prioritize draining the fuller OSDs:

./pgremapper undo-upmaps bucket:old-bucket --prioritize-fullness \
  --max-backfill-reservations 2 --max-source-backfills 2 --yes

With --target flag, bring data back from the fullest OSDs:

./pgremapper undo-upmaps bucket:new-bucket --target --prioritize-fullness \
  --max-backfill-reservations 2 --max-source-backfills 2 --yes

Testing

All existing tests pass, and new tests cover:

  • ✅ Fullness scoring enabled vs disabled
  • ✅ Equal fullness tiebreaker with backfill load
  • ✅ Missing OSD df data fallback
  • ✅ Both target and source modes
  • ✅ Score inversion for "prefer fuller" mode

$ go test -v
PASS
ok  	github.com/digitalocean/pgremapper	0.360s

Documentation

  • Updated README.md with flag documentation and examples
  • Added explanation of composite scoring formula
  • Documented fallback behavior

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Jamal Allogie and others added 2 commits January 26, 2026 10:34
Add opt-in OSD utilization (fullness) consideration to the undo-upmaps
command via the --prioritize-fullness flag. When enabled, the tool selects
PGs to undo based on a composite score combining OSD fullness and backfill
load, with fullness as the primary factor.

Key features:
- Fetches OSD df data to determine utilization percentage
- Composite scoring: score = (backfillWeight * backfillScore) + (fullnessWeight * fullnessScore)
- Default weights (backfillWeight=1, fullnessWeight=10) make a 1% fullness
  difference equivalent to ~1 backfill reservation slot
- With --target flag: prioritizes removing PGs from fuller OSDs
- Without --target: prioritizes moving PGs to emptier OSDs
- Gracefully falls back to backfill-only scoring if OSD df data unavailable
- Works with both replicated and EC pools

This is especially useful after swap-bucket operations or when rebalancing
clusters with uneven disk utilization.

Implementation details:
- Added osdBackfillState.utilization field to track OSD fullness
- Added osdDf() function in ceph.go with caching
- Modified remapLeastBusyPg() to accept scoreSource and preferFuller parameters
- Added comprehensive tests for fullness scoring, tiebreaker logic, and fallback

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add test coverage for OSD fullness prioritization:
- TestRemapLeastBusyPgWithFullness: Basic fullness scoring
- TestRemapLeastBusyPgWithoutFullness: Baseline behavior without flag
- TestFullnessTiebreaker: Backfill load as tiebreaker when fullness equal
- TestMissingOsdDfData: Graceful fallback when OSD df unavailable
- Updated TestCalcPgMappingsToUndoUpmaps expectations for fixed scoring

All tests verify correct behavior for both target and source modes.