
WIP: RelieveAndMigrate: new operator profile for PSI pressure based load-aware descheduling #460

Closed
23 changes: 23 additions & 0 deletions README.md
@@ -106,6 +106,7 @@ The following profiles are currently provided:
* [`LifecycleAndUtilization`](#LifecycleAndUtilization)
* [`LongLifecycle`](#LongLifecycle)
* [`CompactAndScale`](#compactandscale-techpreview)
* [`RelieveAndMigrate`](#relieveandmigrate-techpreview)
* [`EvictPodsWithPVC`](#EvictPodsWithPVC)
* [`EvictPodsWithLocalStorage`](#EvictPodsWithLocalStorage)

@@ -151,6 +152,25 @@ An under utilized node is any node consuming less than 20% of its available cpu,
This profile enables the [`HighNodeUtilization`](https://github.com/kubernetes-sigs/descheduler/#highnodeutilization) strategy.
In the future, more configuration may be made available through the operator based on user feedback.

### RelieveAndMigrate (TechPreview)

This profile seeks to evict pods from high-cost nodes to relieve overall expenses while considering workload migration.
Node cost can include:
- Actual resource utilization: Increased resource pressure leads to higher overhead for running applications.
- Node maintenance costs: A higher number of containers on a node increases the resource accounting overhead.

Migration strategies may involve VM live migration, state transitions between stateful set pods, and other methods.

This profile enables the [`LowNodeUtilization`](https://github.com/kubernetes-sigs/descheduler/#lownodeutilization) strategy
with the `EvictionsInBackground` alpha feature enabled.
In the future, more configuration may be made available through the operator based on user feedback.

The profile exposes the following customizations:
- `devLowNodeUtilizationThresholds`: Sets experimental thresholds for the LowNodeUtilization strategy.
- `devActualUtilizationProfile`: Enables load-aware descheduling.
- `devDeviationThresholds`: Bases the thresholds on the average utilization.
- `devMultiSoftTainting`: Applies multiple soft taints instead of a single one.
- `devMultiEvictions`: Evicts multiple pods per node during a descheduling cycle.

### EvictPodsWithPVC
By default, the operator prevents pods with PVCs from being evicted. Enabling this
profile in combination with any of the above profiles allows pods with PVCs to be
@@ -177,6 +197,9 @@ the `profileCustomizations` field:
|`devEnableEvictionsInBackground`|`bool`| Enables descheduler's EvictionsInBackground alpha feature. The EvictionsInBackground alpha feature is subject to change. Currently provided as an experimental feature.|
| `devHighNodeUtilizationThresholds` | `string` | Sets thresholds for the [HighNodeUtilization](https://github.com/kubernetes-sigs/descheduler#highnodeutilization) strategy of the `CompactAndScale` profile in the following ratios: `Minimal` for 10%, `Modest` for 20%, `Moderate` for 30%. Currently provided as an experimental feature.|
|`devActualUtilizationProfile`|`string`| Sets a profile that gets translated into a predefined Prometheus query |
| `devDeviationThresholds` | `string` | Bases the thresholds on the average utilization. Each threshold is a distance from the average node utilization, in the following settings: `Low`: 10%:10%, `Medium`: 20%:20%, `High`: 30%:30% |
| `devMultiSoftTainting` | `string` | Applies multiple soft taints instead of just one: `Static` for a single taint, `Dynamic` for one or more, depending on the remaining utilization. |
| `devMultiEvictions` | `string` | Evicts multiple pods per node during a descheduling cycle: `Simple` for 1 (default), `Modest` for 2, `Rapid` for 5. |
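
To illustrate how the `devDeviationThresholds` settings could translate into concrete bounds, here is a minimal Go sketch. It assumes the deviation is subtracted from and added to the average node utilization and that the result is clamped to the 0-100 range, which is how the upstream LowNodeUtilization deviation mode is understood to behave. The `deviationPercent` table and the `bounds` helper are hypothetical names for illustration only and are not part of this PR.

```go
package main

import (
	"fmt"
	"math"
)

// deviationPercent maps a devDeviationThresholds setting to its low:high
// deviation in percentage points, as listed in the README table.
// Hypothetical helper, not operator code.
var deviationPercent = map[string][2]float64{
	"Low":    {10, 10},
	"Medium": {20, 20},
	"High":   {30, 30},
}

// bounds returns the under- and over-utilization bounds for an average node
// utilization given in percent. Assumption: bounds are avg-low and avg+high,
// clamped to [0, 100].
func bounds(avg float64, setting string) (low, high float64, err error) {
	d, ok := deviationPercent[setting]
	if !ok {
		return 0, 0, fmt.Errorf("unknown deviation threshold %q", setting)
	}
	low = math.Max(0, avg-d[0])
	high = math.Min(100, avg+d[1])
	return low, high, nil
}

func main() {
	low, high, err := bounds(55, "Medium")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With a 55% average and the Medium setting this prints: low=35 high=75
	fmt.Printf("low=%.0f high=%.0f\n", low, high)
}
```

Under these assumptions, nodes below the lower bound would count as underutilized and nodes above the upper bound as overutilized, so the window moves with the cluster average instead of staying fixed.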

## Prometheus query profiles
The operator provides the following profiles:
57 changes: 57 additions & 0 deletions pkg/apis/descheduler/v1/types_descheduler.go
@@ -90,6 +90,18 @@ type ProfileCustomizations struct {
// LowNodeUtilization plugin can consume the metrics for now.
// Currently provided as an experimental feature.
DevActualUtilizationProfile ActualUtilizationProfile `json:"devActualUtilizationProfile,omitempty"`

// devDeviationThresholds enables dynamic thresholds based on average resource utilization
// +kubebuilder:validation:Enum=Low;Medium;High;""
DevDeviationThresholds *DeviationThresholdsType `json:"devDeviationThresholds,omitempty"`

// devMultiSoftTainting applies multiple soft taints instead of just one
// +kubebuilder:validation:Enum=Static;Dynamic;""
DevMultiSoftTainting *MultiSoftTaintingType `json:"devMultiSoftTainting,omitempty"`

// devMultiEvictions evicts multiple pods per node during a descheduling cycle
// +kubebuilder:validation:Enum=Simple;Modest;Rapid;""
DevMultiEvictions *MultiEvictionsType `json:"devMultiEvictions,omitempty"`
}

type LowNodeUtilizationThresholdsType string
@@ -121,6 +133,48 @@ var (
CompactModerateThreshold HighNodeUtilizationThresholdsType = "Moderate"
)

type DeviationThresholdsType string

var (
// DeviationThresholdLow sets thresholds to 10%:10% ratio.
// The threshold value is subject to change.
DeviationThresholdLow DeviationThresholdsType = "Low"

// DeviationThresholdMedium sets thresholds to 20%:20% ratio.
// The threshold value is subject to change.
DeviationThresholdMedium DeviationThresholdsType = "Medium"

// DeviationThresholdHigh sets thresholds to 30%:30% ratio.
// The threshold value is subject to change.
DeviationThresholdHigh DeviationThresholdsType = "High"
)

type MultiSoftTaintingType string

var (
// MultiSoftTaintingStatic taints a node with a single soft taint
MultiSoftTaintingStatic MultiSoftTaintingType = "Static"

// MultiSoftTaintingDynamic taints a node with one or more soft taints, depending on the remaining utilization.
MultiSoftTaintingDynamic MultiSoftTaintingType = "Dynamic"
)

type MultiEvictionsType string

var (
// MultiEvictionsSimple evicts a single pod during a descheduling cycle
// The value is subject to change.
MultiEvictionsSimple MultiEvictionsType = "Simple"

// MultiEvictionsModest evicts two pods during a descheduling cycle
// The value is subject to change.
MultiEvictionsModest MultiEvictionsType = "Modest"

// MultiEvictionsRapid evicts five pods during a descheduling cycle
// The value is subject to change.
MultiEvictionsRapid MultiEvictionsType = "Rapid"
)
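
The `MultiEvictionsType` values above could be resolved to a per-node eviction count along the following lines. This is only a sketch: `evictionsPerNode` is a hypothetical helper, the counts 1, 2 and 5 come from the README table, and treating a nil pointer or an empty string as the `Simple` default is an assumption. The type and values are redeclared here as constants so the example compiles on its own.

```go
package main

import "fmt"

// MultiEvictionsType mirrors the string type introduced in this PR.
type MultiEvictionsType string

const (
	MultiEvictionsSimple MultiEvictionsType = "Simple"
	MultiEvictionsModest MultiEvictionsType = "Modest"
	MultiEvictionsRapid  MultiEvictionsType = "Rapid"
)

// evictionsPerNode is a hypothetical helper that maps the customization to
// the number of pods evicted per node in one descheduling cycle.
// Assumption: an unset or empty value falls back to the Simple default of 1.
func evictionsPerNode(m *MultiEvictionsType) (int, error) {
	if m == nil || *m == "" {
		return 1, nil
	}
	switch *m {
	case MultiEvictionsSimple:
		return 1, nil
	case MultiEvictionsModest:
		return 2, nil
	case MultiEvictionsRapid:
		return 5, nil
	}
	return 0, fmt.Errorf("unknown devMultiEvictions value %q", *m)
}

func main() {
	rapid := MultiEvictionsRapid
	n, err := evictionsPerNode(&rapid)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // prints 5
}
```

Returning an error for unrecognized values, rather than silently defaulting, keeps a typo in the custom resource visible even though the kubebuilder enum validation should already reject it.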

// ActualUtilizationProfile sets a predefined Prometheus PromQL query
type ActualUtilizationProfile string

@@ -178,6 +232,9 @@ var (

// CompactAndScale seeks to evict pods to enable the same workload to run on a smaller set of nodes.
CompactAndScale DeschedulerProfile = "CompactAndScale"

// RelieveAndMigrate seeks to evict pods from high-cost nodes to relieve overall expenses while considering workload migration.
RelieveAndMigrate DeschedulerProfile = "RelieveAndMigrate"
)

// DeschedulerProfile allows configuring the enabled strategy profiles for the descheduler
15 changes: 15 additions & 0 deletions pkg/apis/descheduler/v1/zz_generated.deepcopy.go


68 changes: 34 additions & 34 deletions pkg/generated/applyconfiguration/descheduler/v1/kubedescheduler.go
