[lownodeutilization]: Actual utilization: integration with Prometheus #1533

Merged

Conversation

Contributor

@ingvagabund ingvagabund commented Oct 11, 2024

Extend the actual utilization awareness with Prometheus integration.

For testing purposes:

    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    metricsProviders:
    - source: Prometheus
      prometheus:
        url: http://prometheus-kube-prometheus-prometheus.prom.svc.cluster.local
        authToken:
          secretReference:
            namespace: "kube-system"
            name: "authtoken"
    profiles:
      - name: ProfileName
        pluginConfig:
        - name: "LowNodeUtilization"
          args:
            thresholds:
              "MetricResource": 20
            targetThresholds:
              "MetricResource": 70
            metricsUtilization:
              source: Prometheus
              prometheus:
                query: instance:node_cpu:rate:sum
        plugins:
          balance:
            enabled:
              - "LowNodeUtilization"

TODO:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 11, 2024
@fanhaouu
Contributor

Hello, master. Due to a busy schedule at my company, I previously only managed to complete half of the related KEP. I'm glad to see that you're working on this. It looks like you're aiming to reuse the current node utilization logic. I have a few suggestions:

1. It should support different data sources, similar to PayPal's load-watcher.
2. It should support various real-time data processing algorithms, for instance real-time calculations, rate averages, or predictions based on EWMA + P95, similar to the approach used by the autoscaler (see the EWMA sketch after this comment).
3. If the goal is to address real-time CPU hotspots, perhaps there's no need to calculate the number of nodes below or above a certain threshold. Of course, you could also provide a switch to control this behavior.

Hope these suggestions help!
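
Not part of this PR, but to make the second suggestion concrete, here is a minimal sketch of EWMA smoothing over utilization samples (the type and field names are made up for illustration):

    // ewma keeps an exponentially weighted moving average of utilization samples.
    // alpha in (0, 1] controls how quickly older samples decay; a higher alpha reacts faster.
    type ewma struct {
        alpha       float64
        value       float64
        initialized bool
    }

    // Observe folds a new sample (e.g. a node CPU fraction in [0, 1]) into the
    // running average and returns the smoothed value.
    func (e *ewma) Observe(sample float64) float64 {
        if !e.initialized {
            e.initialized = true
            e.value = sample
            return e.value
        }
        e.value = e.alpha*sample + (1-e.alpha)*e.value
        return e.value
    }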

@ingvagabund
Contributor Author

Hello sir :)

thank you for taking part in composing the out-of-tree descheduling plugin KEP.

> It should support different data sources, similar to PayPal's load-watcher.

You are on the right track here. I'd like to get in touch with the load-watcher maintainers and extend that codebase to provide a generic interface for accessing pod utilization metrics as well; currently only actual node utilization gets collected. In the meantime, I am shaping the code here so it can integrate better with other utilization sources such as metrics.
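
Purely as an illustration of what such a generic interface could look like (the name and method set below are hypothetical, taken neither from load-watcher nor from this PR):

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
    )

    // UtilizationProvider is a hypothetical abstraction over utilization sources
    // (kubernetes metrics server, Prometheus, load-watcher, ...).
    type UtilizationProvider interface {
        // NodesUsage returns the actual utilization of each node, keyed by node name.
        NodesUsage(ctx context.Context) (map[string]map[v1.ResourceName]*resource.Quantity, error)
        // PodsUsage returns the actual utilization of the pods running on the given node,
        // keyed by pod key.
        PodsUsage(ctx context.Context, nodeName string) (map[string]map[v1.ResourceName]*resource.Quantity, error)
    }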

> It should support various real-time data processing algorithms, for instance real-time calculations, rate averages, or predictions based on EWMA + P95, similar to the approach used by the autoscaler.

This is where we can debate more. Thank you for sharing the specifics. There's an open issue for the pod autoscaler suggesting introducing an EMA: kubernetes/kubernetes#62235. Are you aware of a similar issue or discussion for the cluster autoscaler? I'd love to learn more about how it's implemented there. Ultimately, the current plugin just needs to know which pod, when evicted, will improve the overall node/workload utilization once properly re-scheduled. I can see several ways of producing the utilization snapshot.

> If the goal is to address real-time CPU hotspots, perhaps there's no need to calculate the number of nodes below or above a certain threshold. Of course, you could also provide a switch to control this behavior.

I can see how evicting hotspot pods relates to consuming metrics/real-time node utilization. In the context of the current plugin this is better suited to a new/different plugin. I can also see how RemoveDuplicates could be extended to evict based on overall node utilization instead of the current counting approach. Not every plugin will need to consume metrics, though common pieces can be shared across them through the descheduling framework.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from c889a53 to 1f55c4d on October 15, 2024 10:18
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 3 times, most recently from d744a96 to 800c92c on November 6, 2024 18:34
@ingvagabund
Contributor Author

kubernetes/kubernetes#128663 to address the discrepancy in the fake metrics client's node/pod metrics resource name.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from f30f8a1 to 2e63411 on November 7, 2024 15:40
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 4 times, most recently from 0330902 to baa6650 on November 8, 2024 15:52
@ingvagabund
Contributor Author

/test pull-descheduler-verify-master

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2024
@ingvagabund
Contributor Author

Integration with kubernetes metrics in #1555.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from baa6650 to 2442967 on November 16, 2024 09:09
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 5 times, most recently from 477104c to e6e5bf9 on November 16, 2024 19:04
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from d143aad to 0f0c525 on March 11, 2025 14:05
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2025
@googs1025
Member

/cc

@k8s-ci-robot k8s-ci-robot requested a review from googs1025 March 12, 2025 05:25
}, nil
if namespacedSharedInformerFactory != nil && deschedulerPolicy.Prometheus != nil {
namespacedSharedInformerFactory.Core().V1().Secrets().Informer().AddEventHandler(desch.eventHandler())
desch.namespacedSecretsLister = namespacedSharedInformerFactory.Core().V1().Secrets().Lister().Secrets(deschedulerPolicy.Prometheus.AuthToken.SecretReference.Namespace)


nil check for AuthToken?
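
For reference, the guard being asked for could look roughly like this (field names are taken from the snippet above; whether AuthToken is a pointer field is an assumption):

    // only wire the secret informer/lister when a Prometheus secret reference is configured
    hasPrometheusSecretRef := deschedulerPolicy.Prometheus != nil &&
        deschedulerPolicy.Prometheus.AuthToken != nil &&
        deschedulerPolicy.Prometheus.AuthToken.SecretReference != nil
    if namespacedSharedInformerFactory != nil && hasPrometheusSecretRef {
        namespacedSharedInformerFactory.Core().V1().Secrets().Informer().AddEventHandler(desch.eventHandler())
        desch.namespacedSecretsLister = namespacedSharedInformerFactory.Core().V1().Secrets().Lister().Secrets(deschedulerPolicy.Prometheus.AuthToken.SecretReference.Namespace)
    }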

Contributor Author


Done

@@ -462,7 +604,19 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer
}
}

if namespacedSharedInformerFactory != nil {


Can we create an extra variable, similar to reconcileInClusterSAToken, to condition this? At this point it is not entirely clear why it depends on namespacedSharedInformerFactory.


Actually, the SA token and secret reconcilers are mutually exclusive. Can we use an iota enum here?
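
A possible shape for the iota enum suggested here (the names are illustrative, not necessarily those that ended up in the PR):

    // authTokenReconciliation selects how the Prometheus auth token is kept up to date;
    // the in-cluster SA token and the secret reference are mutually exclusive.
    type authTokenReconciliation int

    const (
        noReconciliation authTokenReconciliation = iota
        // reconcileInClusterSAToken: periodically re-read the descheduler's in-cluster
        // service account token from the container's file system.
        reconcileInClusterSAToken
        // reconcileSecretReference: watch the referenced secret and rebuild the
        // Prometheus client whenever the token changes.
        reconcileSecretReference
    )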

Contributor Author


Done

README.md Outdated
metrics outside of the kubernetes metrics server. The query is expected to return a vector of values for
each node. The values are expected to be any real number within <0; 1> interval. During eviction only
a single pod is evicted at most from each overutilized node. There's currently no support for evicting
more. Kubernetes metric server takes precedence over Prometheus.


+1, we can update the text above now
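
For illustration, a query of the kind this README paragraph describes, returning a per-node CPU utilization fraction within <0; 1>, could be configured as below. This assumes node-exporter metrics are available and that the instance label maps to node names in your environment; it is not the query used in the PR's example:

    metricsUtilization:
      source: Prometheus
      prometheus:
        # non-idle CPU fraction per node over the last 5 minutes, always within <0; 1>
        query: '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'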

client._nodeUtilization = make(map[string]map[v1.ResourceName]*resource.Quantity)
client._pods = make(map[string][]*v1.Pod)

results, warnings, err := promv1.NewAPI(client.promClient).Query(context.TODO(), client.promQuery, time.Now())


Context passing could still be improved.
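
One possible shape for that improvement, propagating a caller-supplied context with a bound instead of context.TODO() (the parent ctx and the timeout value are assumptions):

    // derive a bounded context from the caller rather than using context.TODO()
    queryCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    results, warnings, err := promv1.NewAPI(client.promClient).Query(queryCtx, client.promQuery, time.Now())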

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 13, 2025
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from 0f0c525 to 4a9a008 on March 14, 2025 14:21
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2025
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 3 times, most recently from 8085495 to d7421d7 on March 14, 2025 15:14
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from 1832705 to 893bda5 on March 15, 2025 12:14
pkg/api/types.go Outdated
URL string
// authToken used for authentication with the prometheus server.
// If not set the in cluster authentication token for the descheduler service
// account is read from the container's file system is read.


Suggested change
// account is read from the container's file system is read.
// account is read from the container's file system.


type Prometheus struct {
URL string `json:"url,omitempty"`
// If not set the in cluster authentication token from the container's file system is read.


needs update as well

if d.previousPrometheusClientTransport != nil {
d.previousPrometheusClientTransport.CloseIdleConnections()
}
d.previousPrometheusClientTransport = nil


@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from 893bda5 to d365253 on March 17, 2025 15:21
@atiratree

LGTM

@ingvagabund
Contributor Author

@atiratree thank you for your patience and expertise. It made the code much better.

Squashing the commits before the final merge.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from d365253 to e283c31 on March 17, 2025 15:26
@ingvagabund ingvagabund added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.


@k8s-ci-robot k8s-ci-robot merged commit 6ab73d6 into kubernetes-sigs:master Mar 17, 2025
9 checks passed
@ingvagabund ingvagabund deleted the node-utilization-util-snapshot branch March 17, 2025 15:53