
Restricting VPA recommender scans to specific namespaces #7697

Open
ncmuthu opened this issue Jan 15, 2025 · 12 comments · May be fixed by #7716
Assignees
Labels
area/vertical-pod-autoscaler kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ncmuthu

ncmuthu commented Jan 15, 2025

Which component are you using?:
/area vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.0.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0

What environment is this in?:
AWS EKS and local Kind cluster

What did you expect to happen?:
I am using the flag --vpa-object-namespace=vpa to limit VPA functionality to the vpa namespace only. The recommender detects VPA resources only from the specified vpa namespace, but it scans the verticalpodautoscalercheckpoints of all namespaces every 10 minutes instead of only the specified namespace. We have around 3000 namespaces, so scanning all of them every 10 minutes adds load to the kube-apiserver.

What happened instead?:
The VPA recommender scans the verticalpodautoscalercheckpoints of all namespaces every 10 minutes instead of scanning only the specified namespace. We would like to avoid scanning all the namespaces.

How to reproduce it (as minimally and precisely as possible):

  • Install the VPA with default parameters and add --vpa-object-namespace=vpa
  • Create 3000+ empty namespaces in the cluster (see the helper sketch after this list)
  • After about 13 minutes, logs similar to the ones below will appear.
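
For reference, creating the empty namespaces can be scripted; below is a minimal client-go sketch (the program and the ns<N> names are only illustrative and not part of the VPA project, and it assumes a kubeconfig at the default location):

package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from the default kubeconfig (~/.kube/config); adjust as needed.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    // Create 3000 empty namespaces named ns1..ns3000.
    for i := 1; i <= 3000; i++ {
        ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("ns%d", i)}}
        if _, err := client.CoreV1().Namespaces().Create(context.TODO(), ns, metav1.CreateOptions{}); err != nil {
            fmt.Printf("failed to create %s: %v\n", ns.Name, err)
        }
    }
}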

Anything else we need to know?:
Logs:

I0115 14:54:19.448086       1 flags.go:57] FLAG: --add-dir-header="false"
I0115 14:54:19.448217       1 flags.go:57] FLAG: --address=":8942"
I0115 14:54:19.448218       1 flags.go:57] FLAG: --alsologtostderr="false"
I0115 14:54:19.448219       1 flags.go:57] FLAG: --checkpoints-gc-interval="10m0s"
I0115 14:54:19.448220       1 flags.go:57] FLAG: --checkpoints-timeout="1m0s"
I0115 14:54:19.448221       1 flags.go:57] FLAG: --container-name-label="name"
I0115 14:54:19.448222       1 flags.go:57] FLAG: --container-namespace-label="namespace"
I0115 14:54:19.448224       1 flags.go:57] FLAG: --container-pod-name-label="pod_name"
I0115 14:54:19.448225       1 flags.go:57] FLAG: --cpu-histogram-decay-half-life="24h0m0s"
I0115 14:54:19.448226       1 flags.go:57] FLAG: --cpu-integer-post-processor-enabled="false"
I0115 14:54:19.448227       1 flags.go:57] FLAG: --external-metrics-cpu-metric=""
I0115 14:54:19.448228       1 flags.go:57] FLAG: --external-metrics-memory-metric=""
I0115 14:54:19.448229       1 flags.go:57] FLAG: --history-length="8d"
I0115 14:54:19.448230       1 flags.go:57] FLAG: --history-resolution="1h"
I0115 14:54:19.448231       1 flags.go:57] FLAG: --kube-api-burst="20"
I0115 14:54:19.448232       1 flags.go:57] FLAG: --kube-api-qps="5"
I0115 14:54:19.448238       1 flags.go:57] FLAG: --kubeconfig=""
I0115 14:54:19.448240       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0115 14:54:19.448242       1 flags.go:57] FLAG: --log-dir=""
I0115 14:54:19.448243       1 flags.go:57] FLAG: --log-file=""
I0115 14:54:19.448244       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0115 14:54:19.448246       1 flags.go:57] FLAG: --logtostderr="true"
I0115 14:54:19.448251       1 flags.go:57] FLAG: --memory-aggregation-interval="24h0m0s"
I0115 14:54:19.448252       1 flags.go:57] FLAG: --memory-aggregation-interval-count="8"
I0115 14:54:19.448253       1 flags.go:57] FLAG: --memory-histogram-decay-half-life="24h0m0s"
I0115 14:54:19.448254       1 flags.go:57] FLAG: --memory-saver="false"
I0115 14:54:19.448256       1 flags.go:57] FLAG: --metric-for-pod-labels="up{job=\"kubernetes-pods\"}"
I0115 14:54:19.448257       1 flags.go:57] FLAG: --min-checkpoints="10"
I0115 14:54:19.448258       1 flags.go:57] FLAG: --one-output="false"
I0115 14:54:19.448260       1 flags.go:57] FLAG: --oom-bump-up-ratio="1.2"
I0115 14:54:19.448264       1 flags.go:57] FLAG: --oom-min-bump-up-bytes="1.048576e+08"
I0115 14:54:19.448265       1 flags.go:57] FLAG: --password=""
I0115 14:54:19.448267       1 flags.go:57] FLAG: --pod-label-prefix="pod_label_"
I0115 14:54:19.448275       1 flags.go:57] FLAG: --pod-name-label="kubernetes_pod_name"
I0115 14:54:19.448276       1 flags.go:57] FLAG: --pod-namespace-label="kubernetes_namespace"
I0115 14:54:19.448277       1 flags.go:57] FLAG: --pod-recommendation-min-cpu-millicores="15"
I0115 14:54:19.448279       1 flags.go:57] FLAG: --pod-recommendation-min-memory-mb="100"
I0115 14:54:19.448280       1 flags.go:57] FLAG: --prometheus-address=""
I0115 14:54:19.448281       1 flags.go:57] FLAG: --prometheus-cadvisor-job-name="kubernetes-cadvisor"
I0115 14:54:19.448282       1 flags.go:57] FLAG: --prometheus-query-timeout="5m"
I0115 14:54:19.448285       1 flags.go:57] FLAG: --recommendation-margin-fraction="0.15"
I0115 14:54:19.448292       1 flags.go:57] FLAG: --recommender-interval="1m0s"
I0115 14:54:19.448293       1 flags.go:57] FLAG: --recommender-name="default"
I0115 14:54:19.448294       1 flags.go:57] FLAG: --skip-headers="false"
I0115 14:54:19.448295       1 flags.go:57] FLAG: --skip-log-headers="false"
I0115 14:54:19.448296       1 flags.go:57] FLAG: --stderrthreshold="2"
I0115 14:54:19.448297       1 flags.go:57] FLAG: --storage=""
I0115 14:54:19.448298       1 flags.go:57] FLAG: --target-cpu-percentile="0.9"
I0115 14:54:19.448299       1 flags.go:57] FLAG: --use-external-metrics="false"
I0115 14:54:19.448300       1 flags.go:57] FLAG: --username=""
I0115 14:54:19.448302       1 flags.go:57] FLAG: --v="4"
I0115 14:54:19.448303       1 flags.go:57] FLAG: --vmodule=""
I0115 14:54:19.448304       1 flags.go:57] FLAG: --vpa-object-namespace="vpa"
I0115 14:54:19.448309       1 main.go:110] Vertical Pod Autoscaler 1.0.0 Recommender: default
I0115 14:54:19.448702       1 reflector.go:221] Starting reflector *v1.DaemonSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.448715       1 reflector.go:257] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.549978       1 shared_informer.go:303] caches populated
I0115 14:54:19.550002       1 controller_fetcher.go:141] Initial sync of DaemonSet completed
I0115 14:54:19.550102       1 reflector.go:221] Starting reflector *v1.Deployment (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.550111       1 reflector.go:257] Listing and watching *v1.Deployment from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.651238       1 shared_informer.go:303] caches populated
I0115 14:54:19.651281       1 controller_fetcher.go:141] Initial sync of Deployment completed
I0115 14:54:19.651474       1 reflector.go:221] Starting reflector *v1.ReplicaSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.651489       1 reflector.go:257] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.753745       1 shared_informer.go:303] caches populated
I0115 14:54:19.753790       1 controller_fetcher.go:141] Initial sync of ReplicaSet completed
I0115 14:54:19.754054       1 reflector.go:221] Starting reflector *v1.StatefulSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.754082       1 reflector.go:257] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.856309       1 shared_informer.go:303] caches populated
I0115 14:54:19.856382       1 controller_fetcher.go:141] Initial sync of StatefulSet completed
I0115 14:54:19.856767       1 reflector.go:221] Starting reflector *v1.ReplicationController (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.856799       1 reflector.go:257] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.957446       1 shared_informer.go:303] caches populated
I0115 14:54:19.957476       1 controller_fetcher.go:141] Initial sync of ReplicationController completed
I0115 14:54:19.957626       1 reflector.go:221] Starting reflector *v1.Job (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:19.957638       1 reflector.go:257] Listing and watching *v1.Job from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:20.059084       1 shared_informer.go:303] caches populated
I0115 14:54:20.059119       1 controller_fetcher.go:141] Initial sync of Job completed
I0115 14:54:20.059381       1 reflector.go:221] Starting reflector *v1.CronJob (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:20.059402       1 reflector.go:257] Listing and watching *v1.CronJob from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/controller_fetcher/controller_fetcher.go:136
I0115 14:54:20.160830       1 shared_informer.go:303] caches populated
I0115 14:54:20.160883       1 controller_fetcher.go:141] Initial sync of CronJob completed
I0115 14:54:20.161137       1 main.go:148] Using Metrics Server.
I0115 14:54:20.161276       1 reflector.go:221] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go:171
I0115 14:54:20.161295       1 reflector.go:257] Listing and watching *v1.Pod from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go:171
I0115 14:54:20.161542       1 reflector.go:221] Starting reflector *v1.VerticalPodAutoscaler (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88
I0115 14:54:20.161556       1 reflector.go:257] Listing and watching *v1.VerticalPodAutoscaler from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88
I0115 14:54:20.262025       1 shared_informer.go:303] caches populated
I0115 14:54:20.262112       1 api.go:92] Initial VPA synced successfully
I0115 14:54:20.262410       1 shared_informer.go:303] caches populated
I0115 14:54:20.262456       1 fetcher.go:99] Initial sync of DaemonSet completed
I0115 14:54:20.262494       1 shared_informer.go:303] caches populated
I0115 14:54:20.262501       1 fetcher.go:99] Initial sync of Deployment completed
I0115 14:54:20.262509       1 shared_informer.go:303] caches populated
I0115 14:54:20.262516       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0115 14:54:20.262532       1 shared_informer.go:303] caches populated
I0115 14:54:20.262558       1 fetcher.go:99] Initial sync of StatefulSet completed
I0115 14:54:20.262571       1 shared_informer.go:303] caches populated
I0115 14:54:20.262576       1 fetcher.go:99] Initial sync of ReplicationController completed
I0115 14:54:20.262583       1 shared_informer.go:303] caches populated
I0115 14:54:20.262588       1 fetcher.go:99] Initial sync of Job completed
I0115 14:54:20.262627       1 shared_informer.go:303] caches populated
I0115 14:54:20.262633       1 fetcher.go:99] Initial sync of CronJob completed
W0115 14:54:20.344780       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344812       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344841       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344845       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344856       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344853       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
W0115 14:54:20.344857       1 shared_informer.go:419] The sharedIndexInformer has started, run more than once is not allowed
I0115 14:54:20.345077       1 recommender.go:210] New Recommender created &{clusterState:0x400001d0e0 clusterStateFeeder:0x4000161a40 checkpointWriter:0x4000310588 checkpointsGCInterval:600000000000 controllerFetcher:0x400068a2d0 lastCheckpointGC:{wall:13968495778911565158 ext:946342960 loc:0x23c8b00} vpaClient:0x4000417490 podResourceRecommender:0x40005121b0 useCheckpoints:true lastAggregateContainerStateGC:{wall:13968495778911564949 ext:946342794 loc:0x23c8b00} recommendationPostProcessor:[0x23fea40]}
I0115 14:54:20.345181       1 cluster_feeder.go:245] Initializing VPA from checkpoints
I0115 14:54:20.345214       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0115 14:54:20.345229       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0115 14:54:20.345311       1 cluster_feeder.go:362] Using selector app=nginx for VPA vpa/nginx-vpa
I0115 14:54:20.345362       1 cluster_feeder.go:254] Fetching checkpoints from namespace vpa
I0115 14:54:20.351032       1 cluster_feeder.go:261] Loading VPA vpa/nginx-vpa checkpoint for nginx

I0115 15:04:20.363110       1 recommender.go:155] Recommender Run
I0115 15:04:20.363224       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0115 15:04:20.363240       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0115 15:04:20.363347       1 cluster_feeder.go:362] Using selector app=nginx for VPA vpa/nginx-vpa
I0115 15:04:20.375177       1 metrics_client.go:74] 14 podMetrics retrieved for all namespaces
I0115 15:04:20.375355       1 cluster_feeder.go:440] ClusterSpec fed with #36 ContainerUsageSamples for #18 containers. Dropped #0 samples.
I0115 15:04:20.375387       1 recommender.go:165] ClusterState is tracking 14 PodStates and 1 VPAs
I0115 15:04:20.384038       1 checkpoint_writer.go:114] Saved VPA vpa/nginx-vpa checkpoint for nginx
I0115 15:04:20.384091       1 cluster_feeder.go:272] Starting garbage collection of checkpoints
I0115 15:04:20.384110       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0115 15:04:20.384116       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0115 15:04:20.384185       1 cluster_feeder.go:362] Using selector app=nginx for VPA vpa/nginx-vpa
I0115 15:04:22.362238       1 request.go:622] Waited for 192.79075ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns101/verticalpodautoscalercheckpoints
I0115 15:04:22.563352       1 request.go:622] Waited for 198.225876ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1010/verticalpodautoscalercheckpoints
I0115 15:04:22.761011       1 request.go:622] Waited for 192.809625ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1011/verticalpodautoscalercheckpoints
I0115 15:04:22.962065       1 request.go:622] Waited for 196.395709ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1012/verticalpodautoscalercheckpoints
I0115 15:04:23.161045       1 request.go:622] Waited for 193.92025ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1013/verticalpodautoscalercheckpoints
I0115 15:04:23.364266       1 request.go:622] Waited for 199.671167ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1014/verticalpodautoscalercheckpoints
I0115 15:04:23.563172       1 request.go:622] Waited for 193.308292ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1015/verticalpodautoscalercheckpoints
I0115 15:04:23.762318       1 request.go:622] Waited for 194.669916ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1016/verticalpodautoscalercheckpoints
I0115 15:04:23.961083       1 request.go:622] Waited for 192.070292ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1017/verticalpodautoscalercheckpoints
I0115 15:04:24.161377       1 request.go:622] Waited for 195.512209ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1018/verticalpodautoscalercheckpoints
I0115 15:04:24.360894       1 request.go:622] Waited for 195.286375ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1019/verticalpodautoscalercheckpoints
I0115 15:04:24.562020       1 request.go:622] Waited for 195.598125ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns102/verticalpodautoscalercheckpoints
I0115 15:04:24.763033       1 request.go:622] Waited for 195.887625ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/autoscaling.k8s.io/v1/namespaces/ns1020/verticalpodautoscalercheckpoints


We can avoid the client-side throttling by increasing --kube-api-qps, but we would like to avoid scanning all the namespaces where we are not going to create VPA resources.

@ncmuthu ncmuthu added the kind/bug Categorizes issue or PR as related to a bug. label Jan 15, 2025
@adrianmoisey
Member

Thanks for opening this issue.

Having GarbageCollectCheckpoints() run on all namespaces seems to be an oversight, I think.

The purpose of the garbage collection is, of course, to look for garbage, so it checks all namespaces in case someone previously had the VPA configured in a different namespace.

I assume we could add a flag so that it only looks at the specified namespace. I can open a PR and see what others think.

/assign
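
For context, the per-namespace requests in the logs above correspond to a loop of roughly this shape; this is a simplified sketch, not the actual cluster_feeder.go code, and the function name is illustrative:

package sketch

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"

    vpa_clientset "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/client/clientset/versioned"
)

// garbageCollectCheckpointsSketch mimics the pattern visible in the logs:
// one checkpoint LIST call per namespace returned by the namespace lister.
func garbageCollectCheckpointsSketch(ctx context.Context, kubeClient kubernetes.Interface, vpaClient vpa_clientset.Interface) error {
    nsList, err := kubeClient.CoreV1().Namespaces().List(ctx, metav1.ListOptions{})
    if err != nil {
        return err
    }
    for _, ns := range nsList.Items {
        // Each iteration issues GET .../namespaces/<ns>/verticalpodautoscalercheckpoints;
        // with ~3000 namespaces and the default client QPS this is what surfaces
        // as the client-side throttling messages above.
        checkpoints, err := vpaClient.AutoscalingV1().VerticalPodAutoscalerCheckpoints(ns.Name).List(ctx, metav1.ListOptions{})
        if err != nil {
            continue
        }
        _ = checkpoints // the real GC then removes checkpoints whose VPA no longer exists
    }
    return nil
}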

@adrianmoisey
Member

Hold on, I don't have capacity for this yet. I'll unassign and let someone else take it.

/unassign
/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 15, 2025
@omerap12
Member

Thanks for opening this issue.

Having GarbageCollectCheckpoints() run on all namespaces seems to be an oversight, I think.

The purpose of the garbage collection is, of course, to look for garbage, so it checks all namespaces in case someone previously had the VPA configured in a different namespace.

I assume we could add a flag so that it only looks at the specified namespace. I can open a PR and see what others think.

/assign

We already have two flags, VpaObjectNamespace and IgnoredVpaObjectNamespaces (https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/recommender/main.go#L129).
Wouldn't it be better to make use of these existing flags instead of introducing a new one? I prefer to keep the number of flags to a minimum.
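
If the existing settings were reused for the checkpoint GC, the namespace filter could look roughly like the sketch below (an illustrative helper only, not project code; the parameter names mirror the flags for readability):

package sketch

// shouldGCNamespace reports whether the checkpoint garbage collector should
// scan the given namespace, honouring the existing VpaObjectNamespace and
// IgnoredVpaObjectNamespaces settings.
func shouldGCNamespace(namespace, vpaObjectNamespace string, ignoredNamespaces map[string]bool) bool {
    if vpaObjectNamespace != "" && namespace != vpaObjectNamespace {
        return false
    }
    return !ignoredNamespaces[namespace]
}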

@adrianmoisey
Member

We already have two flags, VpaObjectNamespace and IgnoredVpaObjectNamespaces (https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/recommender/main.go#L129).
Wouldn't it be better to make use of these existing flags instead of introducing a new one? I prefer to keep the number of flags to a minimum.

Agreed, I like that option too. But it changes the behaviour of those flags.
I.e., what if someone has the VPA set up to use multiple namespaces, and after some time changes it to a single namespace using VpaObjectNamespace?
If we re-used that flag, then there would be orphaned VPA checkpoint objects in other namespaces that never get cleaned up.

We just need to figure out a solution to handle this sort of thing (it could be just writing some documentation, though).

@omerap12
Member

Hmm, I didn’t think of that. Good point. Not sure what the best solution is here. Do you think just documenting this behavior would be enough?

@omerap12
Member

/assign

@iamzili

iamzili commented Jan 17, 2025

Another option could be to modify the interval (perhaps temporarily) for searching orphaned checkpoint objects across all namespaces by updating the --checkpoints-gc-interval flag. This would help avoid checking all namespaces every 10 minutes.

@ncmuthu what do you think?

@voelzmo
Contributor

voelzmo commented Jan 17, 2025

@omerap12

Hmm, I didn’t think of that. Good point. Not sure what the best solution is here. Do you think just documenting this behavior would be enough?

Yeah, I think rather than introducing yet another flag for this scenario, it should be sufficient to document that narrowing the range of namespaces a recommender operates in can leave objects behind that users need to clean up themselves. This is true for the VPAs and for the VPACheckpoints as well.

@omerap12
Member

@omerap12

Hmm, I didn’t think of that. Good point. Not sure what the best solution is here. Do you think just documenting this behavior would be enough?

Yeah, I think rather than introducing yet another flag for this scenario, it should be sufficient to document that narrowing the range of namespaces a recommender operates in can leave objects behind that users need to clean up themselves. This is true for the VPAs and for the VPACheckpoints as well.

Agreed. I will work on that :)

@voelzmo
Contributor

voelzmo commented Jan 17, 2025

@iamzili

Another option could be to modify the interval (perhaps temporarily) for searching orphaned checkpoint objects across all namespaces by updating the --checkpoints-gc-interval flag. This would help avoid checking all namespaces every 10 minutes.

@ncmuthu what do you think?

Another recommendation is to increase --kube-api-burst and --kube-api-qps when working in large-scale scenarios. The defaults are absolutely not suited for the scale that @ncmuthu mentioned in the original post.
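
For a sense of scale: at the default --kube-api-qps=5 the client-side rate limiter allows roughly one request every 200 ms, which matches the ~195 ms waits in the logs above, so about 3000 per-namespace checkpoint listings take on the order of 10 minutes per GC pass. In client-go such limits typically end up on the rest.Config used to build the clients; the sketch below is only illustrative, not the recommender's actual wiring:

package sketch

import "k8s.io/client-go/rest"

// applyAPIRateLimits shows where --kube-api-qps / --kube-api-burst style values
// land in client-go: the rest.Config used to construct the clientsets.
func applyAPIRateLimits(cfg *rest.Config, qps float32, burst int) {
    cfg.QPS = qps     // recommender default: 5
    cfg.Burst = burst // recommender default: 20
}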

@ncmuthu
Author

ncmuthu commented Jan 17, 2025

Another option could be to modify the interval (perhaps temporarily) for searching orphaned checkpoint objects across all namespaces by updating the --checkpoints-gc-interval flag. This would help avoid checking all namespaces every 10 minutes.

@ncmuthu what do you think?

Thank you for the response and for checking on this. That would help. Even though I do not operate in the other namespaces, I can increase the interval to a much higher value so that it does not affect the performance of the kube-apiserver.

@ncmuthu
Author

ncmuthu commented Jan 17, 2025

@iamzili

Another option could be to modify the interval (perhaps temporarily) for searching orphaned checkpoint objects across all namespaces by updating the --checkpoints-gc-interval flag. This would help avoid checking all namespaces every 10 minutes.
@ncmuthu what do you think?

Another recommendation is to increase --kube-api-burst and --kube-api-qps when working in large-scale scenarios. The defaults are absolutely not suited for the scale that @ncmuthu mentioned in the original post.

Thank you for the response. For now I have mitigated the issue with these settings; the only remaining question is that, since we want to operate on only one or a few namespaces, we are looking for an option to avoid querying all the namespaces at a regular interval.
