Failed to watch *v1.VolumeAttachment #7663
Comments
Same issue with chart 9.45.0, image 1.32.0
Same problem with DigitalOcean / chart 9.45.0 / image 1.32.0. Adding the necessary permissions fixed it for me. Run:

kubectl edit clusterrole <autoscaler-cluster-role-name>

Then update the YAML and add volumeattachments:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
[...]
rules:
[...]
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - csinodes
  - csidrivers
  - csistoragecapacities
  - volumeattachments # <== This
  verbs:
  - watch
  - list
  - get
[...]
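If you prefer a one-liner over kubectl edit, the same permission can be appended with a JSON patch. This is only a sketch: the ClusterRole name is a placeholder, and appending a separate rule (rather than extending the existing storage.k8s.io rule) is just a simpler equivalent.

kubectl patch clusterrole <autoscaler-cluster-role-name> --type=json \
  -p='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": ["storage.k8s.io"], "resources": ["volumeattachments"], "verbs": ["get", "list", "watch"]}}]'

Keep in mind that a manual edit like this can be overwritten the next time the chart's RBAC templates are applied, so it is a stopgap rather than a fix.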
Seeing the same in our EKS cluster, missing the volumeattachments permission.
Same for me, with a freshly installed EKS 1.31
Same issue here |
Latest chart 9.45.1 on EKS 1.31 has the same issue |
Same issue :/ |
I am also seeing this issue in the most current Helm chart as of today.
I fixed this by downgrading to 9.44.0 based on this table: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#releases
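For anyone wanting to do the same, pinning the chart version with Helm looks roughly like this; a sketch that assumes the upstream repo is registered as autoscaler, and where the release name, namespace, and values file are placeholders for your own setup:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --version 9.44.0 \
  -f values.yaml

Note that --version pins the chart version (9.44.0), not the cluster-autoscaler image version.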
Which component are you using?: cluster-autoscaler on AWS
/area cluster-autoscaler
What version of the component are you using?: 9.45
Component version: Helm chart 9.45
What k8s version are you using (kubectl version)?:
(kubectl version output omitted)
What environment is this in?: AWS EKS
What did you expect to happen?:
I am trying to figure out why the autoscaler does not honor my --ok-total-unready-count=0. It seems the node that enters the NotReady state gets stuck with many terminating pods, and at the same time I see the error in the autoscaler log. The error is the following:
(error log omitted)
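For context, --ok-total-unready-count is a cluster-autoscaler flag, and with the Helm chart it is typically passed through the extraArgs map; the exact values key can vary between chart versions, so treat this as a sketch:

extraArgs:
  ok-total-unready-count: 0

The default allows a few unready nodes before the autoscaler treats the cluster as unhealthy; setting it to 0 tightens that tolerance.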
When looking at the ClusterRole created by the Helm chart, I am not seeing this particular resource:
(ClusterRole excerpt omitted)
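A quick way to confirm the gap from the autoscaler's point of view is to inspect the ClusterRole and impersonate its service account; the ClusterRole and service account names below are placeholders, since the chart derives them from the release name:

kubectl get clusterrole <autoscaler-cluster-role-name> -o yaml | grep -B2 -A10 storage.k8s.io
kubectl auth can-i watch volumeattachments.storage.k8s.io \
  --as=system:serviceaccount:kube-system:<autoscaler-service-account>

The second command should print no on an affected install and yes once the rule above is added.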
I am not sure, but given the --ok-total-unready-count=0, I would expect the node that enters the NotReady state to be fairly quickly replaced by a node that can handle things.
What happened instead?:
The NotReady node sticks around for quite some time, with a bunch of pods in the Terminating state. Eventually it goes away (after maybe 30-45 minutes).
How to reproduce it (as minimally and precisely as possible):
Something is causing my nodes to get into the NotReady state; I think it is too much over-commitment on them, especially on memory (the kubelet then bails out). I am afraid I can't :-/
Anything else we need to know?:
A log iteration where I see the volumeattachment error:
(log output omitted)