Which component are you using?:
/area cluster-autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Cluster Autoscaler has been significantly extended for DRA (Dynamic Resource Allocation), and some parts of that work also need to be addressed by cluster-api in its role as a cloud provider.
Cluster Autoscaler supports a scale-up path called the "scale-from-0-nodes" scenario, in which a NodeGroup has no existing Nodes and a new Node is created from it.
In this scenario, each cloud provider is responsible for supplying a template node (NodeInfo) through TemplateNodeInfo(), which carries the node's resource information such as CPU, memory, and GPU.
In the Device Plugin era, cluster-api built this node template from annotations added to the NodeGroup resources, such as MachineSet and MachineDeployment.[1]
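For contrast with DRA, the following is a minimal sketch of how the Device Plugin-era capacity annotations map onto a template node's capacity. The annotation keys follow the scale-from-zero section of the README [1] (check them against the version you run); the helper `templateCapacity` is hypothetical and keeps quantities as plain strings, whereas a real implementation would parse them into Kubernetes resource quantities.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys as listed in the clusterapi provider README for
// scale-from-zero support (verify against the README for your version).
const (
	cpuKey      = "capacity.cluster-autoscaler.kubernetes.io/cpu"
	memoryKey   = "capacity.cluster-autoscaler.kubernetes.io/memory"
	gpuTypeKey  = "capacity.cluster-autoscaler.kubernetes.io/gpu-type"
	gpuCountKey = "capacity.cluster-autoscaler.kubernetes.io/gpu-count"
)

// templateCapacity is a HYPOTHETICAL helper that turns NodeGroup
// annotations into a resource-name -> quantity map, roughly what a
// template Node's status.capacity would carry. Quantities stay strings.
func templateCapacity(annotations map[string]string) (map[string]string, error) {
	capacity := map[string]string{}
	if v, ok := annotations[cpuKey]; ok {
		capacity["cpu"] = v
	}
	if v, ok := annotations[memoryKey]; ok {
		capacity["memory"] = v
	}
	// With Device Plugins a GPU is an extended resource on the Node:
	// gpu-type names the resource and gpu-count gives its integer quantity.
	gpuType, hasType := annotations[gpuTypeKey]
	gpuCount, hasCount := annotations[gpuCountKey]
	if hasType && hasCount {
		n, err := strconv.Atoi(gpuCount)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid %s %q", gpuCountKey, gpuCount)
		}
		capacity[gpuType] = strconv.Itoa(n)
	}
	return capacity, nil
}

func main() {
	capacity, err := templateCapacity(map[string]string{
		cpuKey:      "16",
		memoryKey:   "128G",
		gpuTypeKey:  "nvidia.com/gpu",
		gpuCountKey: "2",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(capacity["cpu"], capacity["memory"], capacity["nvidia.com/gpu"])
}
```

Running it prints `16 128G 2`. The GPU shows up only as a count under an extended resource name, so this model has no place to describe individual devices the way a ResourceSlice does.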
However, for DRA, cluster-api does not yet implement the logic to build a template ResourceSlice from this information.
For example, suppose users want to spawn a node with GPUs exposed as DRA resources in a NodeGroup that has no existing nodes. Even when a pending pod requests a GPU through a ResourceClaim, users currently have no option to make this work, and the scale-up is likely to fail.
Therefore, cluster-api needs a feature that lets users specify the devices in the ResourceSlice of the node to be spawned in the scale-from-0-nodes scenario.
[1] https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#scale-from-zero-support
Describe the solution you'd like.:
The simplest idea is to add more annotations to the NodeGroup that serve as the basis for the template ResourceSlice, like the following:
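One hypothetical shape for such an annotation-driven template is sketched below. Everything specific in it is an assumption made for illustration, not part of this proposal or of any existing API: the key `hypothetical.example/dra-devices`, its comma-separated `<driver>=<count>` value format, the generated device names, and the use of the node name as the pool name. The structs mirror only a small subset of the ResourceSlice fields (`spec.driver`, `spec.pool.name`, `spec.nodeName`, `spec.devices[].name`); real code would use the `resource.k8s.io` API types.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// HYPOTHETICAL annotation key and value format, invented for this sketch:
// a comma-separated list of "<driver>=<device count>" entries.
const draDevicesKey = "hypothetical.example/dra-devices"

// Simplified stand-ins for a small subset of ResourceSlice fields.
type Device struct {
	Name string
}

type ResourceSlice struct {
	Driver   string
	PoolName string // assumption: node-local pool named after the node
	NodeName string
	Devices  []Device
}

// templateResourceSlices builds one template ResourceSlice per driver
// listed in the hypothetical annotation, for a node that does not exist yet.
func templateResourceSlices(nodeName string, annotations map[string]string) ([]ResourceSlice, error) {
	value, ok := annotations[draDevicesKey]
	if !ok || value == "" {
		return nil, nil // no DRA devices declared for this NodeGroup
	}
	var slices []ResourceSlice
	for _, entry := range strings.Split(value, ",") {
		driver, countStr, found := strings.Cut(strings.TrimSpace(entry), "=")
		driver = strings.TrimSpace(driver)
		if !found || driver == "" {
			return nil, fmt.Errorf("malformed entry %q in %s", entry, draDevicesKey)
		}
		count, err := strconv.Atoi(strings.TrimSpace(countStr))
		if err != nil || count < 1 {
			return nil, fmt.Errorf("invalid device count %q for driver %s", countStr, driver)
		}
		slice := ResourceSlice{Driver: driver, PoolName: nodeName, NodeName: nodeName}
		// Hypothetical device naming scheme: device-0, device-1, ...
		for i := 0; i < count; i++ {
			slice.Devices = append(slice.Devices, Device{Name: fmt.Sprintf("device-%d", i)})
		}
		slices = append(slices, slice)
	}
	return slices, nil
}

func main() {
	slices, err := templateResourceSlices("template-node", map[string]string{
		draDevicesKey: "gpu.example.com=2",
	})
	if err != nil {
		panic(err)
	}
	for _, s := range slices {
		fmt.Println(s.Driver, s.NodeName, len(s.Devices))
	}
}
```

Running it prints `gpu.example.com template-node 2`. A real design would also need device attributes and capacity so that ResourceClaim selectors can be evaluated against the template, which this sketch deliberately omits.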