Commit 58d8371

fixup! review comments

1 parent ac6c681
File tree (2 files changed: +50 -8 lines)

  • keps/sig-scheduling
    • 5027-dra-admin-controlled-device-attributes
    • 5055-dra-device-taints-and-tolerations

keps/sig-scheduling/5027-dra-admin-controlled-device-attributes/README.md (+44 -4)
````diff
@@ -94,6 +94,8 @@ SIG Architecture for cross-cutting KEPs).
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
+  - [Admin-intent in ResourceSlice](#admin-intent-in-resourceslice)
+  - [Storing result of patching in ResourceSlice](#storing-result-of-patching-in-resourceslice)
 <!-- /toc -->
 
 ## Release Signoff Checklist
````
````diff
@@ -224,6 +226,16 @@ caching the patched devices and
 (re-)applying patches only when they or the device definitions change, which
 should be rare.
 
+Patching directly in the informer event handlers may be fast enough. If it
+turns out to slow down those handlers too much, then a workqueue with workers
+may be needed to decouple updating the cache from the events which trigger
+updating and to avoid slowing down the informers.
+
+The scheduler's "slice changed" cluster events must be driven by that cache,
+not the original informers, otherwise a ResourceSlice or ResourceSlicePatch
+change could trigger a pod scheduling attempt before the slice cache is
+up-to-date again.
+
 ## Design Details
 
 ### API
````
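The decoupling sketched in the hunk above maps onto a familiar client-go pattern: event handlers only enqueue keys, a worker updates the patch cache, and only then is the scheduler notified. A minimal sketch follows; `patchedSliceCache`, `recompute`, and `notifyScheduler` are hypothetical names used for illustration, while the informer and workqueue wiring uses real client-go APIs.

```Go
// Sketch of the informer -> workqueue -> cache decoupling described in
// the KEP text above; the cache type and its methods are hypothetical.
package dracache

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

type patchedSliceCache struct {
	queue workqueue.TypedRateLimitingInterface[string]
}

// register only enqueues keys inside the event handlers, so slow
// patching cannot stall the informer's event delivery.
func (c *patchedSliceCache) register(informer cache.SharedIndexInformer) {
	enqueue := func(obj interface{}) {
		if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
			c.queue.Add(key)
		}
	}
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    enqueue,
		UpdateFunc: func(_, newObj interface{}) { enqueue(newObj) },
		DeleteFunc: enqueue,
	})
}

// runWorker re-applies patches first and notifies the scheduler second,
// so a "slice changed" cluster event never observes a stale cache.
func (c *patchedSliceCache) runWorker() {
	for {
		key, shutdown := c.queue.Get()
		if shutdown {
			return
		}
		c.recompute(key)       // hypothetical: refresh the patched view of this slice
		c.notifyScheduler(key) // hypothetical: emit the cluster event afterwards
		c.queue.Done(key)
	}
}

func (c *patchedSliceCache) recompute(key string)       {} // stub for the sketch
func (c *patchedSliceCache) notifyScheduler(key string) {} // stub for the sketch
```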
````diff
@@ -237,7 +249,7 @@ are feature-gated.
 
 ```Go
 type ResourceSlicePatch struct {
-	metav1.TypeMeta
+	metav1.TypeMeta
 	// Standard object metadata
 	// +optional
 	metav1.ObjectMeta
````
````diff
@@ -278,7 +290,7 @@ type DevicePatch struct {
 	// be marked as empty by setting their null field. Such entries remove the
 	// corresponding attribute in a ResourceSlice, if there is one, instead of
 	// overriding it. Because entries get removed and are not allowed in
-	// slices, CEL expressions do not need need to deal with null values.
+	// slices, CEL expressions do not need to deal with null values.
 	//
 	// The maximum number of attributes and capacities in the DevicePatch combined is 32.
 	// This is an alpha field and requires enabling the DRAAdminControlledDeviceAttributes
````
````diff
@@ -288,6 +300,11 @@ type DevicePatch struct {
 	// +featureGate:DRAAdminControlledDeviceAttributes
 	Attributes map[FullyQualifiedName]NullableDeviceAttribute
 
+	// ^^^
+	// The size limit is the same as for attributes and capacities in a ResourceSlice.
+	// We could make it larger here because we are less constrained by overall object
+	// size, but it seems unnecessary.
+
 	// Capacity defines the set of capacities to patch for matching devices.
 	// The name of each capacity must be unique in that set and
 	// include the domain prefix.
````
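The null-removal semantics in the comment above are compact to state in code. Here is a minimal sketch with deliberately simplified types; the real API keys attributes by `FullyQualifiedName` and uses structured values, whereas here a `nil` map entry stands in for a `NullableDeviceAttribute` whose null field is set.

```Go
// Sketch of the patch-merge semantics: a nil value means "remove".
package dracache

// applyAttributePatch overlays patch entries onto a device's attributes.
// A nil entry removes the attribute instead of overriding it, so the
// patched result never contains null values and CEL selectors need no
// null handling.
func applyAttributePatch(attrs, patch map[string]*string) map[string]*string {
	result := make(map[string]*string, len(attrs))
	for name, value := range attrs {
		result[name] = value
	}
	for name, value := range patch {
		if value == nil {
			delete(result, name) // "null" entry: remove the attribute
		} else {
			result[name] = value // regular entry: override or add
		}
	}
	return result
}
```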
````diff
@@ -707,15 +724,15 @@ No.
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 
 Pod scheduling should be as fast as would be without this feature, because in
-both cases it starts with listing all devices. That information is local can
+both cases it starts with listing all devices. That information is local and
 comes either from an informer cache or a cache of patched devices.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
 Filtering and patching are local operations, with no impact on the cluster. To
 prevent doing the same work repeatedly, it will be implemented so that it gets
 done once and then only processes changes. This increases CPU and RAM
-consumption. But even all devices should get patched (which is unlikely), memory
+consumption. But even if all devices should get patched (which is unlikely), memory
 will be shared between objects in the informer cache and in the patch cache, so
 it will not be doubled.
 
````
````diff
@@ -766,7 +783,30 @@ harder for users to get a complete view.
 
 ## Alternatives
 
+### Admin-intent in ResourceSlice
+
 Instead of ResourceSlicePatch as a separate type, new fields in the
 ResourceSlice status could be modified by an admin. That has the problem that
 the ResourceSlice object might get deleted while doing cluster maintenance like
 a driver update, in which case the admin intent would get lost.
+
+### Storing result of patching in ResourceSlice
+
+A controller could read ResourceSlicePatches and apply them to
+ResourceSlices. Then consumers like the scheduler and users would only need to
+look at ResourceSlices. This has several drawbacks.
+
+We would need to duplicate the attributes in the slice status. If we didn't and
+directly modified the spec, this patch controller and the DRA driver as the
+owner of the slice spec would fight against each other. Also, after removing a
+patch the original state must be available somewhere, otherwise the controller
+cannot restore it.
+
+Duplicating the attributes might make a slice too large. The limits were chosen
+so that we have some space left for a status, but not enough for a status that
+is potentially as large as the spec.
+
+Creating a single ResourceSlicePatch could force the controller to update a
+potentially large number of ResourceSlices. When using rate limiting, updating
+them all will take longer than client-side patching. When not using rate
+limiting, this could overwhelm the apiserver.
````

keps/sig-scheduling/5055-dra-device-taints-and-tolerations/README.md (+6 -4)
````diff
@@ -158,7 +158,7 @@ ResourceClaim.
 
 ### Goals
 
-- Enable taking devices offline for maintenance while still allowing tests pods
+- Enable taking devices offline for maintenance while still allowing test pods
   to request and use those devices. Being able to do this one device at a time
   minimizes service level disruption.
 
````
````diff
@@ -200,7 +200,7 @@ hardware instance to support hot-swapping. Admins might favor using the names
 whereas health monitoring might prefer to be specific and use a vendor-defined
 unique ID. Both are supported, which creates additional complexity.
 
-Without a kubectl extension similar to `kubectl taint nodes` the user
+Without a kubectl extension similar to `kubectl taint nodes`, the user
 experience for admins will be a bit challenging. They need to decide how to
 identify the device (by name or with a CEL expression), manually create a
 ResourceSlicePatch with a unique name, then remember to remove that
````
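To make that manual workflow concrete, here is a purely illustrative sketch of the object an admin would have to write by hand. Every type and field name below is an assumption standing in for the draft API, not the final schema.

```Go
// Illustrative stand-ins for the draft API; names and structure are
// assumptions, not the final schema.
package dracache

type deviceTaint struct {
	Key    string
	Value  string
	Effect string // NoSchedule or NoExecute, mirroring node taints
}

type resourceSlicePatchSketch struct {
	Name     string // must be unique; the admin has to pick and track it
	Selector string // a CEL expression, or alternatively a device name
	Taints   []deviceTaint
}

// The manual step described above: taint one device for maintenance.
// The admin must remember to delete this object again afterwards.
var maintenancePatch = resourceSlicePatchSketch{
	Name:     "maintenance-gpu-0042",
	Selector: `device.attributes["vendor.example.com"].uniqueID == "gpu-0042"`,
	Taints: []deviceTaint{{
		Key:    "vendor.example.com/maintenance",
		Value:  "planned",
		Effect: "NoSchedule",
	}},
}
```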
````diff
@@ -382,7 +382,9 @@ code. Alternatively, that code can be copied.
 The DevicePatch also gets extended. It is possible to use
 admin-controlled taints without enabling attribute overrides by enabling the
 `v1alpha3` API and only the `DRADeviceTaints` feature, while leaving
-`DRAAdminControlledDeviceAttributes` disabled.
+`DRAAdminControlledDeviceAttributes` disabled, because then the
+ResourceSlicePatch type is available with only the fields needed for
+taints.
 
 ```Go
 type DevicePatch struct {
````
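The gate interplay described above suggests the usual drop-disabled-fields pattern in the REST strategy. A sketch under stated assumptions: the gate names come from the two KEPs (they must be registered before `Enabled` is called), the struct and its fields are placeholders, and only the feature-gate check itself is standard Kubernetes API machinery.

```Go
// Sketch of clearing fields whose feature gate is off, so a cluster
// with only DRADeviceTaints enabled accepts taint-only patches.
package dracache

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

// Gate names as given in the KEPs; real code would reference the
// registered constants in pkg/features instead.
const (
	draAdminControlledDeviceAttributes featuregate.Feature = "DRAAdminControlledDeviceAttributes"
	draDeviceTaints                    featuregate.Feature = "DRADeviceTaints"
)

// devicePatchSketch is a placeholder for the real DevicePatch.
type devicePatchSketch struct {
	Attributes map[string]string // gated by DRAAdminControlledDeviceAttributes
	Capacity   map[string]string // gated by DRAAdminControlledDeviceAttributes
	Taints     []string          // gated by DRADeviceTaints
}

// dropDisabledFields clears gated fields on create; both gates must be
// registered with the DefaultFeatureGate, or Enabled panics.
func dropDisabledFields(patch *devicePatchSketch) {
	if !utilfeature.DefaultFeatureGate.Enabled(draAdminControlledDeviceAttributes) {
		patch.Attributes = nil
		patch.Capacity = nil
	}
	if !utilfeature.DefaultFeatureGate.Enabled(draDeviceTaints) {
		patch.Taints = nil
	}
}
```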
````diff
@@ -759,7 +761,7 @@ harder for users to get a complete view.
 ## Alternatives
 
 The existing taint-eviction-controller could be extended to cover device
-taints. Cloning it lowers the risk of breaking existing stable functionality.
+taints. However, cloning it lowers the risk of breaking existing stable functionality.
 
 Tolerations for device taints could also be added to individual pods. This
 seems less useful because if pods share the same claim, they are typically part
````
