@@ -94,6 +94,8 @@ SIG Architecture for cross-cutting KEPs).
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
+   - [Admin-intent in ResourceSlice](#admin-intent-in-resourceslice)
+   - [Storing result of patching in ResourceSlice](#storing-result-of-patching-in-resourceslice)
<!-- /toc -->

## Release Signoff Checklist
@@ -224,6 +226,16 @@ caching the patched devices and
(re-)applying patches only when they or the device definitions change, which
should be rare.

+ Patching directly in the informer event handlers may be fast enough. If it
+ turns out to slow down those handlers too much, then a workqueue with workers
+ may be needed to decouple updating the cache from the events that trigger
+ those updates and to avoid slowing down the informers.
+
+ The scheduler's "slice changed" cluster events must be driven by that cache,
+ not the original informers, otherwise a ResourceSlice or ResourceSlicePatch
+ change could trigger a pod scheduling attempt before the slice cache is
+ up-to-date again.
+
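A minimal sketch of how this could be wired up, assuming client-go informers and a typed workqueue (client-go >= v0.31); the `patchedSliceCache` type, its fields, and the `applyPatches`/`emitClusterEvent` helpers are hypothetical names used only for illustration, not part of this KEP:

```Go
package draexample

import (
	"sync"

	resourceapi "k8s.io/api/resource/v1beta1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// patchedSliceCache holds the result of applying all ResourceSlicePatches to
// all ResourceSlices. Informer event handlers only enqueue keys; workers
// recompute the affected entries, so the handlers themselves stay fast.
type patchedSliceCache struct {
	mutex   sync.RWMutex
	patched map[string]*resourceapi.ResourceSlice // keyed by slice name
	queue   workqueue.TypedRateLimitingInterface[string]
}

func newPatchedSliceCache() *patchedSliceCache {
	return &patchedSliceCache{
		patched: map[string]*resourceapi.ResourceSlice{},
		queue: workqueue.NewTypedRateLimitingQueue(
			workqueue.DefaultTypedControllerRateLimiter[string]()),
	}
}

// watchSlices registers the cache on a ResourceSlice informer. A similar
// handler on a ResourceSlicePatch informer would enqueue every slice that a
// changed patch might select.
func (c *patchedSliceCache) watchSlices(informer cache.SharedIndexInformer) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    c.enqueue,
		UpdateFunc: func(_, newObj interface{}) { c.enqueue(newObj) },
		DeleteFunc: c.enqueue,
	})
}

func (c *patchedSliceCache) enqueue(obj interface{}) {
	if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
		c.queue.Add(key)
	}
}

// runWorker re-applies the patches for one slice at a time. The scheduler's
// "slice changed" cluster event is only emitted after the cache entry is
// up-to-date again, so a scheduling attempt never sees stale patched devices.
func (c *patchedSliceCache) runWorker() {
	for {
		key, shutdown := c.queue.Get()
		if shutdown {
			return
		}
		c.applyPatches(key) // recompute the patched slice for this key
		c.emitClusterEvent(key)
		c.queue.Done(key)
	}
}

func (c *patchedSliceCache) applyPatches(key string)     { /* placeholder */ }
func (c *patchedSliceCache) emitClusterEvent(key string) { /* placeholder */ }
```

Because the handlers only enqueue keys, a burst of ResourceSlicePatch changes collapses into one recomputation per affected slice while the key sits in the queue.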
## Design Details

### API
@@ -237,7 +249,7 @@ are feature-gated.

```Go
type ResourceSlicePatch struct {
- metav1.TypeMeta
+ metav1.TypeMeta
// Standard object metadata
// +optional
metav1.ObjectMeta
@@ -278,7 +290,7 @@ type DevicePatch struct {
// be marked as empty by setting their null field. Such entries remove the
// corresponding attribute in a ResourceSlice, if there is one, instead of
// overriding it. Because entries get removed and are not allowed in
- // slices, CEL expressions do not need need to deal with null values.
+ // slices, CEL expressions do not need to deal with null values.
//
// The maximum number of attributes and capacities in the DevicePatch combined is 32.
// This is an alpha field and requires enabling the DRAAdminControlledDeviceAttributes
@@ -288,6 +300,11 @@ type DevicePatch struct {
// +featureGate:DRAAdminControlledDeviceAttributes
Attributes map[FullyQualifiedName]NullableDeviceAttribute

+ // ^^^
+ // The size limit is the same as for attributes and capacities in a ResourceSlice.
+ // We could make it larger here because we are less constrained by overall object
+ // size, but it seems unnecessary.
+
// Capacity defines the set of capacities to patch for matching devices.
// The name of each capacity must be unique in that set and
// include the domain prefix.
@@ -707,15 +724,15 @@ No.
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

Pod scheduling should be as fast as would be without this feature, because in
- both cases it starts with listing all devices. That information is local can
+ both cases it starts with listing all devices. That information is local and
comes either from an informer cache or a cache of patched devices.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Filtering and patching are local operations, with no impact on the cluster. To
prevent doing the same work repeatedly, it will be implemented so that it gets
done once and then only processes changes. This increases CPU and RAM
- consumption. But even all devices should get patched (which is unlikely), memory
+ consumption. But even if all devices should get patched (which is unlikely), memory
will be shared between objects in the informer cache and in the patch cache, so
it will not be doubled.

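Purely to illustrate the memory sharing described above, a sketch of a lookup that copies a slice only when a patch actually applies; the `resourceSlicePatch` helper type and its `matches`/`apply` methods are invented for this example:

```Go
package draexample

import resourceapi "k8s.io/api/resource/v1beta1"

// resourceSlicePatch stands in for the proposed ResourceSlicePatch API type;
// matches and apply are placeholders for its filter and patch semantics.
type resourceSlicePatch struct{ /* ... */ }

func (p *resourceSlicePatch) matches(slice *resourceapi.ResourceSlice) bool { return false }
func (p *resourceSlicePatch) apply(slice *resourceapi.ResourceSlice)        {}

// patchedSlice returns the slice that consumers should see. Slices without a
// matching patch are returned as the exact pointer held by the informer
// cache, so they consume no additional memory; only slices that actually get
// patched are deep-copied before their attributes are overwritten.
func patchedSlice(fromInformer *resourceapi.ResourceSlice, patches []*resourceSlicePatch) *resourceapi.ResourceSlice {
	var copied *resourceapi.ResourceSlice
	for _, patch := range patches {
		if !patch.matches(fromInformer) {
			continue
		}
		if copied == nil {
			copied = fromInformer.DeepCopy() // allocate only on the first match
		}
		patch.apply(copied)
	}
	if copied == nil {
		return fromInformer // shared with the informer cache
	}
	return copied
}
```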
@@ -766,7 +783,30 @@ harder for users to get a complete view.

## Alternatives

+ ### Admin-intent in ResourceSlice
+
Instead of ResourceSlicePatch as a separate type, new fields in the
ResourceSlice status could be modified by an admin. That has the problem that
the ResourceSlice object might get deleted while doing cluster maintenance like
a driver update, in which case the admin intent would get lost.
+
+ ### Storing result of patching in ResourceSlice
+
+ A controller could read ResourceSlicePatches and apply them to
+ ResourceSlices. Then consumers like the scheduler and users would only need to
+ look at ResourceSlices. This has several drawbacks.
+
+ We would need to duplicate the attributes in the slice status. If we didn't and
+ directly modified the spec, this patch controller and the DRA driver as the
+ owner of the slice spec would fight against each other. Also, after removing a
+ patch, the original state must be available somewhere, otherwise the controller
+ cannot restore it.
+
+ Duplicating the attributes might make a slice too large. The limits were chosen
+ so that we have some space left for a status, but not enough for a status that
+ is potentially as large as the spec.
+
+ Creating a single ResourceSlicePatch could force the controller to update a
+ potentially large number of ResourceSlices. When using rate limiting, updating
+ them all will take longer than client-side patching. When not using rate
+ limiting, this could overwhelm the apiserver.
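To make the scale concrete with purely illustrative numbers: with 5,000 ResourceSlices in a cluster and a controller rate-limited to 20 requests per second, propagating a single patch would take roughly four minutes, during which consumers would see a mix of updated and not-yet-updated slices.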