
Commit 4c5b2d2

CNF-16267: Add kernel page size field (#1262)
* api: added KernelPageSize to performance profile API

  KernelPageSize defines the size of the kernel page.
  For x86/amd64, the only valid value is 4k.
  For aarch64, the valid values are 4k, 64k.
  The default value is 4k, if none specified.

  Signed-off-by: Ronny Baturov <[email protected]>

* validation: added validation to KernelPageSize field

  This update introduces the following validation checks for the KernelPageSize field:
  * On x86/amd64 systems, only 4k is accepted.
  * On aarch64 systems, 4k is accepted unconditionally, while 64k is accepted only if the real-time kernel is disabled (current supported behavior).
  * Any other values are rejected.

  Note: the validation ensures that all nodes in the MCP have the same architecture and determines the CPU architecture before running the validation checks mentioned above.

  Signed-off-by: Ronny Baturov <[email protected]>

* controller: add support for 64k-pages kernel on aarch64

  This commit consists of the following changes:
  * Enhanced the performanceprofile controller to choose the 64k-pages MC kernelType when 64k pages are specified.
  * Added unit tests to cover the following scenarios:
    RealTime kernel disabled + 64k => 64k-pages kernelType
    RealTime kernel disabled + 4k => default kernelType
    RealTime kernel enabled + 4k => realtime kernelType

  Note that the case of RealTime kernel enabled + 64k is not tested, as it is expected to be rejected by validation.

  Signed-off-by: Ronny Baturov <[email protected]>

* docs: documenting kernelPageSize

  Signed-off-by: Ronny Baturov <[email protected]>

---------

Signed-off-by: Ronny Baturov <[email protected]>
1 parent a3a7d19 commit 4c5b2d2

8 files changed: +214 -18 lines changed

docs/performanceprofile/performance_profile.md

Lines changed: 9 additions & 0 deletions
@@ -22,6 +22,7 @@ This document documents the PerformanceProfile API introduced by the Performance
 * [PerformanceProfileSpec](#performanceprofilespec)
 * [PerformanceProfileStatus](#performanceprofilestatus)
 * [RealTimeKernel](#realtimekernel)
+* [KernelPageSize](#kernelpagesize)
 * [WorkloadHints](#workloadhints)
 
 ## CPU
@@ -163,6 +164,7 @@ PerformanceProfileSpec defines the desired state of PerformanceProfile.
 | machineConfigPoolSelector | MachineConfigPoolSelector defines the MachineConfigPool label to use in the MachineConfigPoolSelector of resources like KubeletConfigs created by the operator. Defaults to \"machineconfiguration.openshift.io/role=&lt;same role as in NodeSelector label key&gt;\" | map[string]string | false |
 | nodeSelector | NodeSelector defines the Node label to use in the NodeSelectors of resources like Tuned created by the operator. It most likely should, but does not have to match the node label in the NodeSelector of the MachineConfigPool which targets this performance profile. In the case when machineConfigLabels or machineConfigPoolSelector are not set, we are expecting a certain NodeSelector format &lt;domain&gt;/&lt;role&gt;: \"\" in order to be able to calculate the default values for the former mentioned fields. | map[string]string | true |
 | realTimeKernel | RealTimeKernel defines a set of real time kernel related parameters. RT kernel won't be installed when not set. | *[RealTimeKernel](#realtimekernel) | false |
+| kernelPageSize | KernelPageSize defines the kernel page size. 4k is the default, 64k is only supported on aarch64 | *[KernelPageSize](#kernelpagesize) | false |
 | additionalKernelArgs | Additional kernel arguments. | []string | false |
 | numa | NUMA defines options related to topology aware affinities | *[NUMA](#numa) | false |
 | net | Net defines a set of network related features | *[Net](#net) | false |
@@ -193,6 +195,13 @@ RealTimeKernel defines the set of parameters relevant for the real time kernel.
 
 [Back to TOC](#table-of-contents)
 
+## KernelPageSize
+
+KernelPageSize defines the kernel page size that will be used by the kernel.
+4k is the default value; 64k is supported only on aarch64 with realTimeKernel disabled.
+
+[Back to TOC](#table-of-contents)
+
 ## WorkloadHints
 
 WorkloadHints defines the set of upper level flags for different type of workloads.

manifests/20-performance-profile.crd.yaml

Lines changed: 4 additions & 0 deletions
@@ -580,6 +580,10 @@ spec:
                         size:
                           description: Size defines huge page size, maps to the 'hugepagesz' kernel boot parameter.
                           type: string
+              kernelPageSize:
+                description: KernelPageSize defines the kernel page size. 4k is the default, 64k is only supported on aarch64
+                type: string
+                default: 4k
               machineConfigLabel:
                 description: |-
                   MachineConfigLabel defines the label to add to the MachineConfigs the operator creates. It has to be
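
The `default: 4k` entry means that a profile which omits `kernelPageSize` is treated as requesting 4k pages. As a rough illustration of that rule only, the sketch below resolves an optional value to its effective page size. The local `KernelPageSize` type, the `defaultKernelPageSize` constant, and the `effectivePageSize` helper are hypothetical stand-ins and are not part of the operator's code.

```go
package main

import "fmt"

// KernelPageSize is a local stand-in for the API type of the same name.
type KernelPageSize string

// defaultKernelPageSize mirrors the "default: 4k" entry in the CRD schema.
const defaultKernelPageSize KernelPageSize = "4k"

// effectivePageSize is a hypothetical helper: it returns the configured
// page size, or the schema default when the field was left unset.
func effectivePageSize(p *KernelPageSize) KernelPageSize {
	if p == nil {
		return defaultKernelPageSize
	}
	return *p
}

func main() {
	size64k := KernelPageSize("64k")
	fmt.Println(effectivePageSize(nil))      // unset field falls back to the default
	fmt.Println(effectivePageSize(&size64k)) // explicit value is kept
}
```

In a real cluster the defaulting is expected to come from the schema itself, so a helper like this would only matter for code paths that see objects before defaulting has been applied.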

pkg/apis/performanceprofile/v2/performanceprofile_types.go

Lines changed: 10 additions & 0 deletions
@@ -65,6 +65,10 @@ type PerformanceProfileSpec struct {
 	NodeSelector map[string]string `json:"nodeSelector"`
 	// RealTimeKernel defines a set of real time kernel related parameters. RT kernel won't be installed when not set.
 	RealTimeKernel *RealTimeKernel `json:"realTimeKernel,omitempty"`
+	// KernelPageSize defines the kernel page size. 4k is the default, 64k is only supported on aarch64
+	// +default="4k"
+	// +optional
+	KernelPageSize *KernelPageSize `json:"kernelPageSize,omitempty"`
 	// Additional kernel arguments.
 	// +optional
 	AdditionalKernelArgs []string `json:"additionalKernelArgs,omitempty"`
@@ -131,6 +135,12 @@ type HardwareTuning struct {
 	ReservedCpuFreq *CPUfrequency `json:"reservedCpuFreq,omitempty"`
 }
 
+// KernelPageSize defines the size of the kernel pages.
+// The allowed values for this depend on CPU architecture
+// For x86/amd64, the only valid value is 4k.
+// For aarch64, the valid values are 4k, 64k.
+type KernelPageSize string
+
 // HugePageSize defines size of huge pages
 // The allowed values for this depend on CPU architecture
 // For x86/amd64, the valid values are 2M and 1G
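
Because the new field is a pointer tagged `omitempty`, an unset value is left out of the serialized object entirely rather than being written as an empty string. A minimal sketch of that behavior follows; `profileSpec` is a trimmed, hypothetical stand-in for `PerformanceProfileSpec`, and `encodeSpec` is an illustrative helper, not an operator function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KernelPageSize is a local stand-in for the API type of the same name.
type KernelPageSize string

// profileSpec is a hypothetical, trimmed-down spec that carries only the
// new field, with the same JSON tag as in the diff above.
type profileSpec struct {
	KernelPageSize *KernelPageSize `json:"kernelPageSize,omitempty"`
}

// encodeSpec marshals the spec and returns the JSON text.
func encodeSpec(s profileSpec) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	unset, _ := encodeSpec(profileSpec{})
	fmt.Println(unset) // prints {}

	size := KernelPageSize("64k")
	set, _ := encodeSpec(profileSpec{KernelPageSize: &size})
	fmt.Println(set) // prints {"kernelPageSize":"64k"}
}
```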

pkg/apis/performanceprofile/v2/performanceprofile_validation.go

Lines changed: 73 additions & 0 deletions
@@ -41,6 +41,8 @@ import (
 )
 
 const (
+	kernelPageSize4k = "4k"
+	kernelPageSize64k = "64k"
 	hugepagesSize2M = "2M"
 	hugepagesSize32M = "32M"
 	hugepagesSize512M = "512M"
@@ -61,6 +63,15 @@ var aarch64ValidHugepagesSizes = []string{
 	hugepagesSize512M, // With 64k kernel pages
 }
 
+var aarch64ValidKernelPageSizes = []string{
+	kernelPageSize4k,
+	kernelPageSize64k,
+}
+
+var x86ValidKernelPageSizes = []string{
+	kernelPageSize4k,
+}
+
 var validatorContext = context.TODO()
 
 // ValidateCreate implements webhook.Validator so a webhook will be registered for the type
@@ -145,6 +156,7 @@ func (r *PerformanceProfile) ValidateBasicFields() field.ErrorList {
 	allErrs = append(allErrs, r.validateAllNodesAreSameCpuArchitecture(nodes)...)
 	allErrs = append(allErrs, r.validateAllNodesAreSameCpuCapacity(nodes)...)
 	allErrs = append(allErrs, r.validateHugePages(nodes)...)
+	allErrs = append(allErrs, r.validateKernelPageSize(nodes)...)
 	allErrs = append(allErrs, r.validateNUMA()...)
 	allErrs = append(allErrs, r.validateNet()...)
 	allErrs = append(allErrs, r.validateWorkloadHints()...)
@@ -438,6 +450,67 @@ func (r *PerformanceProfile) validateHugePages(nodes corev1.NodeList) field.Erro
 	return allErrs
 }
 
+func (r *PerformanceProfile) validateKernelPageSize(nodes corev1.NodeList) field.ErrorList {
+	var allErrs field.ErrorList
+
+	if r.Spec.KernelPageSize == nil {
+		return allErrs
+	}
+
+	// We can only partially validate this if we have no nodes
+	// We can check that the value used is legitimate but we cannot check
+	// whether it is supposed to be x86 or aarch64
+	x86 := false
+	aarch64 := false
+
+	if len(nodes.Items) > 0 {
+		// `validateKernelPageSize` implicitly relies on `validateAllNodesAreSameCpuArchitecture` to have already been run
+		// Under that assumption we can return any node from the list since they should all be the same architecture
+		// However it is simple and easy to just return the first node
+		x86 = isX86(nodes.Items[0])
+		aarch64 = isAarch64(nodes.Items[0])
+	}
+
+	kernelPageSize := *r.Spec.KernelPageSize
+	errField := "spec.kernelPageSize"
+	errMsg := "KernelPageSize should be equal to one of:"
+
+	if x86 && !slices.Contains(x86ValidKernelPageSizes, string(kernelPageSize)) {
+		allErrs = append(
+			allErrs,
+			field.Invalid(
+				field.NewPath(errField),
+				r.Spec.KernelPageSize,
+				fmt.Sprintf("%s %v", errMsg, kernelPageSize4k),
+			),
+		)
+	} else if aarch64 && !slices.Contains(aarch64ValidKernelPageSizes, string(kernelPageSize)) {
+		allErrs = append(
+			allErrs,
+			field.Invalid(
+				field.NewPath(errField),
+				r.Spec.KernelPageSize,
+				fmt.Sprintf("%s %v", errMsg, aarch64ValidKernelPageSizes),
+			),
+		)
+	}
+
+	// Ensure 64k pages are used only with nodes based on aarch64 and when real-time kernel is disabled.
+	if aarch64 && kernelPageSize == kernelPageSize64k &&
+		r.Spec.RealTimeKernel != nil && r.Spec.RealTimeKernel.Enabled != nil && *r.Spec.RealTimeKernel.Enabled {
+		allErrs = append(
+			allErrs,
+			field.Invalid(
+				field.NewPath(errField),
+				r.Spec.KernelPageSize,
+				"64k pages are not supported on ARM64 with a real-time kernel yet",
+			),
+		)
+	}
+
+	return allErrs
+}
+
 func isX86(node corev1.Node) bool {
 	return getCpuArchitectureForNode(node) == amd64
 }
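
The rules enforced by `validateKernelPageSize` can be summarized outside the webhook as a small decision function. The sketch below is a simplified restatement and not the operator's implementation: the architecture strings, the `validPageSizes` table, and the `validatePageSize` helper are assumptions made for illustration, and an unknown architecture (the "no nodes" case) skips the checks just as the hunk above does.

```go
package main

import (
	"errors"
	"fmt"
)

// Architecture names are illustrative assumptions, not the operator's constants.
const (
	archX86     = "amd64"
	archAarch64 = "aarch64"
)

// validPageSizes lists the page sizes accepted per architecture.
var validPageSizes = map[string][]string{
	archX86:     {"4k"},
	archAarch64: {"4k", "64k"},
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// validatePageSize is a hypothetical helper that applies the same rules as
// the webhook: per-architecture allow-lists, plus the restriction that 64k
// pages cannot be combined with the real-time kernel.
func validatePageSize(arch, pageSize string, rtEnabled bool) error {
	valid, known := validPageSizes[arch]
	if !known {
		// Architecture unknown (for example, no nodes yet): nothing to check.
		return nil
	}
	if !contains(valid, pageSize) {
		return fmt.Errorf("kernel page size %q is invalid on %s; expected one of %v", pageSize, arch, valid)
	}
	if arch == archAarch64 && pageSize == "64k" && rtEnabled {
		return errors.New("64k pages are not supported on aarch64 with a real-time kernel")
	}
	return nil
}

func main() {
	cases := []struct {
		arch, size string
		rt         bool
	}{
		{archX86, "4k", false},
		{archX86, "64k", false},
		{archAarch64, "64k", false},
		{archAarch64, "64k", true},
	}
	for _, c := range cases {
		fmt.Printf("%s %s rt=%t -> %v\n", c.arch, c.size, c.rt, validatePageSize(c.arch, c.size, c.rt))
	}
}
```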

pkg/apis/performanceprofile/v2/performanceprofile_validation_test.go

Lines changed: 81 additions & 0 deletions
@@ -718,6 +718,87 @@ var _ = Describe("PerformanceProfile", func() {
 		})
 	})
 
+	Describe("KernelPageSize validation", func() {
+		var nodes corev1.NodeList
+		var err error
+
+		Context("validation on x86 arch", func() {
+			BeforeEach(func() {
+				nodeSpecs := []NodeSpecifications{}
+				nodeSpecs = append(nodeSpecs, NodeSpecifications{architecture: amd64, cpuCapacity: 1000, name: "node"})
+				validatorClient = GetFakeValidatorClient(nodeSpecs)
+
+				nodes, err = profile.getNodesList()
+				Expect(err).To(BeNil())
+			})
+			It("should accept 4k kernel page size", func() {
+				KernelPageSize := KernelPageSize(kernelPageSize4k)
+				profile.Spec.KernelPageSize = &KernelPageSize
+				errors := profile.validateKernelPageSize(nodes)
+				Expect(errors).To(BeEmpty(), "expected no validation errors for 4k kernel page size")
+			})
+			It("should reject invalid input values for kernel page size", func() {
+				invalidInputs := [3]string{"", "aaa", "64k"}
+				for _, input := range invalidInputs {
+					invalidKernelPageSize := KernelPageSize(input)
+					profile.Spec.KernelPageSize = &invalidKernelPageSize
+					errors := profile.validateKernelPageSize(nodes)
+					Expect(errors).NotTo(BeEmpty(), "should have validation error when kernel page size is invalid")
+					Expect(errors[0].Error()).To(ContainSubstring("KernelPageSize should be equal to one of"))
+				}
+			})
+		})
+		Context("validation on aarch64", func() {
+			BeforeEach(func() {
+				nodeSpecs := []NodeSpecifications{}
+				nodeSpecs = append(nodeSpecs, NodeSpecifications{architecture: aarch64, cpuCapacity: 1000, name: "node"})
+				validatorClient = GetFakeValidatorClient(nodeSpecs)
+				nodes, err = profile.getNodesList()
+				Expect(err).To(BeNil())
+
+			})
+			It("should accept 4k kernel page size", func() {
+				KernelPageSize := KernelPageSize(kernelPageSize4k)
+				profile.Spec.KernelPageSize = &KernelPageSize
+				errors := profile.validateKernelPageSize(nodes)
+				Expect(errors).To(BeEmpty(), "expected no validation errors for 4k kernel page size")
+			})
+			It("should reject invalid input values for kernel page size", func() {
+				invalidInputs := [4]string{"", "aaa", "4", "64"}
+				for _, input := range invalidInputs {
+					invalidKernelPageSize := KernelPageSize(input)
+					profile.Spec.KernelPageSize = &invalidKernelPageSize
+					errors := profile.validateKernelPageSize(nodes)
+					Expect(errors).NotTo(BeEmpty(), "should have validation error when kernel page size is invalid")
+					Expect(errors[0].Error()).To(ContainSubstring("KernelPageSize should be equal to one of"))
+				}
+			})
+			When("real time kernel disabled", func() {
+				It("should accept 64k page size", func() {
+					profile.Spec.RealTimeKernel = &RealTimeKernel{
+						Enabled: ptr.To(false),
+					}
+					KernelPageSize := KernelPageSize(kernelPageSize64k)
+					profile.Spec.KernelPageSize = &KernelPageSize
+					errors := profile.validateKernelPageSize(nodes)
+					Expect(errors).To(BeEmpty(), "expected no validation errors for 64k kernel page size")
+				})
+			})
+			When("real time kernel enabled", func() {
+				It("should reject 64k page size", func() {
+					profile.Spec.RealTimeKernel = &RealTimeKernel{
+						Enabled: ptr.To(true),
+					}
+					KernelPageSize := KernelPageSize(kernelPageSize64k)
+					profile.Spec.KernelPageSize = &KernelPageSize
+					errors := profile.validateKernelPageSize(nodes)
+					Expect(errors).ToNot(BeEmpty())
+					Expect(errors[0].Error()).To(ContainSubstring("64k pages are not supported on ARM64 with a real-time kernel yet"))
+				})
+			})
+		})
+	})
+
 	Describe("Net validation", func() {
 		Context("with properly populated fields", func() {
 			It("should have net fields properly populated", func() {

pkg/apis/performanceprofile/v2/zz_generated.deepcopy.go

Lines changed: 5 additions & 0 deletions
Some generated files are not rendered by default.

pkg/performanceprofile/controller/performanceprofile/components/machineconfig/machineconfig.go

Lines changed: 7 additions & 0 deletions
@@ -37,6 +37,8 @@ const (
 	MCKernelRT = "realtime"
 	// MCKernelDefault is the value of the kernel setting in MachineConfig for the default kernel
 	MCKernelDefault = "default"
+	// MCKernel64kPages is the value of the kernel setting in MachineConfig for 64k-pages kernel on aarch64
+	MCKernel64kPages = "64k-pages"
 	// HighPerformanceRuntime contains the name of the high-performance runtime
 	HighPerformanceRuntime = "high-performance"
 
@@ -146,8 +148,13 @@ func New(profile *performancev2.PerformanceProfile, opts *components.MachineConf
 		profile.Spec.RealTimeKernel.Enabled != nil &&
 		*profile.Spec.RealTimeKernel.Enabled
 
+	// The real-time kernel with 64k pages on aarch64 is not yet supported; that combination is rejected by the validation webhook.
 	if enableRTKernel {
 		mc.Spec.KernelType = MCKernelRT
+	} else if profile.Spec.KernelPageSize != nil && *profile.Spec.KernelPageSize == performancev2.KernelPageSize("64k") {
+		// During validation, we ensure that nodes are based on aarch64 when the administrator specifies 64k for this field.
+		// Hence, this assignment is guaranteed to be safe.
+		mc.Spec.KernelType = MCKernel64kPages
 	} else {
 		mc.Spec.KernelType = MCKernelDefault
 	}
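
The branch order above gives the real-time kernel precedence over the page size. The following standalone sketch restates that selection as a pure function so the precedence is easy to see. `selectKernelType` and the lower-case constants are hypothetical names used only here; the string values are copied from the constants in the diff.

```go
package main

import "fmt"

// Kernel type values as they appear in the MachineConfig constants above.
const (
	kernelRT       = "realtime"
	kernelDefault  = "default"
	kernel64kPages = "64k-pages"
)

// selectKernelType is a hypothetical helper mirroring the if/else chain:
// the real-time kernel wins, then 64k pages, otherwise the default kernel.
func selectKernelType(rtEnabled bool, pageSize string) string {
	switch {
	case rtEnabled:
		return kernelRT
	case pageSize == "64k":
		return kernel64kPages
	default:
		return kernelDefault
	}
}

func main() {
	fmt.Println(selectKernelType(false, "64k"))
	fmt.Println(selectKernelType(false, "4k"))
	fmt.Println(selectKernelType(true, "4k"))
}
```

Calling the function with the real-time kernel enabled and 64k pages returns `realtime`, but per the commit message that combination should never reach this code because validation rejects it first.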

pkg/performanceprofile/controller/performanceprofile_controller_test.go

Lines changed: 25 additions & 18 deletions
@@ -395,24 +395,31 @@ var _ = Describe("Controller", func() {
 			}
 		})
 
-		It("should update MC when RT kernel gets disabled", func() {
-			profile.Spec.RealTimeKernel.Enabled = ptr.To(false)
-			r := newFakeReconciler(profile, mc, kc, tunedPerformance, profileMCP, infra, clusterOperator)
-
-			Expect(reconcileTimes(r, request, 1)).To(Equal(reconcile.Result{}))
-
-			key := types.NamespacedName{
-				Name:      machineconfig.GetMachineConfigName(profile.Name),
-				Namespace: metav1.NamespaceNone,
-			}
-
-			// verify MachineConfig update
-			mc := &mcov1.MachineConfig{}
-			err := r.Get(context.TODO(), key, mc)
-			Expect(err).ToNot(HaveOccurred())
-
-			Expect(mc.Spec.KernelType).To(Equal(machineconfig.MCKernelDefault))
-		})
+		DescribeTable("MachineConfig kernelType updates",
+			func(rtKernelEnabled bool, kernelPageSize *performancev2.KernelPageSize, expectedKernelType string) {
+				// Set up profile
+				profile.Spec.RealTimeKernel.Enabled = ptr.To(rtKernelEnabled)
+				profile.Spec.KernelPageSize = kernelPageSize
+
+				r := newFakeReconciler(profile, mc, kc, tunedPerformance, profileMCP, infra, clusterOperator)
+
+				Expect(reconcileTimes(r, request, 1)).To(Equal(reconcile.Result{}))
+
+				// Verify MachineConfig update
+				key := types.NamespacedName{
+					Name:      machineconfig.GetMachineConfigName(profile.Name),
+					Namespace: metav1.NamespaceNone,
+				}
+
+				mc := &mcov1.MachineConfig{}
+				err := r.Get(context.TODO(), key, mc)
+				Expect(err).ToNot(HaveOccurred())
+				Expect(mc.Spec.KernelType).To(Equal(expectedKernelType))
+			},
+			Entry("should set kernelType to default when RealTimeKernel is disabled", false, nil, machineconfig.MCKernelDefault),
+			Entry("should set kernelType to 64k-pages when RT kernel disabled and 64k kernel page size selected", false, ptr.To(performancev2.KernelPageSize("64k")), machineconfig.MCKernel64kPages),
+			Entry("should set kernelType to realtime when RT kernel is enabled", true, nil, machineconfig.MCKernelRT),
+		)
 
 		It("should update MC, KC and Tuned when CPU params change", func() {
 			reserved := performancev2.CPUSet("0-1")
