-
Notifications
You must be signed in to change notification settings - Fork 113
OCPBUGS-57150: E2E: Additional Functional tests for align cpus by UncoreCache Feature #1324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR depends on #1302 |
df99dbd
to
1667034
Compare
Adds Additional tests related to CPUManager policy option to align Cpus based on Uncore cache. Contains tests related to Multiple Pods, multiple containers and with SMT disabled Signed-off-by: Niranjan M.R <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial review, nothing huge but quite a few questions inside
corev1.ResourceMemory: resource.MustParse("100Mi"), | ||
Context("With Multiple Pods", func() { | ||
DescribeTable("Align multiple Guaranteed pods", | ||
func(l3uncoreCacheSharing string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not use a string, use a enum-like type. Example:
type L3UncoreCacheShareMode string
const (
L3UncoreCacheShareEqual L3UncoreCacheShareMode = "equal"
L3UncoreCacheShareUnequal L3UncoreCacheShareMode = "unequal"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
targetNode = workerRTNodes[0] | ||
cpusetCfg = &controller.CpuSet{} | ||
cpusetList []cpuset.CPUSet | ||
podCpuList []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems more like podCpuRequirementList
or podCpuRequirements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
dpList []*appsv1.Deployment | ||
) | ||
|
||
podLabel := make(map[string]string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why if we initialize again in the for loop body below on line 519?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
|
||
Context("Multiple Containers", func() { | ||
DescribeTable("Verify CPU Allignmen with multiple containers", | ||
func(deploymentName string, alignment string, sideCarContainerName []string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
if l3uncoreCacheSharing == "equal" { | ||
Expect(cpusetList[0]).To(Equal(cpusetList[1])) | ||
} else { | ||
Expect(cpusetList[0]).ToNot(Equal(cpusetList[1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check the retrieved cpulist also match the expected size (podCpuList)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
Expect(err).ToNot(HaveOccurred()) | ||
containerWithnonIntegralCpu2, err := cpuset.Parse(cpusetCfg.Cpus) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(containerWithnonIntegralCpu2).To(Equal(expectedBurstablePodCpus)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpusets should be compared for equality using their Equal
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
Expect(err).ToNot(HaveOccurred()) | ||
containerWithnonIntegralCpu1, err := cpuset.Parse(cpusetCfg.Cpus) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(containerWithnonIntegralCpu1).To(Equal(expectedBurstablePodCpus)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment for line 665 below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
TopologyPolicy: &policy, | ||
} | ||
|
||
profile.Spec.AdditionalKernelArgs = []string{"nosmt", "module_blacklist=irdma"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we revert this setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a AfteraAll section at the start.
https://github.com/openshift/cluster-node-tuning-operator/blob/main/test/e2e/performanceprofile/functests/13_llc/llc.go#L130
@@ -662,3 +941,36 @@ func waitForDeploymentPodsDeletion(ctx context.Context, targetNode *corev1.Node, | |||
Expect(err).ToNot(HaveOccurred()) | |||
} | |||
} | |||
|
|||
func createSidecarContainers(names []string, cpuResources []string) []corev1.Container { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not "sidecar" containers as kube intend them, so this function should probably be called "multicontainer" or so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
Instead of using string, the arguments to table driven test cases are converted to enum-like variables using constants. Other Minor fixes like comparing cpuset variable using inbuilt Equal operator Signed-off-by: Niranjan M.R <[email protected]>
podCpuRequirementList = []string{fmt.Sprintf("%d", L3CacheGroupSize), fmt.Sprintf("%d", L3CacheGroupSize/2)} | ||
} | ||
|
||
for i := range 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment please why 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in the latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this. I am not well familiar with the feature yet hence cannot provide feedback on the anticipated pods cpus results, but I didn't spot anything breaking too. I'll defer the LGTM to more expert members in the feature. However, I added few comments, which are non blocking and can be even addressed in a cleanup PR to allow running the tests in the CI and not keep them hanging. Overall I recommend adding more logs about the test steps to make the test log more readable and enable better troubleshooting on failures.
err = getter.Container(ctx, &testpod, testpod.Spec.Containers[0].Name, cpusetCfg) | ||
Expect(err).ToNot(HaveOccurred()) | ||
podCpuset, err := cpuset.Parse(cpusetCfg.Cpus) | ||
testlog.TaggedInfof("Pod", "Cpus used by pod %v are %v", testpod.Name, podCpuset.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refering to a namespace-scoped resources is usually done by / to easily identify the resource
podList := &corev1.PodList{} | ||
podLabel["test-app"] = fmt.Sprintf("telcoApp-%d", i) | ||
listOptions := &client.ListOptions{Namespace: testutils.NamespaceTesting, LabelSelector: labels.SelectorFromSet(podLabel)} | ||
Eventually(func() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a log (inside the eventually or outside) about waiting for the deployment to be ready can be helpful to track tests steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in the latest commit
} | ||
// Here pods of both deployments can take the full core even as it will follow | ||
// packed allocation | ||
// From KEP: takeUncoreCache and takePartialUncore will still follow a "packed" allocation principle as the rest of the implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
cpuResources := []string{fmt.Sprintf("%d", cpuSize)} | ||
containerList = createMultipleContainers(containerName, cpuResources) | ||
case partialAlignment: | ||
// WIth 2 guaranteed containers, where 1 is requesting half of L3CacheGroupSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: With
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in the latest commit
Expect(err).ToNot(HaveOccurred()) | ||
containerWithnonIntegralCpu1, err := cpuset.Parse(cpusetCfg.Cpus) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(containerWithnonIntegralCpu1.Equals(expectedBurstablePodCpus)).To(BeTrue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these types of assertions I always prefer to print out the values of each variable if the assert failed to enhance the troubleshooting, here and in similar assertions in this PR. the reason for this is that when such Expect() fails one can only know that "Expected false to equal true" and no other details is provided about the actual compared values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in the latest commit
return containers | ||
} | ||
|
||
// transformToNoSMT moves takes a map which contains cores and its siblings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in the latest commit
Add messages when assertions fail Signed-off-by: Niranjan M.R <[email protected]>
/test e2e-aws-ovn |
@mrniranjan: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/lgtm
Let's have this. If needed, we can iterate and improve later
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani, mrniranjan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retitle NO-JIRA: E2E: Additional Functional tests for align cpus by UncoreCache Feature |
@mrniranjan: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@ffromani: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] Distgit: cluster-node-tuning-operator |
@mrniranjan: Jira Issue OCPBUGS-57150: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-57150 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cherry-pick release-4.19 |
@mrniranjan: new pull request created: #1348 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This PR contains Additional tests related to CPUManager policy option to align Cpus based on Uncore cache.
Contains tests related to Multiple Pods, Containers and with SMT disabled