Contributor

@mrniranjan mrniranjan commented Apr 26, 2025

This PR adds tests for the CPUManager policy option that aligns CPUs based on uncore cache.
It covers multiple pods, multiple containers, and nodes with SMT disabled.

@mrniranjan mrniranjan marked this pull request as draft April 26, 2025 02:12
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2025
@openshift-ci openshift-ci bot requested review from rbaturov and swatisehgal April 26, 2025 02:13
@mrniranjan mrniranjan marked this pull request as ready for review April 30, 2025 11:06
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2025
@openshift-ci openshift-ci bot requested review from ffromani and MarSik April 30, 2025 11:07
@mrniranjan
Contributor Author

This PR depends on #1302

@mrniranjan mrniranjan force-pushed the llc_take3_1 branch 3 times, most recently from df99dbd to 1667034 Compare May 20, 2025 05:54
Adds additional tests related to the CPUManager policy option
to align CPUs based on uncore cache.

Contains tests related to multiple pods,
multiple containers, and SMT disabled

Signed-off-by: Niranjan M.R <[email protected]>
Contributor

@ffromani ffromani left a comment

initial review, nothing huge but quite a few questions inside

corev1.ResourceMemory: resource.MustParse("100Mi"),
Context("With Multiple Pods", func() {
DescribeTable("Align multiple Guaranteed pods",
func(l3uncoreCacheSharing string) {
Contributor

let's not use a string; use an enum-like type. Example:

type L3UncoreCacheShareMode string

const (
	L3UncoreCacheShareEqual   L3UncoreCacheShareMode = "equal"
	L3UncoreCacheShareUnequal L3UncoreCacheShareMode = "unequal"
)
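
A minimal sketch of how such an enum-like type might be consumed, using only the standard library. The `expectMatchingCPUs` helper is hypothetical and not part of the PR; it only illustrates switching on the typed constants and surfacing unknown values in the default branch.

```go
package main

import "fmt"

// L3UncoreCacheShareMode is the enum-like type proposed in the review.
type L3UncoreCacheShareMode string

const (
	L3UncoreCacheShareEqual   L3UncoreCacheShareMode = "equal"
	L3UncoreCacheShareUnequal L3UncoreCacheShareMode = "unequal"
)

// expectMatchingCPUs is a hypothetical helper: it reports whether a test
// would expect the two compared CPU sets to match for the given mode.
func expectMatchingCPUs(mode L3UncoreCacheShareMode) (bool, error) {
	switch mode {
	case L3UncoreCacheShareEqual:
		return true, nil
	case L3UncoreCacheShareUnequal:
		return false, nil
	default:
		// A mistyped or newly added mode fails loudly instead of
		// silently falling into an else branch.
		return false, fmt.Errorf("unknown L3 uncore cache share mode %q", mode)
	}
}

func main() {
	match, err := expectMatchingCPUs(L3UncoreCacheShareEqual)
	fmt.Println(match, err)
}
```

A plain `string` variable cannot be passed to `expectMatchingCPUs` without an explicit conversion, which is the main practical gain over a bare string parameter.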

Contributor Author

fixed in the latest commit

targetNode = workerRTNodes[0]
cpusetCfg = &controller.CpuSet{}
cpusetList []cpuset.CPUSet
podCpuList []string
Contributor

this seems more like podCpuRequirementList or podCpuRequirements

Contributor Author

fixed in the latest commit

dpList []*appsv1.Deployment
)

podLabel := make(map[string]string)
Contributor

why initialize this here if we initialize it again in the for loop body below, on line 519?

Contributor Author

fixed in the latest commit


Context("Multiple Containers", func() {
DescribeTable("Verify CPU Alignment with multiple containers",
func(deploymentName string, alignment string, sideCarContainerName []string) {
Contributor

ditto for alignment

Contributor Author

fixed in the latest commit

if l3uncoreCacheSharing == "equal" {
Expect(cpusetList[0]).To(Equal(cpusetList[1]))
} else {
Expect(cpusetList[0]).ToNot(Equal(cpusetList[1]))
Contributor

we should also check that the retrieved cpulist matches the expected size (podCpuList)
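
A rough sketch of such a size check, assuming whole-CPU requests given as strings (as in the `[]string` list above). In the PR this would presumably go through the parsed cpuset and its size; the `countCPUs` and `checkAllocatedSize` helpers below are hypothetical, standard-library-only stand-ins so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countCPUs is a hypothetical stand-in for parsing a Linux cpulist such as
// "0-3,8" and returning how many CPUs it names. It does not deduplicate
// overlapping ranges.
func countCPUs(cpulist string) (int, error) {
	cpulist = strings.TrimSpace(cpulist)
	if cpulist == "" {
		return 0, nil
	}
	total := 0
	for _, part := range strings.Split(cpulist, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return 0, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return 0, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
			}
		}
		if last < first {
			return 0, fmt.Errorf("invalid cpulist range %q", part)
		}
		total += last - first + 1
	}
	return total, nil
}

// checkAllocatedSize compares the number of CPUs a container actually got
// with the whole-CPU request it made (for example "4").
func checkAllocatedSize(allocated, requested string) error {
	got, err := countCPUs(allocated)
	if err != nil {
		return err
	}
	want, err := strconv.Atoi(requested)
	if err != nil {
		return fmt.Errorf("invalid CPU request %q: %w", requested, err)
	}
	if got != want {
		return fmt.Errorf("allocated %d CPUs (%s), requested %d", got, allocated, want)
	}
	return nil
}

func main() {
	fmt.Println(checkAllocatedSize("0-3", "4"))
	fmt.Println(checkAllocatedSize("0-3,8", "4"))
}
```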

Contributor Author

fixed in the latest commit

Expect(err).ToNot(HaveOccurred())
containerWithnonIntegralCpu2, err := cpuset.Parse(cpusetCfg.Cpus)
Expect(err).ToNot(HaveOccurred())
Expect(containerWithnonIntegralCpu2).To(Equal(expectedBurstablePodCpus))
Contributor

cpusets should be compared for equality using their Equals method

Contributor Author

fixed in the latest commit

Expect(err).ToNot(HaveOccurred())
containerWithnonIntegralCpu1, err := cpuset.Parse(cpusetCfg.Cpus)
Expect(err).ToNot(HaveOccurred())
Expect(containerWithnonIntegralCpu1).To(Equal(expectedBurstablePodCpus))
Contributor

see comment for line 665 below

Contributor Author

fixed in the latest commit

TopologyPolicy: &policy,
}

profile.Spec.AdditionalKernelArgs = []string{"nosmt", "module_blacklist=irdma"}
Contributor

where do we revert this setting?

Contributor Author

@@ -662,3 +941,36 @@ func waitForDeploymentPodsDeletion(ctx context.Context, targetNode *corev1.Node,
Expect(err).ToNot(HaveOccurred())
}
}

func createSidecarContainers(names []string, cpuResources []string) []corev1.Container {
Contributor

these are not "sidecar" containers as Kubernetes intends them, so this function should probably be called "multicontainer" or similar

Contributor Author

fixed in the latest commit

Instead of plain strings, the arguments to table-driven
test cases now use enum-like typed constants.

Other minor fixes, such as comparing cpuset values using
the built-in Equals method

Signed-off-by: Niranjan M.R <[email protected]>
podCpuRequirementList = []string{fmt.Sprintf("%d", L3CacheGroupSize), fmt.Sprintf("%d", L3CacheGroupSize/2)}
}

for i := range 2 {
Contributor

can you please add a comment explaining why 2?

Contributor Author

addressed in the latest commit

Contributor

@shajmakh shajmakh left a comment

Thank you for working on this. I am not yet familiar with the feature, so I cannot provide feedback on the expected pod CPU results, but I didn't spot anything breaking either. I'll defer the LGTM to members with more expertise in the feature. However, I added a few comments, which are non-blocking and can even be addressed in a cleanup PR, so that the tests can run in CI rather than being left hanging. Overall, I recommend adding more logs about the test steps to make the test log more readable and to enable better troubleshooting on failures.

err = getter.Container(ctx, &testpod, testpod.Spec.Containers[0].Name, cpusetCfg)
Expect(err).ToNot(HaveOccurred())
podCpuset, err := cpuset.Parse(cpusetCfg.Cpus)
testlog.TaggedInfof("Pod", "Cpus used by pod %v are %v", testpod.Name, podCpuset.String())
Contributor

referring to namespace-scoped resources is usually done as namespace/name, to easily identify the resource

podList := &corev1.PodList{}
podLabel["test-app"] = fmt.Sprintf("telcoApp-%d", i)
listOptions := &client.ListOptions{Namespace: testutils.NamespaceTesting, LabelSelector: labels.SelectorFromSet(podLabel)}
Eventually(func() bool {
Contributor

a log (inside the Eventually or outside) about waiting for the deployment to be ready would help track the test steps
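
One way to make wait steps visible in the test log, sketched with the standard library only. `waitFor` is a hypothetical stand-in for the polling done by `Eventually` in the test, and the deployment name in `main` is made up.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// waitFor polls cond every interval until it returns true or the timeout
// elapses, logging each attempt so the test log shows what is being waited on.
func waitFor(what string, timeout, interval time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for attempt := 1; ; attempt++ {
		log.Printf("waiting for %s (attempt %d)", what, attempt)
		if cond() {
			log.Printf("%s is ready", what)
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for " + what)
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// The condition becomes true on the third poll, simulating a deployment
	// whose pods take a moment to become ready.
	err := waitFor("deployment testing/telco-app-0", time.Second, 10*time.Millisecond, func() bool {
		calls++
		return calls >= 3
	})
	log.Println("result:", err)
}
```

Logging outside the polled function keeps the output to one line per wait; logging inside it, as here, also shows how many polls were needed.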

Contributor Author

addressed in the latest commit

}
// Here pods of both deployments can take the full core even as it will follow
// packed allocation
// From KEP: takeUncoreCache and takePartialUncore will still follow a "packed" allocation principle as the rest of the implementation.
Contributor

Contributor Author

yes

cpuResources := []string{fmt.Sprintf("%d", cpuSize)}
containerList = createMultipleContainers(containerName, cpuResources)
case partialAlignment:
// WIth 2 guaranteed containers, where 1 is requesting half of L3CacheGroupSize
Contributor

nit: With

Contributor Author

addressed in the latest commit

Expect(err).ToNot(HaveOccurred())
containerWithnonIntegralCpu1, err := cpuset.Parse(cpusetCfg.Cpus)
Expect(err).ToNot(HaveOccurred())
Expect(containerWithnonIntegralCpu1.Equals(expectedBurstablePodCpus)).To(BeTrue())
Contributor

for these types of assertions, here and in similar assertions in this PR, I always prefer to print the value of each variable when the assertion fails, to make troubleshooting easier. The reason is that when such an Expect() fails, all one learns is "Expected false to equal true"; no other detail is provided about the values that were actually compared.
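
A sketch of the idea using only the standard library. The `cpuSet` type and `checkCPUs` helper are hypothetical stand-ins, not the types used in the PR; in a Gomega-based test the same effect is usually achieved by passing a description (format string plus values) after the matcher in `To(...)`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cpuSet is a hypothetical, minimal stand-in for a real CPU set type.
type cpuSet map[int]struct{}

func newCPUSet(cpus ...int) cpuSet {
	s := make(cpuSet, len(cpus))
	for _, c := range cpus {
		s[c] = struct{}{}
	}
	return s
}

// Equals reports whether both sets contain exactly the same CPUs.
func (s cpuSet) Equals(other cpuSet) bool {
	if len(s) != len(other) {
		return false
	}
	for c := range s {
		if _, ok := other[c]; !ok {
			return false
		}
	}
	return true
}

// String renders the CPUs in ascending order, e.g. "0,1,4".
func (s cpuSet) String() string {
	ids := make([]int, 0, len(s))
	for c := range s {
		ids = append(ids, c)
	}
	sort.Ints(ids)
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = fmt.Sprint(id)
	}
	return strings.Join(parts, ",")
}

// checkCPUs returns an error naming both values, so a failure says more
// than "expected false to be true".
func checkCPUs(container string, got, want cpuSet) error {
	if !got.Equals(want) {
		return fmt.Errorf("container %q: got CPUs [%s], expected [%s]", container, got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkCPUs("c1", newCPUSet(0, 1), newCPUSet(0, 1)))
	fmt.Println(checkCPUs("c1", newCPUSet(0, 1), newCPUSet(2, 3)))
}
```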

Contributor Author

addressed in the latest commit

return containers
}

// transformToNoSMT moves takes a map which contains cores and its siblings
Contributor

moves

Contributor Author

addressed in the latest commit

Add messages when assertions fail

Signed-off-by: Niranjan M.R <[email protected]>
@mrniranjan
Contributor Author

/test e2e-aws-ovn

Contributor

openshift-ci bot commented Jun 4, 2025

@mrniranjan: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Contributor

@ffromani ffromani left a comment

/approve
/lgtm

Let's have this. If needed, we can iterate and improve later

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 5, 2025
Contributor

openshift-ci bot commented Jun 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, mrniranjan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2025
@ffromani
Contributor

ffromani commented Jun 5, 2025

/retitle NO-JIRA: E2E: Additional Functional tests for align cpus by UncoreCache Feature

@openshift-ci openshift-ci bot changed the title E2E: Additional Functional tests for align cpus by UncoreCache Feature NO-JIRA: E2E: Additional Functional tests for align cpus by UncoreCache Feature Jun 5, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 5, 2025
@openshift-ci-robot
Contributor

@mrniranjan: This pull request explicitly references no jira issue.

In response to this:

This PR contains Additional tests related to CPUManager policy option to align Cpus based on Uncore cache.
Contains tests related to Multiple Pods, Containers and with SMT disabled

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ffromani
Contributor

ffromani commented Jun 5, 2025

/jira refresh

@openshift-ci-robot
Contributor

@ffromani: This pull request explicitly references no jira issue.

In response to this:

/jira refresh


@openshift-merge-bot openshift-merge-bot bot merged commit eae77fb into openshift:main Jun 5, 2025
18 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator
This PR has been included in build cluster-node-tuning-operator-container-v4.20.0-202506051441.p0.geae77fb.assembly.stream.el9.
All builds following this will include this PR.

@mrniranjan mrniranjan changed the title NO-JIRA: E2E: Additional Functional tests for align cpus by UncoreCache Feature OCPBUGS-57150: E2E: Additional Functional tests for align cpus by UncoreCache Feature Jun 6, 2025
@openshift-ci-robot
Contributor

@mrniranjan: Jira Issue OCPBUGS-57150: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-57150 has been moved to the MODIFIED state.

In response to this:

This PR contains Additional tests related to CPUManager policy option to align Cpus based on Uncore cache.
Contains tests related to Multiple Pods, Containers and with SMT disabled


@mrniranjan
Contributor Author

/cherry-pick release-4.19

@openshift-cherrypick-robot

@mrniranjan: new pull request created: #1348

In response to this:

/cherry-pick release-4.19

