perfprof: add enablement annotation #1278

Open: ffromani wants to merge 1 commit into main from llc-enablement-file-from-experimental

@ffromani ffromani commented Jan 9, 2025

To use the cpumanager policy option prefer-align-cpus-by-uncorecache, users need to create an enablement file through a MachineConfig.
This is because the feature is opt-in and supported only in selected cases. By providing MachineConfigs, users explicitly opt in, which is clearly visible in support scenarios.

The problem is that the required MachineConfig has error-prone components.
We would like to streamline the flow but keep the explicit opt-in step.

A possible improvement is to automate the MachineConfig generation when the cpumanager option is injected through the kubeletconfig.experimental annotation, which is currently the only supported way to enable this feature.

To keep the opt-in component, users would also need to supply another annotation:

performance.openshift.io/autogenerate-enablement: "true"

So the performance profile could look like

annotations:
  "kubeletconfig.experimental": "{\"cpuManagerPolicyOptions\":{\"prefer-align-cpus-by-uncorecache\":\"true\"}}"
  "performance.openshift.io/autogenerate-enablement": "true"
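
As a rough illustration of the flow described above, the sketch below decides whether the enablement MachineConfig should be generated from the two annotations. It is not taken from the PR: the helper `shouldGenerateEnablement` and the type `experimentalKubeletSnippet` are hypothetical names, and only the annotation keys and the option name come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	// Annotation keys and option name as given in the PR description.
	experimentalKubeletAnnotation = "kubeletconfig.experimental"
	autogenerateAnnotation        = "performance.openshift.io/autogenerate-enablement"
	uncoreCacheOption             = "prefer-align-cpus-by-uncorecache"
)

// experimentalKubeletSnippet models only the part of the annotation payload
// this sketch needs. Hypothetical type, not the operator's own.
type experimentalKubeletSnippet struct {
	CPUManagerPolicyOptions map[string]string `json:"cpuManagerPolicyOptions"`
}

// shouldGenerateEnablement (hypothetical helper) reports whether both opt-in
// pieces are present: the autogenerate annotation set to "true" and the
// policy option set to "true" inside the experimental annotation.
func shouldGenerateEnablement(annotations map[string]string) (bool, error) {
	if annotations[autogenerateAnnotation] != "true" {
		return false, nil
	}
	raw, ok := annotations[experimentalKubeletAnnotation]
	if !ok {
		return false, nil
	}
	var snippet experimentalKubeletSnippet
	if err := json.Unmarshal([]byte(raw), &snippet); err != nil {
		return false, fmt.Errorf("parsing %q annotation: %w", experimentalKubeletAnnotation, err)
	}
	return snippet.CPUManagerPolicyOptions[uncoreCacheOption] == "true", nil
}

func main() {
	annotations := map[string]string{
		experimentalKubeletAnnotation: `{"cpuManagerPolicyOptions":{"prefer-align-cpus-by-uncorecache":"true"}}`,
		autogenerateAnnotation:        "true",
	}
	generate, err := shouldGenerateEnablement(annotations)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("generate enablement MachineConfig:", generate)
}
```

Requiring both annotations keeps the explicit opt-in: setting only the policy option, or only the autogenerate annotation, generates nothing in this sketch.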

@openshift-ci openshift-ci bot requested review from rbaturov and yanirq January 9, 2025 14:04

openshift-ci bot commented Jan 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

@openshift-ci openshift-ci bot added the approved label Jan 9, 2025
@ffromani ffromani force-pushed the llc-enablement-file-from-experimental branch from c1fcd0f to 5319763 Compare January 9, 2025 15:29
@ffromani ffromani force-pushed the llc-enablement-file-from-experimental branch 2 times, most recently from 79d1742 to 50243cb Compare January 28, 2025 13:56
@ffromani ffromani changed the title from "WIP perfprof: add enablement annotation" to "perfprof: add enablement annotation" Jan 28, 2025
@ffromani ffromani force-pushed the llc-enablement-file-from-experimental branch from 50243cb to c29a442 Compare January 29, 2025 12:59
Expect(ok).To(BeFalse(), "expected path %q found in ignition", expectedPath)
})

It("should not be generated if missing control annotation", func() {
Contributor

You should also check that the prefer-align-cpus-by-uncorecache is false in the final KubeletConfig. Otherwise the kubelet dies.

Contributor Author

@ffromani ffromani Feb 3, 2025

We did a different test for this, though, and I'm not sure this is in scope for this PR.

klog.V(4).InfoS("components manifests", "autogenerateEnablement", autogen, "LLCEnabled", llcEnabled)

mcOpts := opts.MachineConfig.Clone()
mcOpts.LLCFileEnabled = autogen && llcEnabled
Contributor

What we also need: when autogen == false, force llcEnabled to false.
Then not generating the file is cleaner, but makes no functional difference to the kubelet.
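
One way to read this suggestion is sketched below, under stated assumptions. `reconcilePolicyOptions` is a hypothetical helper, not code from the PR; it assumes the requested cpumanager policy options are available as a plain map and that writing the option as "false", rather than dropping the key, is the desired outcome.

```go
package main

import "fmt"

// Option name as given in the PR description.
const uncoreCacheOption = "prefer-align-cpus-by-uncorecache"

// reconcilePolicyOptions (hypothetical helper) returns a copy of the requested
// policy options plus a flag saying whether the enablement file should be
// generated. When the uncore cache option is requested without the autogen
// opt-in, the option is forced to "false" in the copy. The input map is left
// untouched.
func reconcilePolicyOptions(requested map[string]string, autogen bool) (map[string]string, bool) {
	effective := make(map[string]string, len(requested))
	for k, v := range requested {
		effective[k] = v
	}
	llcEnabled := effective[uncoreCacheOption] == "true"
	if llcEnabled && !autogen {
		effective[uncoreCacheOption] = "false"
		llcEnabled = false
	}
	return effective, llcEnabled
}

func main() {
	requested := map[string]string{uncoreCacheOption: "true"}
	effective, llcFileEnabled := reconcilePolicyOptions(requested, false)
	fmt.Println("effective options:", effective)
	fmt.Println("generate enablement file:", llcFileEnabled)
}
```

With this shape, the option value and the enablement file are derived from the same decision, so they cannot disagree.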

Contributor

When you enable the feature in the KubeletConfig without the trigger file, the kubelet fails to start with an unknown config error.

Contributor Author

OK, this means we need a different feature, then. It is no longer a simplification but rather a master switch. At this point we should evaluate whether to add it to our APIs and take full ownership of the flag.

Contributor

It is both. I view it as a simplification that at the same time prevents a human error that is hard to recover from. We can either develop a master switch or, if you prefer, just not apply the setting unless both pieces are in place, and report an error.

But allowing the unsafe combination is going to backfire. In fact, it has already.
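
The second alternative mentioned above (not applying the setting and reporting an error) could look roughly like the following. This is only a sketch: `resolveLLC` and `errMissingOptIn` are hypothetical names, and how the error would surface to the user is left open.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingOptIn is a hypothetical sentinel error for the unsafe combination:
// the option is requested but the opt-in is not in place.
var errMissingOptIn = errors.New("prefer-align-cpus-by-uncorecache requested without the enablement opt-in")

// resolveLLC (hypothetical helper) reports whether the option should be
// applied. It refuses the unsafe combination instead of passing it through.
func resolveLLC(llcRequested, autogen bool) (bool, error) {
	switch {
	case llcRequested && autogen:
		return true, nil
	case llcRequested && !autogen:
		return false, errMissingOptIn
	default:
		return false, nil
	}
}

func main() {
	apply, err := resolveLLC(true, false)
	if err != nil {
		fmt.Println("not applying the setting:", err)
		return
	}
	fmt.Println("apply setting:", apply)
}
```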

To use the cpumanager policy option `prefer-align-cpus-by-uncorecache`,
users need to create an enablement file through a MachineConfig.
This is because the feature is opt-in and supported only in selected
cases. By providing MachineConfigs, users explicitly opt in, which is
clearly visible in support scenarios.

The problem is that the required MachineConfig has error-prone components.
We would like to streamline the flow but keep the explicit opt-in step.

A possible improvement is to automate the MachineConfig generation when
the cpumanager option is injected through the
`kubeletconfig.experimental` annotation, which is currently the only
supported way to enable this feature.

To keep the opt-in component, users would also need to supply another
annotation:
```
performance.openshift.io/autogenerate-enablement: "true"
```

So the performance profile could look like
```
annotations:
  "kubeletconfig.experimental": "{\"cpuManagerPolicyOptions\":{\"prefer-align-cpus-by-uncorecache\":\"true\"}}"
  "performance.openshift.io/autogenerate-enablement": "true"
```

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the llc-enablement-file-from-experimental branch from c29a442 to 998e392 Compare February 3, 2025 13:53

ffromani commented Feb 4, 2025

/retest-required

ffromani commented Feb 5, 2025

/retest-required

openshift-ci bot commented Mar 17, 2025

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-operator | 998e392 | link | true | /test e2e-aws-operator |
| ci/prow/e2e-hypershift-pao | 998e392 | link | true | /test e2e-hypershift-pao |
| ci/prow/e2e-gcp-pao | 998e392 | link | true | /test e2e-gcp-pao |
| ci/prow/e2e-gcp-pao-updating-profile | 998e392 | link | true | /test e2e-gcp-pao-updating-profile |
| ci/prow/e2e-gcp-pao-workloadhints | 998e392 | link | true | /test e2e-gcp-pao-workloadhints |

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label Jun 16, 2025
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 16, 2025
Labels: approved, lifecycle/rotten