balloons: add cpuClasses with cpufreq and PCT turbo allocation#667

Open
askervin wants to merge 10 commits into
containers:mainfrom
askervin:5h1-balloons-cpuclass

Conversation

@askervin

@askervin askervin commented May 13, 2026

Collaborator

This PR introduces pkg/resmgr/cpuclass as a replacement for pkg/resmgr/control/cpu. The latter is not removed in this patch series, but it is disconnected from the balloons policy and thus left unused.

Unlike the CPU controller, which is decoupled from resource policies, cpuclass is designed for policy-controlled CPU tuning. This is needed in more complex CPU tuning scenarios, for instance:

  • configuring frequencies of CPUs that are unassigned or shared by multiple containers
  • providing hints to the policy's CPU allocation about CPUs where the wanted tunings are possible.

This PR introduces a new configuration section, cpuClasses, for specifying CPU tunings. It is aligned with schedulingClasses and loadClasses, which are already available at the top level of the balloons policy configuration. It uses the same class notation as the other classes and balloonTypes: a list of objects whose "name" attribute specifies the class name. This replaces the control.cpu.classes object, where the key was the class name.
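The difference between the two notations can be sketched in Go as follows. This is only an illustration: the CPUClass struct, its maxFreq field, and the helper names are hypothetical stand-ins rather than the API types added by this PR, and the values are not a claim about what each syntax accepts. Only the list-of-objects-with-"name" shape and the keyed-by-class-name shape are taken from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// CPUClass is a hypothetical stand-in for one cpuClasses entry. Only the
// "name" attribute comes from the PR description; MaxFreq is assumed.
type CPUClass struct {
	Name    string `json:"name"`
	MaxFreq string `json:"maxFreq,omitempty"`
}

// fromList decodes the new notation: a list of objects, each with "name".
func fromList(data []byte) ([]CPUClass, error) {
	var classes []CPUClass
	if err := json.Unmarshal(data, &classes); err != nil {
		return nil, err
	}
	return classes, nil
}

// fromMap decodes the old notation: an object keyed by class name.
// The key is copied into Name so both notations yield the same list.
func fromMap(data []byte) ([]CPUClass, error) {
	var byName map[string]CPUClass
	if err := json.Unmarshal(data, &byName); err != nil {
		return nil, err
	}
	classes := make([]CPUClass, 0, len(byName))
	for name, c := range byName {
		c.Name = name
		classes = append(classes, c)
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(classes, func(i, j int) bool { return classes[i].Name < classes[j].Name })
	return classes, nil
}

func main() {
	newStyle := []byte(`[{"name": "fast", "maxFreq": "turbo"}]`)
	oldStyle := []byte(`{"fast": {"maxFreq": "turbo"}}`)
	a, err := fromList(newStyle)
	if err != nil {
		panic(err)
	}
	b, err := fromMap(oldStyle)
	if err != nil {
		panic(err)
	}
	fmt.Println(a)
	fmt.Println(b)
}
```

One property of the list form is that it preserves the order in which classes were written, whereas a keyed object has no inherent order.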

This PR enables using cpuclass from the balloons policy. However, it is expected to be usable from other policies, too, the topology-aware policy in particular. Therefore the cpuclass API associates only cpusets with CPU class names, and it knows nothing about "balloons", "shared CPUs" or "idle CPUs", which would have no direct equivalents in the topology-aware policy.

CPU frequency configuration is extended with support for units (3100MHz or 3.1GHz) and for symbolic frequencies: "min" (minimum frequency), "base" (base frequency) and "turbo" (max turbo frequency).
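A minimal Go sketch of such parsing is shown below. It is not the parser in pkg/apis/config/v1alpha1/resmgr/policy/frequency.go: the Frequency struct, ParseFrequency, and the choice of kHz as the internal unit are assumptions made for illustration. Only the MHz and GHz suffixes and the three symbolic names come from the description above, and handling of bare numbers is left out.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Frequency is a hypothetical parsed value: either a symbolic name
// ("min", "base", "turbo") or a concrete frequency in kHz.
type Frequency struct {
	Symbol string // non-empty for symbolic frequencies
	KHz    uint64 // set for numeric frequencies
}

// ParseFrequency accepts "min", "base", "turbo", or a number followed
// by a MHz or GHz suffix, such as "3100MHz" or "3.1GHz".
func ParseFrequency(s string) (Frequency, error) {
	s = strings.TrimSpace(s)
	switch lower := strings.ToLower(s); lower {
	case "min", "base", "turbo":
		return Frequency{Symbol: lower}, nil
	}
	units := []struct {
		suffix string
		khz    float64
	}{
		{"GHz", 1e6},
		{"MHz", 1e3},
	}
	for _, u := range units {
		if !strings.HasSuffix(s, u.suffix) {
			continue
		}
		num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
		v, err := strconv.ParseFloat(num, 64)
		if err != nil || v < 0 || math.IsNaN(v) || math.IsInf(v, 0) {
			return Frequency{}, fmt.Errorf("invalid frequency %q", s)
		}
		// Round to avoid off-by-one results from float arithmetic.
		return Frequency{KHz: uint64(math.Round(v * u.khz))}, nil
	}
	return Frequency{}, fmt.Errorf("unrecognized frequency %q", s)
}

func main() {
	for _, s := range []string{"3100MHz", "3.1GHz", "turbo", "fast"} {
		f, err := ParseFrequency(s)
		fmt.Println(s, f, err)
	}
}
```

Under these assumptions, "3100MHz" and "3.1GHz" parse to the same value, while symbolic names stay unresolved until something decides what they mean on the current hardware.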

The cpufreq allocator (internal to cpuclass) works by changing how symbolic frequencies are interpreted. CPUs belonging to the CPU classes with the highest turboPriority are eligible for the "turbo" frequency. If "turbo" is specified as minFreq or maxFreq for CPUs that belong to CPU classes with a lower turboPriority, their "turbo" is interpreted as "base". However, when all containers that belonged to the highest turboPriority classes are removed, the CPU classes with the next highest turboPriority become the highest ones, and their "turbo" is interpreted as "turbo" again. In other words, at any point in time, the highest priority CPU classes get the turbo while others get at most "base".
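The interpretation rule above can be sketched in Go roughly as follows. The Class struct and effectiveMaxFreq are hypothetical names, not the allocator's real types, and the sketch assumes that a larger turboPriority number means a higher priority and that "active" means a class that currently has containers.

```go
package main

import "fmt"

// Class is a hypothetical CPU class. Only turboPriority and the symbolic
// frequency names come from the PR description.
type Class struct {
	Name          string
	TurboPriority int    // assumption: larger value means higher priority
	MaxFreq       string // "min", "base" or "turbo"
}

// effectiveMaxFreq returns how the symbolic maxFreq of each active class
// is interpreted: "turbo" stays "turbo" only for classes that share the
// highest turboPriority among the active classes, otherwise it becomes
// "base".
func effectiveMaxFreq(active []Class) map[string]string {
	out := make(map[string]string, len(active))
	if len(active) == 0 {
		return out
	}
	top := active[0].TurboPriority
	for _, c := range active[1:] {
		if c.TurboPriority > top {
			top = c.TurboPriority
		}
	}
	for _, c := range active {
		freq := c.MaxFreq
		if freq == "turbo" && c.TurboPriority < top {
			freq = "base"
		}
		out[c.Name] = freq
	}
	return out
}

func main() {
	latency := Class{Name: "latency", TurboPriority: 2, MaxFreq: "turbo"}
	batch := Class{Name: "batch", TurboPriority: 1, MaxFreq: "turbo"}

	// Both classes have containers: only the higher priority keeps turbo.
	fmt.Println(effectiveMaxFreq([]Class{latency, batch}))

	// The last "latency" container is gone: "batch" is now the highest.
	fmt.Println(effectiveMaxFreq([]Class{batch}))
}
```

The same substitution would apply to minFreq; it is omitted here to keep the sketch short.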

The pct allocator (internal to cpuclass) works in two modes. The assoc-only mode associates CPUs with user-specified CLOSes. In this mode, the user specifies pctClosID in cpuClasses to associate the CPUs of that class with the correct predefined CLOS. The managed mode replaces the system SST configuration with its own, removing the need to use intel-speed-select or BIOS PCT configuration. In this mode, the user specifies pctPriority: high in cpuClasses to tune CPUs into a CLOS with maximum performance.

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 3 times, most recently from 2695589 to e919069 Compare May 15, 2026 11:06
@askervin askervin marked this pull request as ready for review May 15, 2026 11:31
@askervin askervin changed the title from "WIP: balloons: add cpuClasses" to "balloons: add cpuClasses" May 15, 2026
@askervin askervin requested a review from Copilot May 15, 2026 11:32

Copilot AI left a comment

Pull request overview

Adds a top-level cpuClasses configuration model for the balloons policy, including human-readable/symbolic frequency parsing, turbo-priority allocation, CRD/docs updates, and e2e coverage for turbo behavior and legacy CPU class syntax.

Changes:

  • Introduces CPUClass/Frequency API types, CRD schema updates, docs, and config template migration to cpuClasses.
  • Adds a balloons CPU class turbo allocator that resolves symbolic frequencies and coordinates with the CPU controller.
  • Updates CPU controller/sysfs/test support for dynamic classes, cpufreq overrides, write deduplication, and turbo-priority e2e validation.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cmd/plugins/balloons/policy/balloons-policy.go | Wires turbo allocator into balloons setup, reset, assignment, validation, and reconfiguration. |
| cmd/plugins/balloons/policy/cpuclass.go | Adds turbo-aware CPU class allocator and symbolic frequency resolution. |
| cmd/plugins/balloons/policy/flags.go | Adds aliases for new CPU class/frequency config types. |
| config/crd/bases/config.nri_balloonspolicies.yaml | Adds cpuClasses to generated CRD schema. |
| deployment/helm/balloons/crds/config.nri_balloonspolicies.yaml | Adds Helm-packaged CRD schema for cpuClasses. |
| docs/resource-policy/policy/balloons.md | Documents preferred cpuClasses, symbolic units, turbo priority, and legacy syntax. |
| pkg/apis/config/v1alpha1/balloons-policy.go | Injects top-level cpuClasses into common CPU controller config. |
| pkg/apis/config/v1alpha1/resmgr/policy/balloons/config.go | Adds CPUClasses to balloons policy config. |
| pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go | Adds deepcopy support for balloons CPUClasses. |
| pkg/apis/config/v1alpha1/resmgr/policy/cpuclass.go | Defines user-facing CPU class fields. |
| pkg/apis/config/v1alpha1/resmgr/policy/frequency.go | Adds frequency parsing, JSON marshal/unmarshal, symbolic values, and resolution helpers. |
| pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go | Adds deepcopy support for CPUClass. |
| pkg/resmgr/control/cpu/api.go | Adds dynamic SetClass and defers enforcement logging before controller start. |
| pkg/resmgr/control/cpu/cache.go | Downgrades missing assignment cache log on fresh startup. |
| pkg/resmgr/control/cpu/cpu.go | Adds per-CPU cpufreq write cache and merges dynamic/static CPU class definitions. |
| pkg/sysfs/system.go | Adds test-oriented cpufreq sysfs override support. |
| test/e2e/policies.test-suite/balloons/balloons-config.yaml.in | Migrates default balloons test config to top-level cpuClasses. |
| test/e2e/policies.test-suite/balloons/n4c16/test17-cstates-scheduling/balloons-cstates.cfg | Converts C-state class config to cpuClasses. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo.cfg | Adds turbo-priority e2e config. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo-oldsyntax.cfg | Adds legacy control.cpu.classes compatibility e2e config. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/code.var.sh | Adds turbo-priority and cpufreq write-minimality e2e flow. |
Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported
Comments suppressed due to low confidence (3)

cmd/plugins/balloons/policy/balloons-policy.go:1445

  • When a live update changes only idleCPUClass, this branch detects a CPU-class-only change but never copies newBalloonsOptions.IdleCpuClass into p.bpoptions. The allocator is reconfigured with the old idle class and resetCpuClass() continues to apply the old value, so idle class changes are ignored until a full policy reconfiguration occurs.
			// Update CPUClasses definitions.
			p.bpoptions.CPUClasses = newBalloonsOptions.CPUClasses
			if p.turboAllocator != nil {
				if err := p.turboAllocator.Reconfigure(p.bpoptions.CPUClasses, p.bpoptions.IdleCpuClass); err != nil {

cmd/plugins/balloons/policy/balloons-policy.go:1713

  • The turbo allocator is created/reconfigured before fillBuiltinBalloonDefs() and validateConfig() run. If validation fails, this has already mutated policy/controller state via p.turboAllocator and cpucontrol.SetClass, so an invalid configuration update can leave partially applied CPU class definitions behind despite setConfig() returning an error.
	if p.turboAllocator == nil {
		ta, err := NewCPUClassTurboAllocator(
			WithSystem(p.options.System),
			WithCache(p.cch),
			WithCPUClasses(bpoptions.CPUClasses),
			WithIdleClass(bpoptions.IdleCpuClass),
		)
		if err != nil {
			return balloonsError("failed to create CPU class turbo allocator: %w", err)
		}
		p.turboAllocator = ta
	} else {
		if err := p.turboAllocator.Reconfigure(bpoptions.CPUClasses, bpoptions.IdleCpuClass); err != nil {
			return balloonsError("failed to reconfigure CPU class turbo allocator: %w", err)
		}
	}

cmd/plugins/balloons/policy/cpuclass.go:199

  • Idle CPUs are assigned once but are not tracked or reassigned when the turbo winner changes. If idleCPUClass uses symbolic turbo, idle CPUs keep the effective value from the last reset/release (for example turbo from startup) even after a higher-priority active class should cap non-winners to base.
// ResetIdle assigns the given CPU set to the idle class via the CPU
// controller. Used at policy startup to bring all allowed CPUs to a
// known baseline before any container-driven UseClass call. Does not
// affect the active-class tracking.
func (a *CPUClassTurboAllocator) ResetIdle(cpus cpuset.CPUSet) error {
	if cpus.IsEmpty() {
		return nil
	}
	if err := cpucontrol.Assign(a.cch, a.idleClassName, cpus.UnsortedList()...); err != nil {
		return fmt.Errorf("failed to assign CPUs %s to idle class %q: %w", cpus, a.idleClassName, err)
	}
	return nil

Comment thread pkg/apis/config/v1alpha1/resmgr/policy/frequency.go
Comment thread pkg/resmgr/control/cpu/cpu.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
@askervin askervin requested review from kad and marquiz May 17, 2026 06:48
@askervin

Collaborator Author

@kad, @marquiz, do you think we could approach turbo budget sharing with this kind of architecture in the balloons policy?

I'm adding cpuClasses under resmgr, similarly to schedulingClasses, to pave the way for adopting them in the topology-aware policy's guaranteed containers later on, too.

@askervin askervin marked this pull request as draft May 18, 2026 09:55
@askervin

Collaborator Author

There is still some technical and architectural debt that I wish to pay off in this PR. Namely, the CPU controller should not modify frequencies directly; this should go through the cache, in the spirit of applying "pending updates".

Unfortunately, controller hooks are container-specific and possibly called multiple times while handling a single NRI event, whereas all CPU properties should be written once per NRI event. I'll add yet another hook to the Controller interface to commit whatever changes a controller has stored since the previous Commit().
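The commit idea can be illustrated with a small Go sketch. It is not the Controller interface of this repository: freqController, SetMaxFreq, and the injected write function are hypothetical, and the sketch keeps pending state in the controller itself rather than in the cache. It only shows how hooks called several times during one event can record changes that are then written at most once per CPU.

```go
package main

import (
	"fmt"
	"sort"
)

// freqController is a hypothetical controller that batches per-CPU
// frequency changes until Commit is called.
type freqController struct {
	applied map[int]uint64 // last value written for each CPU
	pending map[int]uint64 // values recorded since the previous Commit
	write   func(cpu int, khz uint64) error
}

func newFreqController(write func(cpu int, khz uint64) error) *freqController {
	return &freqController{
		applied: map[int]uint64{},
		pending: map[int]uint64{},
		write:   write,
	}
}

// SetMaxFreq only records the wanted value. It may be called many times
// while one event is handled; the last value recorded for a CPU wins.
func (c *freqController) SetMaxFreq(khz uint64, cpus ...int) {
	for _, cpu := range cpus {
		c.pending[cpu] = khz
	}
}

// Commit writes every pending CPU once, skipping values that are already
// applied. It returns the number of writes performed. On error, the
// remaining pending changes are kept for a later Commit.
func (c *freqController) Commit() (int, error) {
	cpus := make([]int, 0, len(c.pending))
	for cpu := range c.pending {
		cpus = append(cpus, cpu)
	}
	sort.Ints(cpus)
	writes := 0
	for _, cpu := range cpus {
		khz := c.pending[cpu]
		if old, ok := c.applied[cpu]; !ok || old != khz {
			if err := c.write(cpu, khz); err != nil {
				return writes, err
			}
			c.applied[cpu] = khz
			writes++
		}
		delete(c.pending, cpu)
	}
	return writes, nil
}

func main() {
	c := newFreqController(func(cpu int, khz uint64) error {
		fmt.Printf("write cpu%d max=%d kHz\n", cpu, khz)
		return nil
	})
	c.SetMaxFreq(2000000, 0, 1) // hook for the first container
	c.SetMaxFreq(3000000, 1)    // a later hook overrides cpu1
	n, err := c.Commit()
	fmt.Println(n, err)
}
```

Separating "record" from "write" in this way is what would allow a single write per CPU property per NRI event, regardless of how many container-specific hooks ran.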

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from e919069 to d61da00 Compare May 18, 2026 10:52
@kad kad requested a review from Copilot May 19, 2026 07:38

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 24 changed files in this pull request and generated 5 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
Comment thread docs/resource-policy/policy/balloons.md Outdated
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 3 times, most recently from 22769db to 35ca662 Compare May 22, 2026 12:07
@askervin askervin requested a review from Copilot May 26, 2026 09:01

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 8 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment thread pkg/resmgr/control/cpu/cpu.go Outdated
Comment thread pkg/resmgr/control/cpu/cpu.go Outdated
Comment thread pkg/resmgr/control/cpu/cpu.go Outdated
Comment thread pkg/resmgr/control/cpu/cpu.go Outdated
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go Outdated
Comment thread docs/resource-policy/policy/balloons.md Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 3 times, most recently from e82098a to ce383ef Compare May 26, 2026 14:21
@askervin askervin requested a review from Copilot May 26, 2026 14:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 4 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
Comment thread pkg/apis/config/v1alpha1/balloons-policy.go Outdated
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from ce383ef to 210c009 Compare May 27, 2026 08:34
@askervin askervin marked this pull request as ready for review May 27, 2026 09:27
@askervin askervin marked this pull request as draft May 29, 2026 06:06
@askervin

Collaborator Author

Back to draft.

Adding PCT core adjustments cleanly changes the cpuClasses implementation drastically.

  • Having cpufreq and goresctrl/sst backends requires a cleaner separation of concerns than what is in this patch.
  • Using the existing CPU controller for backwards compatibility and as an internal backend to cpuClasses is questionable. We do not need to tweak the controller interface if cpuClasses uses its own backends for cpufreq/cpuidle/sst.
  • The current cpuClasses interface is not policy-agnostic. For instance, the concept of idle CPUs is balloons-specific, and it can be avoided.
  • The current cpuClasses interface does not provide hints about to-be-allocated CPUs on which the intended CPU tuning would be possible. For instance, a punit inside a package might have run out of PCT cores, and therefore the highest turbo frequencies would be available only on cores of another punit in the same package. The hint interface should be such that it could be used as is in the topology-aware policy, too.

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 2 times, most recently from 27e3d58 to b4aacd7 Compare June 5, 2026 12:11
@askervin askervin marked this pull request as ready for review June 5, 2026 12:23
@askervin askervin changed the title from "balloons: add cpuClasses" to "balloons: add cpuClasses with cpufreq and PCT turbo allocation" Jun 5, 2026
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from b4aacd7 to f355797 Compare June 22, 2026 07:46
@askervin

Collaborator Author

Fixing lint issues requires updating sysfs and cpuallocator to use the new SST API from goresctrl. That turned out to be non-trivial in the allocator's BF-based CPU prioritization. Perhaps it would be best to handle the goresctrl v0.13.0 update as a separate PR that adopts the new API (to keep the linter happy)...

Comment thread pkg/resmgr/cpuclass/internal/cpufreq/cpufreq.go
Comment thread pkg/resmgr/cpuclass/internal/cpufreq/sysfs.go
askervin added 8 commits June 24, 2026 10:46
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from f355797 to be0cb49 Compare June 24, 2026 08:05
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 3 times, most recently from 108b3f7 to 53ecb1e Compare June 24, 2026 12:32
Comment thread pkg/agent/node-extended-resources.go
Comment thread pkg/resmgr/cpuclass/internal/cpuidle/cpuidle.go
Comment thread pkg/resmgr/cpuclass/internal/cpuidle/cpuidle.go
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from 53ecb1e to 299965b Compare June 25, 2026 07:33
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from 299965b to 598df03 Compare June 25, 2026 08:12
3 participants