✨ Implement compute config reconciliation for VirtualMachine #1644

Open
hpannem wants to merge 1 commit into vmware-tanzu:main from hpannem:compute-config-reconcile

@hpannem hpannem commented Jun 9, 2026

✨ Implement compute config reconciliation for VirtualMachine spec

What does this PR do, and why is it needed?

This PR introduces a unified compute configuration reconciliation path for
VirtualMachine objects. Previously, compute fields (CPU topology, resource
allocation, CPU/memory flags, latency sensitivity) were either not managed or
applied inconsistently across powered-on and powered-off reconfigure paths.

Key changes:

  • OverwriteSpecComputeConfig — new core function using a computeFieldDef
    function-table pattern. Each field carries its own minHWVer, hotPluggable,
    differs, and apply callbacks. A 3-way comparison (spec vs. live vSphere
    state vs. class-derived ConfigSpec) determines whether to emit a change.
    hwVer is derived internally from liveCI.Version rather than passed as a
    parameter.

  • Spec semantics — VM Operator fully owns every compute field it exposes.
    nil means "reset to platform default." The only exception is
    resources.size.cpu/memory where nil defers to the VirtualMachineClass.
    0 is a sentinel for coresPerSocket / vnumaNodeCount meaning "auto."
    -1 is a sentinel for limits.cpu / limits.memory meaning "unlimited."

  • API validation (api/v1alpha6, webhooks):

    • coresPerSocket and vnumaNodeCount minimum changed to 0 (allow "auto"
      sentinel).
    • XValidation rule tightened: vnumaNodeCount > 0 requires
      coresPerSocket > 0 (an explicit non-zero value).
    • -1 allowed for limits.cpu and limits.memory as the "unlimited"
      sentinel; 0 is still rejected for limits.

  • SyncClassComputeToSpec / SyncClassSizeAndAllocationToSpec — new
    functions that copy the class's compute fields into vm.Spec when a
    class-based resize is triggered (ResizeNeeded returns true). This restores
    backward compatibility: during a resize the class is authoritative and
    overrides all spec fields (backfilled or user-specified) that it defines.
    The sync is performed in the reconciler — not the mutating webhook — because
    the VirtualMachineClass may not exist yet when the webhook fires.

    • Full class resize (VMResize feature): SyncClassComputeToSpec syncs all
      compute fields, and device changes are applied atomically in the same
      vSphere reconfigure call. The subsequent reconcile is a no-op.
    • CPU/memory-only resize (VMResizeCPUMemory feature):
      SyncClassSizeAndAllocationToSpec syncs only size and allocation fields,
      matching the narrow scope of that path.
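
The function-table pattern from the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `compute` and `liveVM` types, the two example fields, their `minHWVer` and `hotPluggable` values, the "0 is the platform default for coresPerSocket" choice, and the exact `differs` logic are all hypothetical. Only the names `computeFieldDef`, `minHWVer`, `hotPluggable`, `differs`, and `apply` come from the description above.

```go
package main

import "fmt"

// compute is a hypothetical, trimmed-down set of compute fields. It stands in
// for both the compute part of vm.Spec and the class-derived ConfigSpec.
type compute struct {
	NumCPUs        *int32
	CoresPerSocket *int32
}

// liveVM is a hypothetical view of the live vSphere state.
type liveVM struct {
	HWVersion      int
	PoweredOn      bool
	NumCPUs        int32
	CoresPerSocket int32
}

// computeFieldDef is one row of the function table: each field carries its
// own hardware-version floor, hot-plug capability, and callbacks.
type computeFieldDef struct {
	name         string
	minHWVer     int
	hotPluggable bool
	differs      func(spec compute, live liveVM, class compute) bool
	apply        func(spec compute, out *compute)
}

func ptr(v int32) *int32 { return &v }

// threeWay is an assumed 3-way rule: emit a change when the class-derived
// value would otherwise override the spec, or when the live value is stale.
func threeWay(want, live int32, class *int32) bool {
	if class != nil && *class != want {
		return true
	}
	return live != want
}

// coresOrDefault treats nil as "platform default", assumed here to be 0 (auto).
func coresOrDefault(s compute) int32 {
	if s.CoresPerSocket == nil {
		return 0
	}
	return *s.CoresPerSocket
}

var fields = []computeFieldDef{
	{
		name: "numCPUs", minHWVer: 0, hotPluggable: true,
		differs: func(s compute, l liveVM, c compute) bool {
			// A nil size defers to the class, so the spec has no opinion.
			return s.NumCPUs != nil && threeWay(*s.NumCPUs, l.NumCPUs, c.NumCPUs)
		},
		apply: func(s compute, out *compute) { out.NumCPUs = ptr(*s.NumCPUs) },
	},
	{
		name: "coresPerSocket", minHWVer: 10, hotPluggable: false,
		differs: func(s compute, l liveVM, c compute) bool {
			return threeWay(coresOrDefault(s), l.CoresPerSocket, c.CoresPerSocket)
		},
		apply: func(s compute, out *compute) { out.CoresPerSocket = ptr(coresOrDefault(s)) },
	},
}

// overwriteSpecComputeConfig walks the table and writes changes into out,
// which starts as the class-derived config. It returns the applied field names.
func overwriteSpecComputeConfig(spec compute, live liveVM, out *compute) []string {
	var applied []string
	for _, f := range fields {
		if live.HWVersion < f.minHWVer {
			continue // field not supported at this hardware version
		}
		if !f.differs(spec, live, *out) {
			continue // nothing to change
		}
		if live.PoweredOn && !f.hotPluggable {
			continue // must wait for a powered-off reconfigure
		}
		f.apply(spec, out)
		applied = append(applied, f.name)
	}
	return applied
}

func main() {
	spec := compute{NumCPUs: ptr(4), CoresPerSocket: ptr(2)}
	live := liveVM{HWVersion: 21, PoweredOn: true, NumCPUs: 2, CoresPerSocket: 1}
	out := compute{NumCPUs: ptr(2)} // class-derived starting point
	fmt.Println(overwriteSpecComputeConfig(spec, live, &out))
}
```

Keeping the per-field rules in a table means the powered-on and powered-off paths share one loop, which is the inconsistency this PR sets out to remove.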

Which issue(s) is/are addressed by this PR? (optional):

Fixes #

Are there any special notes for your reviewer:

  • All new sync calls are gated by the TelcoVMServiceAPI feature flag, since
    OverwriteSpecComputeConfig only runs when that flag is enabled.
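
For reviewers, the gating and the two sync scopes can be pictured with the sketch below. It is an assumption-laden illustration rather than the PR's implementation: the `features` and `computeFields` types, the `syncOnResize` helper, the use of plain `int64` values, and the precedence of the full resize over the CPU/memory-only resize are all hypothetical. The lower-cased function names mirror `SyncClassComputeToSpec` and `SyncClassSizeAndAllocationToSpec` but do not reproduce their real signatures.

```go
package main

import "fmt"

// features is a hypothetical snapshot of the relevant feature flags.
type features struct {
	TelcoVMServiceAPI bool
	VMResize          bool
	VMResizeCPUMemory bool
}

// computeFields is a hypothetical subset of compute fields shared by the
// class and vm.Spec. nil means "not defined".
type computeFields struct {
	CPU            *int64 // size
	Memory         *int64 // size
	CPUReservation *int64 // allocation
	CoresPerSocket *int64 // topology, outside the size/allocation scope
}

func i64(v int64) *int64 { return &v }

// override copies src into dst only when the class defines the field, so
// spec fields the class does not define are left alone.
func override(dst **int64, src *int64) {
	if src != nil {
		v := *src
		*dst = &v
	}
}

// syncClassSizeAndAllocationToSpec covers the narrow CPU/memory-only resize.
func syncClassSizeAndAllocationToSpec(class computeFields, spec *computeFields) {
	override(&spec.CPU, class.CPU)
	override(&spec.Memory, class.Memory)
	override(&spec.CPUReservation, class.CPUReservation)
}

// syncClassComputeToSpec covers the full class resize.
func syncClassComputeToSpec(class computeFields, spec *computeFields) {
	syncClassSizeAndAllocationToSpec(class, spec)
	override(&spec.CoresPerSocket, class.CoresPerSocket)
}

// syncOnResize applies the gating: no sync unless the TelcoVMServiceAPI flag
// is on and a resize is needed. It returns which scope was synced.
func syncOnResize(f features, resizeNeeded bool, class computeFields, spec *computeFields) string {
	if !f.TelcoVMServiceAPI || !resizeNeeded {
		return "none"
	}
	switch {
	case f.VMResize:
		syncClassComputeToSpec(class, spec)
		return "full"
	case f.VMResizeCPUMemory:
		syncClassSizeAndAllocationToSpec(class, spec)
		return "size-and-allocation"
	}
	return "none"
}

func main() {
	class := computeFields{CPU: i64(4), CoresPerSocket: i64(2)}
	spec := computeFields{CPU: i64(2), CoresPerSocket: i64(1)}
	f := features{TelcoVMServiceAPI: true, VMResizeCPUMemory: true}
	mode := syncOnResize(f, true, class, &spec)
	fmt.Println(mode, *spec.CPU, *spec.CoresPerSocket)
}
```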

Please add a release note if necessary:

VirtualMachine compute fields (CPU topology, resource allocation, CPU/memory
flags, latency sensitivity) are now fully reconciled via `vm.Spec`. Setting a
field to `nil` resets it to the platform default. `0` is an "auto" sentinel for
`coresPerSocket`/`vnumaNodeCount`; `-1` is an "unlimited" sentinel for
`limits.cpu`/`limits.memory`. Class-based resize continues to be authoritative:
when a resize is triggered the class overrides all affected spec fields.
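
The sentinel and validation rules above can be restated as a small Go sketch. The real checks live in the API markers and webhooks; the function names here, the use of plain integers for limits, and the rejection of values below `-1` are assumptions made only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

func ptr[T any](v T) *T { return &v }

// validateTopology mirrors the described rules: both fields accept 0 ("auto"),
// and vnumaNodeCount > 0 requires an explicit, non-zero coresPerSocket.
func validateTopology(coresPerSocket, vnumaNodeCount *int32) error {
	if coresPerSocket != nil && *coresPerSocket < 0 {
		return errors.New("coresPerSocket must be >= 0")
	}
	if vnumaNodeCount != nil && *vnumaNodeCount < 0 {
		return errors.New("vnumaNodeCount must be >= 0")
	}
	if vnumaNodeCount != nil && *vnumaNodeCount > 0 &&
		(coresPerSocket == nil || *coresPerSocket == 0) {
		return errors.New("vnumaNodeCount > 0 requires coresPerSocket > 0")
	}
	return nil
}

// validateLimit accepts nil, -1 ("unlimited"), or a positive value.
// Rejecting values below -1 is an assumption of this sketch.
func validateLimit(limit *int64) error {
	if limit != nil && (*limit == 0 || *limit < -1) {
		return errors.New("limit must be -1 (unlimited) or a positive value")
	}
	return nil
}

// describeLimit shows how each accepted limit value is interpreted.
func describeLimit(limit *int64) string {
	switch {
	case limit == nil:
		return "platform default"
	case *limit == -1:
		return "unlimited"
	default:
		return fmt.Sprintf("%d", *limit)
	}
}

func main() {
	fmt.Println(validateTopology(ptr[int32](0), ptr[int32](2)))
	fmt.Println(validateLimit(ptr[int64](0)))
	fmt.Println(describeLimit(ptr[int64](-1)))
}
```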

<!-- readthedocs-preview vm-operator start -->
----
📚 Documentation preview 📚: https://vm-operator--1644.org.readthedocs.build/en/1644/

<!-- readthedocs-preview vm-operator end -->

@hpannem hpannem self-assigned this Jun 9, 2026
@hpannem hpannem requested review from a team and faisalabujabal as code owners June 9, 2026 15:57
@github-actions github-actions Bot added the size/XXL Denotes a PR that changes 1000+ lines. label Jun 9, 2026
@hpannem hpannem requested review from akutz and bryanv June 9, 2026 15:59
remove vmx-23 fields
removed ExposeVNUMAOnCPUHotAdd
backward compatibility class resize
status population
guaranteed <==> besteffort class resize
@hpannem hpannem force-pushed the compute-config-reconcile branch from a11ca63 to ae95378 Compare June 16, 2026 00:09