add CPU platforms to instances #8728

Open
wants to merge 23 commits into
base: main
4130d30
make sleds report their CPU families to Nexus
gjcolombo Apr 15, 2025
d316a2e
differentiate Turin and Turin Dense for the control plane
iximeow Jul 30, 2025
1367289
unwind CPU families from the public sled API
iximeow Jul 30, 2025
b5eaf68
review notes
iximeow Jul 30, 2025
114f383
fix links ugh
iximeow Jul 30, 2025
4c40d47
migration still needs to know about turin dense
iximeow Jul 30, 2025
5ec45d3
sled-agent needs to expose cpu_family for inventory collections too
iximeow Aug 2, 2025
e9cbbdd
it compiles (might work now?)
iximeow Aug 2, 2025
bf7ccae
migrations need to be... right ...
iximeow Aug 2, 2025
0a79d5e
and that's the missing update of cpu_family.
iximeow Aug 2, 2025
ea59a26
non-illumos has to build too ofc
iximeow Aug 2, 2025
10cd335
fix expectorated output and, oh, docs are in the openapi spec
iximeow Aug 3, 2025
99a37f0
cleanup
iximeow Aug 6, 2025
6846a4a
move SledCpuFamily to a more fitting place
iximeow Aug 6, 2025
5f94661
rustfmt AGH
iximeow Aug 6, 2025
543bdc9
and expectorate up the reconfigurator output
iximeow Aug 6, 2025
34516b4
instance minimum CPU platforms
gjcolombo Apr 21, 2025
3831038
walk back "minimum"ness of CPU platforms
iximeow Jul 28, 2025
33956f9
i want propolis logs too please thank you
iximeow Jul 31, 2025
9b6b6ec
one more pass at aligning RFD 314, what we currently expose, and the …
iximeow Aug 6, 2025
5cf7b9c
and map the CPU platform "Turin" to all Turin sled CPU types
iximeow Aug 6, 2025
a877f39
one use of SledCpuFamily i missed in the rebase
iximeow Aug 7, 2025
1d34ab2
revert the buildomat log collection changes
iximeow Aug 7, 2025
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

49 changes: 49 additions & 0 deletions common/src/api/external/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1194,6 +1194,10 @@ pub struct Instance {

#[serde(flatten)]
pub auto_restart_status: InstanceAutoRestartStatus,

/// The CPU platform for this instance. If this is `null`, the instance
/// requires no particular CPU platform.
pub cpu_platform: Option<InstanceCpuPlatform>,
}

/// Status of control-plane driven automatic failure recovery for this instance.
Expand Down Expand Up @@ -1258,6 +1262,51 @@ pub enum InstanceAutoRestartPolicy {
BestEffort,
}

/// A required CPU platform for an instance.
///
/// When an instance specifies a required CPU platform:
///
/// - The system may expose (to the VM) new CPU features that are only present
/// on that platform (or on newer platforms of the same lineage that also
/// support those features).
/// - The instance must run on hosts that have CPUs that support all the
/// features of the supplied platform.
///
/// That is, the instance is restricted to hosts that have the CPUs which
/// support all features of the required platform, but in exchange the CPU
/// features exposed by the platform are available for the guest to use. Note
/// that this may prevent an instance from starting (if the hosts that could run
/// it are full but there is capacity on other incompatible hosts).
///
/// If an instance does not specify a required CPU platform, then when
/// it starts, the control plane selects a host for the instance and then
/// supplies the guest with the "minimum" CPU platform supported by that host.
/// This maximizes the number of hosts that can run the VM if it later needs to
/// migrate to another host.
///
/// In all cases, the CPU features presented by a given CPU platform are a
/// subset of what the corresponding hardware may actually support; features
/// which cannot be used from a virtual environment or do not have full
/// hypervisor support may be masked off. See RFD 314 for specific CPU features
/// in a CPU platform.
#[derive(
Copy, Clone, Debug, Deserialize, Serialize, JsonSchema, Eq, PartialEq,
)]
#[serde(rename_all = "snake_case")]
pub enum InstanceCpuPlatform {
/// An AMD Milan-like CPU platform.
AmdMilan,

/// An AMD Turin-like CPU platform.
// Note that there is only Turin, not Turin Dense - feature-wise they are
// collapsed together as the guest-visible platform is the same.
// If the two must be distinguished for instance placement, we'll want to
// track whatever the motivating constraint is more explicitly. CPU
// families, and especially the vendor code names, don't necessarily promise
// details about specific processor packaging choices.
AmdTurin,
}

// AFFINITY GROUPS

/// Affinity policy used to describe "what to do when a request cannot be satisfied"
Expand Down
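The placement behaviour described in the `InstanceCpuPlatform` doc comment can be sketched roughly as follows. This is a minimal illustration, not the control plane's actual placement code: the `Host` struct, the `place` function, and the use of derived ordering to mean "newer platform of the same lineage" are all assumptions made for the example, as is reading the "minimum" platform as the oldest one in this two-variant model.

```rust
/// Stand-in for the guest-visible platform enum from the diff. Deriving
/// `Ord` so that `AmdMilan < AmdTurin` is an assumption of this sketch.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum InstanceCpuPlatform {
    AmdMilan,
    AmdTurin,
}

/// Hypothetical host description: the newest platform its CPUs can present
/// and whether it still has room for another instance.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Host {
    pub id: u32,
    pub newest_platform: InstanceCpuPlatform,
    pub has_capacity: bool,
}

/// Picks the first host that can run the instance and returns its ID along
/// with the platform the guest would see.
///
/// - With a required platform, only hosts supporting all of its features
///   qualify, and the guest sees that platform.
/// - Without one, any host with capacity qualifies, and the guest sees the
///   "minimum" platform (assumed here to be `AmdMilan`).
pub fn place(
    required: Option<InstanceCpuPlatform>,
    hosts: &[Host],
) -> Option<(u32, InstanceCpuPlatform)> {
    let host = hosts.iter().find(|h| {
        h.has_capacity && required.map_or(true, |p| h.newest_platform >= p)
    })?;
    Some((host.id, required.unwrap_or(InstanceCpuPlatform::AmdMilan)))
}
```

Note how a required platform can leave an instance unplaceable even when other hosts have capacity, which is the trade-off the doc comment calls out.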
6 changes: 6 additions & 0 deletions dev-tools/omdb/src/bin/omdb/db.rs
Expand Up @@ -4760,6 +4760,7 @@ async fn cmd_db_instance_info(
propolis_ip: _,
propolis_port: _,
instance_id: _,
cpu_platform: _,
time_created,
time_deleted,
runtime:
Expand Down Expand Up @@ -7356,6 +7357,7 @@ fn prettyprint_vmm(
const INSTANCE_ID: &'static str = "instance ID";
const SLED_ID: &'static str = "sled ID";
const SLED_SERIAL: &'static str = "sled serial";
const CPU_PLATFORM: &'static str = "CPU platform";
const ADDRESS: &'static str = "propolis address";
const STATE: &'static str = "state";
const WIDTH: usize = const_max_len(&[
Expand All @@ -7366,6 +7368,7 @@ fn prettyprint_vmm(
INSTANCE_ID,
SLED_ID,
SLED_SERIAL,
CPU_PLATFORM,
STATE,
ADDRESS,
]);
Expand All @@ -7379,6 +7382,7 @@ fn prettyprint_vmm(
sled_id,
propolis_ip,
propolis_port,
cpu_platform,
runtime: db::model::VmmRuntimeState { state, r#gen, time_state_updated },
} = vmm;

Expand All @@ -7405,6 +7409,7 @@ fn prettyprint_vmm(
if let Some(serial) = sled_serial {
println!("{indent}{SLED_SERIAL:>width$}: {serial}");
}
println!("{indent}{CPU_PLATFORM:>width$}: {cpu_platform}");
}

async fn cmd_db_vmm_list(
Expand Down Expand Up @@ -7480,6 +7485,7 @@ async fn cmd_db_vmm_list(
sled_id,
propolis_ip: _,
propolis_port: _,
cpu_platform: _,
runtime:
db::model::VmmRuntimeState {
state,
Expand Down
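For readers unfamiliar with the label alignment used in `prettyprint_vmm`, here is a small sketch of the pattern: compute the longest label at compile time, then right-align every label to that width. The `const_max_len` below is a hypothetical reimplementation written for this example (the real helper in omdb is not shown in this diff), and `format_field` is an illustrative name.

```rust
/// Hypothetical stand-in for omdb's `const_max_len`: returns the length in
/// bytes of the longest string in `labels`, usable in a `const` context.
pub const fn const_max_len(labels: &[&str]) -> usize {
    let mut max = 0;
    let mut i = 0;
    // `for` loops are not allowed in `const fn`, so iterate with `while`.
    while i < labels.len() {
        let len = labels[i].len();
        if len > max {
            max = len;
        }
        i += 1;
    }
    max
}

const SLED_SERIAL: &str = "sled serial";
const CPU_PLATFORM: &str = "CPU platform";
const STATE: &str = "state";

/// Width of the widest label; a new label only needs to be added here.
const WIDTH: usize = const_max_len(&[SLED_SERIAL, CPU_PLATFORM, STATE]);

/// Formats one `label: value` row with the label right-aligned to `WIDTH`.
pub fn format_field(indent: &str, label: &str, value: &str) -> String {
    format!("{indent}{label:>width$}: {value}", width = WIDTH)
}
```

This is why the diff adds `CPU_PLATFORM` to the `const_max_len` list as well as to the output: leaving a label out of the list could break alignment if it were the longest one.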
3 changes: 3 additions & 0 deletions dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout
Expand Up @@ -1101,6 +1101,7 @@ sled 2eb69596-f081-4e2d-9425-9994926e0832 (role = Gimlet, serial serial1)
found at: <REDACTED_TIMESTAMP> from fake sled agent
address: [fd00:1122:3344:102::1]:12345
usable hw threads: 10
CPU family: amd_milan
usable memory (GiB): 0
reservoir (GiB): 0
physical disks:
Expand Down Expand Up @@ -1210,6 +1211,7 @@ sled 32d8d836-4d8a-4e54-8fa9-f31d79c42646 (role = Gimlet, serial serial2)
found at: <REDACTED_TIMESTAMP> from fake sled agent
address: [fd00:1122:3344:103::1]:12345
usable hw threads: 10
CPU family: amd_milan
usable memory (GiB): 0
reservoir (GiB): 0
physical disks:
Expand Down Expand Up @@ -1319,6 +1321,7 @@ sled 89d02b1b-478c-401a-8e28-7a26f74fa41b (role = Gimlet, serial serial0)
found at: <REDACTED_TIMESTAMP> from fake sled agent
address: [fd00:1122:3344:101::1]:12345
usable hw threads: 10
CPU family: amd_milan
usable memory (GiB): 0
reservoir (GiB): 0
physical disks:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ sled 2b8f0cb3-0295-4b3c-bc58-4fe88b57112c (role = Gimlet, serial serial1)
found at: <REDACTED_TIMESTAMP> from fake sled agent
address: [fd00:1122:3344:102::1]:12345
usable hw threads: 10
CPU family: amd_milan
usable memory (GiB): 0
reservoir (GiB): 0
physical disks:
Expand Down Expand Up @@ -194,6 +195,7 @@ sled 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 (role = Gimlet, serial serial0)
found at: <REDACTED_TIMESTAMP> from fake sled agent
address: [fd00:1122:3344:101::1]:12345
usable hw threads: 10
CPU family: amd_milan
usable memory (GiB): 0
reservoir (GiB): 0
physical disks:
Expand Down Expand Up @@ -302,6 +304,7 @@ sled d81c6a84-79b8-4958-ae41-ea46c9b19763 (role = Gimlet, serial serial2)
found at: <REDACTED_TIMESTAMP> from fake sled agent
address: [fd00:1122:3344:103::1]:12345
usable hw threads: 10
CPU family: amd_milan
usable memory (GiB): 0
reservoir (GiB): 0
physical disks:
Expand Down
1 change: 1 addition & 0 deletions end-to-end-tests/src/instance_launch.rs
Expand Up @@ -79,6 +79,7 @@ async fn instance_launch() -> Result<()> {
start: true,
auto_restart_policy: Default::default(),
anti_affinity_groups: Vec::new(),
cpu_platform: None,
})
.send()
.await?;
Expand Down
5 changes: 3 additions & 2 deletions nexus-sled-agent-shared/src/inventory.rs
Expand Up @@ -40,9 +40,9 @@ use omicron_uuid_kinds::{SledUuid, ZpoolUuid};
use schemars::schema::{Schema, SchemaObject};
use schemars::{JsonSchema, SchemaGenerator};
use serde::{Deserialize, Serialize};
-// Export this type for convenience -- this way, dependents don't have to
+// Export these types for convenience -- this way, dependents don't have to
// depend on sled-hardware-types.
-pub use sled_hardware_types::Baseboard;
+pub use sled_hardware_types::{Baseboard, SledCpuFamily};
use strum::EnumIter;
use tufaceous_artifact::{ArtifactHash, KnownArtifactKind};

Expand Down Expand Up @@ -121,6 +121,7 @@ pub struct Inventory {
pub baseboard: Baseboard,
pub usable_hardware_threads: u32,
pub usable_physical_ram: ByteCount,
pub cpu_family: SledCpuFamily,
pub reservoir_size: ByteCount,
pub disks: Vec<InventoryDisk>,
pub zpools: Vec<InventoryZpool>,
Expand Down
8 changes: 7 additions & 1 deletion nexus/db-model/src/instance.rs
Expand Up @@ -5,7 +5,7 @@
use super::InstanceIntendedState as IntendedState;
use super::{
ByteCount, Disk, ExternalIp, Generation, InstanceAutoRestartPolicy,
-InstanceCpuCount, InstanceState, Vmm, VmmState,
+InstanceCpuCount, InstanceCpuPlatform, InstanceState, Vmm, VmmState,
};
use crate::collection::DatastoreAttachTargetConfig;
use crate::serde_time_delta::optional_time_delta;
Expand Down Expand Up @@ -68,6 +68,9 @@ pub struct Instance {
#[diesel(column_name = boot_disk_id)]
pub boot_disk_id: Option<Uuid>,

/// The instance's required CPU platform.
pub cpu_platform: Option<InstanceCpuPlatform>,

#[diesel(embed)]
pub runtime_state: InstanceRuntimeState,

Expand Down Expand Up @@ -139,6 +142,7 @@ impl Instance {
// Intentionally ignore `params.boot_disk_id` here: we can't set
// `boot_disk_id` until the referenced disk is attached.
boot_disk_id: None,
cpu_platform: params.cpu_platform.map(Into::into),

runtime_state,
intended_state,
Expand Down Expand Up @@ -493,4 +497,6 @@ pub struct InstanceUpdate {
pub ncpus: InstanceCpuCount,

pub memory: ByteCount,

pub cpu_platform: Option<InstanceCpuPlatform>,
}
65 changes: 65 additions & 0 deletions nexus/db-model/src/instance_cpu_platform.rs
@@ -0,0 +1,65 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use crate::SledCpuFamily;

use super::impl_enum_type;
use serde::{Deserialize, Serialize};

impl_enum_type!(
InstanceCpuPlatformEnum:

#[derive(
Copy,
Clone,
Debug,
PartialEq,
AsExpression,
FromSqlRow,
Serialize,
Deserialize
)]
pub enum InstanceCpuPlatform;

AmdMilan => b"amd_milan"
AmdTurin => b"amd_turin"
);

impl InstanceCpuPlatform {
/// Returns a slice containing the set of sled CPU families that can
/// accommodate an instance with this CPU platform.
pub fn compatible_sled_cpu_families(&self) -> &'static [SledCpuFamily] {
match self {
// Turin-based sleds have a superset of the features made available
// in a guest's Milan CPU platform
Self::AmdMilan => {
&[SledCpuFamily::AmdMilan, SledCpuFamily::AmdTurin]
}
Self::AmdTurin => &[SledCpuFamily::AmdTurin],
}
}
}

impl From<omicron_common::api::external::InstanceCpuPlatform>
for InstanceCpuPlatform
{
fn from(value: omicron_common::api::external::InstanceCpuPlatform) -> Self {
use omicron_common::api::external::InstanceCpuPlatform as ApiPlatform;
match value {
ApiPlatform::AmdMilan => Self::AmdMilan,
ApiPlatform::AmdTurin => Self::AmdTurin,
}
}
}

impl From<InstanceCpuPlatform>
for omicron_common::api::external::InstanceCpuPlatform
{
fn from(value: InstanceCpuPlatform) -> Self {
match value {
InstanceCpuPlatform::AmdMilan => Self::AmdMilan,
InstanceCpuPlatform::AmdTurin => Self::AmdTurin,
}
}
}
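As an illustration of how `compatible_sled_cpu_families` might be consumed, the sketch below filters a list of sleds down to those that can host an instance with a given platform. The enums are simplified std-only copies of the ones in this file (the real `SledCpuFamily` lives elsewhere and may have more variants than shown here), and `eligible_sleds` plus the sled serials are hypothetical names invented for the example.

```rust
/// Simplified stand-in for the sled CPU family type.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum SledCpuFamily {
    AmdMilan,
    AmdTurin,
}

/// Simplified stand-in for the database model enum in this file.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum InstanceCpuPlatform {
    AmdMilan,
    AmdTurin,
}

impl InstanceCpuPlatform {
    /// Mirrors the mapping in the diff: Turin sleds offer a superset of the
    /// features in the guest-visible Milan platform.
    pub fn compatible_sled_cpu_families(&self) -> &'static [SledCpuFamily] {
        match self {
            Self::AmdMilan => {
                &[SledCpuFamily::AmdMilan, SledCpuFamily::AmdTurin]
            }
            Self::AmdTurin => &[SledCpuFamily::AmdTurin],
        }
    }
}

/// Hypothetical helper: returns the serials of sleds that can run an
/// instance. `None` means the instance requires no particular platform, so
/// every sled is eligible.
pub fn eligible_sleds<'a>(
    platform: Option<InstanceCpuPlatform>,
    sleds: &'a [(&'a str, SledCpuFamily)],
) -> Vec<&'a str> {
    sleds
        .iter()
        .filter(|(_, family)| match platform {
            Some(p) => p.compatible_sled_cpu_families().contains(family),
            None => true,
        })
        .map(|(serial, _)| *serial)
        .collect()
}
```

Returning a `&'static` slice keeps the mapping allocation-free and makes it easy to turn into a SQL `IN (...)` style filter, though how Nexus actually uses it is outside what this diff shows.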
3 changes: 3 additions & 0 deletions nexus/db-model/src/inventory.rs
Expand Up @@ -8,6 +8,7 @@ use crate::ArtifactHash;
use crate::Generation;
use crate::PhysicalDiskKind;
use crate::omicron_zone_config::{self, OmicronZoneNic};
use crate::sled_cpu_family::SledCpuFamily;
use crate::typed_uuid::DbTypedUuid;
use crate::{
ByteCount, MacAddr, Name, ServiceKind, SqlU8, SqlU16, SqlU32,
Expand Down Expand Up @@ -887,6 +888,7 @@ pub struct InvSledAgent {
pub sled_role: SledRole,
pub usable_hardware_threads: SqlU32,
pub usable_physical_ram: ByteCount,
pub cpu_family: SledCpuFamily,
pub reservoir_size: ByteCount,
// Soft foreign key to an `InvOmicronSledConfig`
pub ledgered_sled_config: Option<DbTypedUuid<OmicronSledConfigKind>>,
Expand Down Expand Up @@ -1300,6 +1302,7 @@ impl InvSledAgent {
usable_physical_ram: ByteCount::from(
sled_agent.usable_physical_ram,
),
cpu_family: sled_agent.cpu_family.into(),
reservoir_size: ByteCount::from(sled_agent.reservoir_size),
ledgered_sled_config: ledgered_sled_config.map(From::from),
reconciler_status,
Expand Down
6 changes: 6 additions & 0 deletions nexus/db-model/src/lib.rs
Expand Up @@ -43,6 +43,7 @@ mod image;
mod instance;
mod instance_auto_restart_policy;
mod instance_cpu_count;
mod instance_cpu_platform;
mod instance_intended_state;
mod instance_state;
mod internet_gateway;
Expand Down Expand Up @@ -103,6 +104,7 @@ mod silo_group;
mod silo_user;
mod silo_user_password_hash;
mod sled;
mod sled_cpu_family;
mod sled_instance;
mod sled_policy;
mod sled_resource_vmm;
Expand All @@ -122,6 +124,7 @@ mod utilization;
mod virtual_provisioning_collection;
mod virtual_provisioning_resource;
mod vmm;
mod vmm_cpu_platform;
mod vni;
mod volume;
mod volume_repair;
Expand Down Expand Up @@ -179,6 +182,7 @@ pub use image::*;
pub use instance::*;
pub use instance_auto_restart_policy::*;
pub use instance_cpu_count::*;
pub use instance_cpu_platform::*;
pub use instance_intended_state::*;
pub use instance_state::*;
pub use internet_gateway::*;
Expand Down Expand Up @@ -223,6 +227,7 @@ pub use silo_group::*;
pub use silo_user::*;
pub use silo_user_password_hash::*;
pub use sled::*;
pub use sled_cpu_family::*;
pub use sled_instance::*;
pub use sled_policy::to_db_sled_policy; // Do not expose DbSledPolicy
pub use sled_resource_vmm::*;
Expand All @@ -246,6 +251,7 @@ pub use v2p_mapping::*;
pub use virtual_provisioning_collection::*;
pub use virtual_provisioning_resource::*;
pub use vmm::*;
pub use vmm_cpu_platform::*;
pub use vmm_state::*;
pub use vni::*;
pub use volume::*;
Expand Down
4 changes: 3 additions & 1 deletion nexus/db-model/src/schema_versions.rs
Expand Up @@ -16,7 +16,7 @@ use std::{collections::BTreeMap, sync::LazyLock};
///
/// This must be updated when you change the database schema. Refer to
/// schema/crdb/README.adoc in the root of this repository for details.
-pub const SCHEMA_VERSION: Version = Version::new(173, 0, 0);
+pub const SCHEMA_VERSION: Version = Version::new(175, 0, 0);

/// List of all past database schema versions, in *reverse* order
///
Expand All @@ -28,6 +28,8 @@ static KNOWN_VERSIONS: LazyLock<Vec<KnownVersion>> = LazyLock::new(|| {
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(175, "add-instance-cpu-platform"),
KnownVersion::new(174, "sled-cpu-family"),
KnownVersion::new(173, "inv-internal-dns"),
KnownVersion::new(172, "add-zones-with-mupdate-override"),
KnownVersion::new(171, "inv-clear-mupdate-override"),
Expand Down
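Since `SCHEMA_VERSION` and `KNOWN_VERSIONS` have to be bumped together, a quick way to picture the relationship is the check below. It is only a sketch under assumptions: `KnownVersion` is reduced to a major number and a directory name, `check_versions` is a hypothetical function, and the two invariants it enforces (newest entry first and equal to the current version; majors strictly descending) are my reading of the "reverse order" comment rather than the repository's actual validation.

```rust
/// Simplified stand-in for a `KnownVersion` entry.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct KnownVersion {
    pub major: u64,
    pub dirname: &'static str,
}

/// Major component of the `SCHEMA_VERSION` set in this diff.
pub const SCHEMA_VERSION_MAJOR: u64 = 175;

/// The newest entries from the diff, in reverse order.
pub const KNOWN_VERSIONS: &[KnownVersion] = &[
    KnownVersion { major: 175, dirname: "add-instance-cpu-platform" },
    KnownVersion { major: 174, dirname: "sled-cpu-family" },
    KnownVersion { major: 173, dirname: "inv-internal-dns" },
];

/// Checks the assumed invariants: the first entry matches `current`, and
/// every entry is strictly newer than the one after it.
pub fn check_versions(
    current: u64,
    known: &[KnownVersion],
) -> Result<(), String> {
    match known.first() {
        Some(newest) if newest.major == current => {}
        Some(newest) => {
            return Err(format!(
                "newest known version is {} but current is {}",
                newest.major, current
            ));
        }
        None => return Err("no known versions".to_string()),
    }
    for pair in known.windows(2) {
        if pair[0].major <= pair[1].major {
            return Err(format!(
                "{} ({}) is not newer than {} ({})",
                pair[0].major, pair[0].dirname, pair[1].major, pair[1].dirname
            ));
        }
    }
    Ok(())
}
```

With the values from this diff the check passes; forgetting to bump `SCHEMA_VERSION` to 175 after adding the two new entries would fail the first condition.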