-
Notifications
You must be signed in to change notification settings - Fork 123
openhcl/tdx: inject machine check on ept exits caused by bad guest addresses #1340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
57c5608
to
d2d59f6
Compare
@@ -1370,6 +1371,11 @@ impl UhProcessor<'_, SnpBacked> { | |||
} | |||
|
|||
SevExitCode::NPF if has_intercept => { | |||
// TODO SNP: This code needs to be fixed to not rely on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? I thought Alex said this was necessary. Doesn't the legacy HCL do this, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked with Jon offline and he said we shouldn't be relying on the hypervisor for anything - the hardware exit info on both SNP and TDX should have what we need. I haven't looked at the legacy HCL in detail but he said that's what he believes to be the case after some refactoring.
@@ -1014,6 +1041,11 @@ fn gpf_event() -> hvdef::HvX64PendingEvent { | |||
make_exception_event(Exception::GENERAL_PROTECTION_FAULT, Some(0), None) | |||
} | |||
|
|||
/// Generates a machine check pending event | |||
fn mc_event() -> hvdef::HvX64PendingEvent { | |||
make_exception_event(Exception::MACHINE_CHECK, None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make sure cr4.MCE is 1?
Also, are there any machine-check MSRs the guest is going to read that we need to populate so that the guest crash dump is useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC in TDX MCE is always set to 1, although i think we do shadow the guest disabling it, we don't actaully allow the guest to disable the hardware TD state of MCE
…dresses (microsoft#1340) On TDX, we see some cases where the guest attempts to access an address with an incorrect shared bit. It's unclear if this is an issue in OpenHCL, the guest, or the host, but fix OpenHCL crashing with an emulation failure due to a `GuestMemory` access failure, and instead inject a machine check into the guest. For addresses outside of mmio and ram, continue to emulate but log that the guest did something strange. In the future, we may also inject a machine check on that path. Tested via a uefi app that attempts to access a shared page at a private gpa (see https://github.com/chris-oo/openvmm/blob/uefi-tmk-write-to-shared/tmk/tmk_launch/src/main.rs) with the following crash: ``` [kmsg]: [3.058450] virt_mshv_vtl::processor::tdx: WARN guest accessed inaccessible gpa, injecting MC gpa=0x666d9000 is_shared=false [kmsg]: [3.061137] virt_mshv_vtl::processor: WARN Guest has reported system crash crash=VtlCrash { vp_index: VpIndex(0), last_vtl: Vtl0, control: GuestCrashCtl { pre_os_id: 0, no_crash_dump: false, crash_message: true, crash_notify: true }, parameters: [12, 0, 0, 6790a820, 4d7] } [kmsg]: [3.061758] virt_mshv_vtl::processor: WARN Guest has reported a system crash message "!!!! X64 Exception Type - 12(#MC - Machine-Check) CPU Apic ID - 00000000 !!!!<5c>r<5c>nRIP - 00000000666DB030, CS - 0000000000000038, RFLAGS - 0000000000010202<5c>r<5c>nRAX - 00000000666D9000, RCX - 0000000000000042, RDX - 3333333333333333<5c>r<5c>nRBX - 0000000066E00018, RSP - 0000000033D99150, RBP - 0000000033D99190<5c>r<5c>nRSI - 0000000066E54718, RDI - 0000000033DB8160<5c>r<5c>nR8 - 0000000000000000, R9 - 00000000666ED508, R10 - 00000000666ED5EE<5c>r<5c>nR11 - 000000000000000A, R12 - 0000000000000000, R13 - 0000000000000000<5c>r<5c>nR14 - 0000000033D99AA0, R15 - 0000000033D99A98<5c>r<5c>nDS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030<5c>r<5c>nGS - 0000000000000030, SS - 0000000000000030<5c>r<5c>nCR0 - 0000000080010073, CR2 - 0000000000000000, CR3 - 0000000033801000<5c>r<5c>nCR4 - 0000000000000668, CR8 - 0000000000000000<5c>r<5c>nDR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000<5c>r<5c>nDR3 - ``` microsoft#1426 tracks implementing this for SNP.
On TDX, we see some cases where the guest attempts to access an address with an incorrect shared bit. It's unclear if this is an issue in OpenHCL, the guest, or the host, but fix OpenHCL crashing with an emulation failure due to a
GuestMemory
access failure, and instead inject a machine check into the guest.For addresses outside of mmio and ram, continue to emulate but log that the guest did something strange. In the future, we may also inject a machine check on that path.
Tested via a uefi app that attempts to access a shared page at a private gpa (see https://github.com/chris-oo/openvmm/blob/uefi-tmk-write-to-shared/tmk/tmk_launch/src/main.rs) with the following crash:
#1426 tracks implementing this for SNP.