Skip to content

openhcl/tdx: inject machine check on ept exits caused by bad guest addresses #1340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 28, 2025

Conversation

chris-oo
Copy link
Member

@chris-oo chris-oo commented May 12, 2025

On TDX, we see some cases where the guest attempts to access an address with an incorrect shared bit. It's unclear if this is an issue in OpenHCL, the guest, or the host, but fix OpenHCL crashing with an emulation failure due to a GuestMemory access failure, and instead inject a machine check into the guest.

For addresses outside of mmio and ram, continue to emulate but log that the guest did something strange. In the future, we may also inject a machine check on that path.

Tested via a uefi app that attempts to access a shared page at a private gpa (see https://github.com/chris-oo/openvmm/blob/uefi-tmk-write-to-shared/tmk/tmk_launch/src/main.rs) with the following crash:

[kmsg]: [3.058450] virt_mshv_vtl::processor::tdx: WARN  guest accessed inaccessible gpa, injecting MC gpa=0x666d9000 is_shared=false
[kmsg]: [3.061137] virt_mshv_vtl::processor: WARN  Guest has reported system crash crash=VtlCrash { vp_index: VpIndex(0), last_vtl: Vtl0, control: GuestCrashCtl { pre_os_id: 0, no_crash_dump: false, crash_message: true, crash_notify: true }, parameters: [12, 0, 0, 6790a820, 4d7] }
[kmsg]: [3.061758] virt_mshv_vtl::processor:
WARN  Guest has reported a system crash message "!!!! X64 Exception Type - 12(#MC - Machine-Check)  CPU Apic ID - 00000000 !!!!<5c>r<5c>nRIP  - 00000000666DB030, CS  - 0000000000000038, RFLAGS - 0000000000010202<5c>r<5c>nRAX  - 00000000666D9000, RCX - 0000000000000042, RDX - 3333333333333333<5c>r<5c>nRBX  - 0000000066E00018, RSP - 0000000033D99150, RBP - 0000000033D99190<5c>r<5c>nRSI  - 0000000066E54718, RDI - 0000000033DB8160<5c>r<5c>nR8   - 0000000000000000, R9  - 00000000666ED508, R10 - 00000000666ED5EE<5c>r<5c>nR11  - 000000000000000A, R12 - 0000000000000000, R13 - 0000000000000000<5c>r<5c>nR14  - 0000000033D99AA0, R15 - 0000000033D99A98<5c>r<5c>nDS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030<5c>r<5c>nGS   - 0000000000000030, SS  - 0000000000000030<5c>r<5c>nCR0  - 0000000080010073, CR2 - 0000000000000000, CR3 - 0000000033801000<5c>r<5c>nCR4  - 0000000000000668, CR8 - 0000000000000000<5c>r<5c>nDR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000<5c>r<5c>nDR3  -     

#1426 tracks implementing this for SNP.

@chris-oo chris-oo requested a review from a team as a code owner May 12, 2025 19:05
@chris-oo chris-oo marked this pull request as draft May 15, 2025 18:23
@chris-oo chris-oo force-pushed the inject-mc-tdx branch 4 times, most recently from 57c5608 to d2d59f6 Compare May 28, 2025 15:28
@chris-oo chris-oo added the backport_2505 Change should be backported to the release/2505 branch label May 28, 2025
@chris-oo chris-oo marked this pull request as ready for review May 28, 2025 15:56
@@ -1370,6 +1371,11 @@ impl UhProcessor<'_, SnpBacked> {
}

SevExitCode::NPF if has_intercept => {
// TODO SNP: This code needs to be fixed to not rely on the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? I thought Alex said this was necessary. Doesn't the legacy HCL do this, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked with Jon offline and he said we shouldn't be relying on the hypervisor for anything - the hardware exit info on both SNP and TDX should have what we need. I haven't looked at the legacy HCL in detail but he said that's what he believes to be the case after some refactoring.

smalis-msft
smalis-msft previously approved these changes May 28, 2025
@@ -1014,6 +1041,11 @@ fn gpf_event() -> hvdef::HvX64PendingEvent {
make_exception_event(Exception::GENERAL_PROTECTION_FAULT, Some(0), None)
}

/// Generates a machine check pending event
fn mc_event() -> hvdef::HvX64PendingEvent {
make_exception_event(Exception::MACHINE_CHECK, None, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make sure cr4.MCE is 1?

Also, are there any machine-check MSRs the guest is going to read that we need to populate so that the guest crash dump is useful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC in TDX MCE is always set to 1, although i think we do shadow the guest disabling it, we don't actaully allow the guest to disable the hardware TD state of MCE

@chris-oo chris-oo requested a review from smalis-msft May 28, 2025 22:21
@chris-oo chris-oo enabled auto-merge (squash) May 28, 2025 22:39
@chris-oo chris-oo merged commit 5c44faa into microsoft:main May 28, 2025
28 checks passed
@chris-oo chris-oo deleted the inject-mc-tdx branch May 28, 2025 23:10
chris-oo added a commit to chris-oo/openvmm that referenced this pull request May 28, 2025
…dresses (microsoft#1340)

On TDX, we see some cases where the guest attempts to access an address
with an incorrect shared bit. It's unclear if this is an issue in
OpenHCL, the guest, or the host, but fix OpenHCL crashing with an
emulation failure due to a `GuestMemory` access failure, and instead
inject a machine check into the guest.

For addresses outside of mmio and ram, continue to emulate but log that
the guest did something strange. In the future, we may also inject a
machine check on that path.

Tested via a uefi app that attempts to access a shared page at a private
gpa (see
https://github.com/chris-oo/openvmm/blob/uefi-tmk-write-to-shared/tmk/tmk_launch/src/main.rs)
with the following crash:

```
[kmsg]: [3.058450] virt_mshv_vtl::processor::tdx: WARN  guest accessed inaccessible gpa, injecting MC gpa=0x666d9000 is_shared=false
[kmsg]: [3.061137] virt_mshv_vtl::processor: WARN  Guest has reported system crash crash=VtlCrash { vp_index: VpIndex(0), last_vtl: Vtl0, control: GuestCrashCtl { pre_os_id: 0, no_crash_dump: false, crash_message: true, crash_notify: true }, parameters: [12, 0, 0, 6790a820, 4d7] }
[kmsg]: [3.061758] virt_mshv_vtl::processor:
WARN  Guest has reported a system crash message "!!!! X64 Exception Type - 12(#MC - Machine-Check)  CPU Apic ID - 00000000 !!!!<5c>r<5c>nRIP  - 00000000666DB030, CS  - 0000000000000038, RFLAGS - 0000000000010202<5c>r<5c>nRAX  - 00000000666D9000, RCX - 0000000000000042, RDX - 3333333333333333<5c>r<5c>nRBX  - 0000000066E00018, RSP - 0000000033D99150, RBP - 0000000033D99190<5c>r<5c>nRSI  - 0000000066E54718, RDI - 0000000033DB8160<5c>r<5c>nR8   - 0000000000000000, R9  - 00000000666ED508, R10 - 00000000666ED5EE<5c>r<5c>nR11  - 000000000000000A, R12 - 0000000000000000, R13 - 0000000000000000<5c>r<5c>nR14  - 0000000033D99AA0, R15 - 0000000033D99A98<5c>r<5c>nDS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030<5c>r<5c>nGS   - 0000000000000030, SS  - 0000000000000030<5c>r<5c>nCR0  - 0000000080010073, CR2 - 0000000000000000, CR3 - 0000000033801000<5c>r<5c>nCR4  - 0000000000000668, CR8 - 0000000000000000<5c>r<5c>nDR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000<5c>r<5c>nDR3  -     
```

microsoft#1426 tracks implementing this for SNP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport_2505 Change should be backported to the release/2505 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants