Commit ff48e0e

fix: Move KVM_KVMCLOCK_CTRL from after pause to before resume

KVM_KVMCLOCK_CTRL ioctl sets `pvclock_set_guest_stopped_request` flag of `kvm_vcpu_arch` [1]. On the next guest time update, if the flag is set, KVM ORs in `PVCLOCK_GUEST_STOPPED` and `kvm_setup_guest_pvclock()` pushes the `hv_clock` into the guest's pvclock page [2].

If the `hv_clock` has not been written to the guest's pvclock page when taking a snapshot, it is not saved in the snapshot memory (i.e. `PVCLOCK_GUEST_STOPPED` isn't set in resumed VMs).

So we should call KVM_KVMCLOCK_CTRL ioctl before resuming a VM rather than after pausing a VM. That covers both the pause-and-resume case and the restore-and-resume case.

[1]: https://elixir.bootlin.com/linux/v6.16.3/source/arch/x86/kvm/x86.c#L5734
[2]: https://elixir.bootlin.com/linux/v6.16.3/source/arch/x86/kvm/x86.c#L3286-L3295

Signed-off-by: Takahiro Itazuri <[email protected]>

1 parent cc20162 commit ff48e0e
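
The ordering argument in the commit message can be illustrated with a toy model. The sketch below is not KVM or Firecracker code: `ToyVcpu` and all of its methods are hypothetical names, and it assumes, as the commit message describes, that the pending request lives only on the host side while a snapshot captures only what has already been written into the guest's pvclock page.

```rust
// Toy model of the mechanism described in the commit message.
// Every type and function here is hypothetical and for illustration only.

/// Same bit value as the kernel's PVCLOCK_GUEST_STOPPED flag.
const PVCLOCK_GUEST_STOPPED: u8 = 1 << 1;

#[derive(Default)]
pub struct ToyVcpu {
    /// Host-side request set by the ioctl; not part of guest memory.
    stopped_request: bool,
    /// Flags byte of the pvclock page; part of guest memory, so it is
    /// the only field a memory snapshot carries over in this model.
    guest_page_flags: u8,
}

impl ToyVcpu {
    /// Models KVM_KVMCLOCK_CTRL: it only records a host-side request.
    pub fn kvmclock_ctrl(&mut self) {
        self.stopped_request = true;
    }

    /// Models the next guest time update: a pending request is folded
    /// into the guest-visible flags and then cleared.
    pub fn guest_time_update(&mut self) {
        if self.stopped_request {
            self.guest_page_flags |= PVCLOCK_GUEST_STOPPED;
            self.stopped_request = false;
        }
    }

    /// Models snapshot and restore: guest memory survives, the pending
    /// host-side request does not (assumption taken from the commit message).
    pub fn snapshot_and_restore(&self) -> ToyVcpu {
        ToyVcpu {
            stopped_request: false,
            guest_page_flags: self.guest_page_flags,
        }
    }

    /// What the guest watchdog would observe in its pvclock page.
    pub fn guest_sees_stopped(&self) -> bool {
        self.guest_page_flags & PVCLOCK_GUEST_STOPPED != 0
    }
}

/// Old ordering: ioctl right after pause, then snapshot, restore, resume.
pub fn ioctl_after_pause() -> bool {
    let mut vcpu = ToyVcpu::default();
    vcpu.kvmclock_ctrl(); // no guest time update happens while paused
    let mut restored = vcpu.snapshot_and_restore();
    restored.guest_time_update(); // first update after resume
    restored.guest_sees_stopped()
}

/// New ordering: ioctl right before resume, on the restored vCPU.
pub fn ioctl_before_resume() -> bool {
    let vcpu = ToyVcpu::default();
    let mut restored = vcpu.snapshot_and_restore();
    restored.kvmclock_ctrl();
    restored.guest_time_update(); // first update after resume
    restored.guest_sees_stopped()
}
```

In this model `ioctl_after_pause()` returns `false` because the request is dropped before it reaches the guest page, while `ioctl_before_resume()` returns `true`. The same late call also works for a plain pause and resume, which is why a single call site before resume covers both cases.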

2 files changed: +15 -24 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions

@@ -43,6 +43,11 @@ and this project adheres to
   bug causing a read/write from an iovec to be duplicated when receiving an
   error on an iovec other than the first. This caused a data corruption issue in
   the vsock device starting from guest kernel 6.17.
+- [#5494](https://github.com/firecracker-microvm/firecracker/pull/5494): Fixed
+  a bug where soft lockups were detected by watchdog on microVMs restored from
+  snapshots. This moved the timing of KVM_KVMCLOCK_CTRL ioctl call, which
+  notifies watchdog that the microVMs were paused, from after pause to before
+  resume.
 
 ## [1.13.0]
 

src/vmm/src/vstate/vcpu.rs

Lines changed: 10 additions & 24 deletions

@@ -233,19 +233,7 @@ impl Vcpu {
                 Ok(VcpuEmulation::Stopped) => return self.exit(FcExitCode::Ok),
                 // If the emulation requests a pause lets do this
                 #[cfg(feature = "gdb")]
-                Ok(VcpuEmulation::Paused) => {
-                    // Calling `KVM_KVMCLOCK_CTRL` to make sure the guest softlockup watchdog
-                    // does not panic on resume, see https://docs.kernel.org/virt/kvm/api.html .
-                    // We do not want to fail if the call is not successful, because depending
-                    // that may be acceptable depending on the workload.
-                    #[cfg(target_arch = "x86_64")]
-                    if let Err(err) = self.kvm_vcpu.fd.kvmclock_ctrl() {
-                        METRICS.vcpu.kvmclock_ctrl_fails.inc();
-                        warn!("KVM_KVMCLOCK_CTRL call failed {}", err);
-                    }
-
-                    return StateMachine::next(Self::paused);
-                }
+                Ok(VcpuEmulation::Paused) => return StateMachine::next(Self::paused),
                 // Emulation errors lead to vCPU exit.
                 Err(_) => return self.exit(FcExitCode::GenericError),
             }
@@ -263,16 +251,6 @@ impl Vcpu {
                     .send(VcpuResponse::Paused)
                     .expect("vcpu channel unexpectedly closed");
 
-                // Calling `KVM_KVMCLOCK_CTRL` to make sure the guest softlockup watchdog
-                // does not panic on resume, see https://docs.kernel.org/virt/kvm/api.html .
-                // We do not want to fail if the call is not successful, because depending
-                // that may be acceptable depending on the workload.
-                #[cfg(target_arch = "x86_64")]
-                if let Err(err) = self.kvm_vcpu.fd.kvmclock_ctrl() {
-                    METRICS.vcpu.kvmclock_ctrl_fails.inc();
-                    warn!("KVM_KVMCLOCK_CTRL call failed {}", err);
-                }
-
                 // Move to 'paused' state.
                 state = StateMachine::next(Self::paused);
             }
@@ -322,7 +300,15 @@ impl Vcpu {
                     );
                     self.kvm_vcpu.fd.set_kvm_immediate_exit(0);
                 }
-                // Nothing special to do.
+                // Calling `KVM_KVMCLOCK_CTRL` to make sure the guest softlockup watchdog
+                // does not panic on resume, see https://docs.kernel.org/virt/kvm/api.html .
+                // We do not want to fail if the call is not successful, because depending
+                // that may be acceptable depending on the workload.
+                #[cfg(target_arch = "x86_64")]
+                if let Err(err) = self.kvm_vcpu.fd.kvmclock_ctrl() {
+                    METRICS.vcpu.kvmclock_ctrl_fails.inc();
+                    warn!("KVM_KVMCLOCK_CTRL call failed {}", err);
+                }
                 self.response_sender
                     .send(VcpuResponse::Resumed)
                     .expect("vcpu channel unexpectedly closed");
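
The moved block treats a failed ioctl as non-fatal: it bumps a metric, logs a warning, and lets the resume proceed. A minimal standalone sketch of that pattern follows, using only the standard library. `KVMCLOCK_CTRL_FAILS` and `notify_guest_paused` are hypothetical stand-ins for `METRICS.vcpu.kvmclock_ctrl_fails` and the inline `if let Err(..)` block, and the closure stands in for `self.kvm_vcpu.fd.kvmclock_ctrl()`, whose real error type differs from the `io::Error` assumed here.

```rust
use std::io;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for `METRICS.vcpu.kvmclock_ctrl_fails`.
pub static KVMCLOCK_CTRL_FAILS: AtomicU64 = AtomicU64::new(0);

/// Runs `ioctl` (a stand-in for the real KVM_KVMCLOCK_CTRL call) and treats
/// a failure as non-fatal: count it, log it, and let the caller continue.
/// The boolean result exists only so callers and tests can observe the outcome.
pub fn notify_guest_paused<F>(ioctl: F) -> bool
where
    F: FnOnce() -> io::Result<()>,
{
    match ioctl() {
        Ok(()) => true,
        Err(err) => {
            // Record the failure instead of propagating it.
            KVMCLOCK_CTRL_FAILS.fetch_add(1, Ordering::Relaxed);
            eprintln!("KVM_KVMCLOCK_CTRL call failed {}", err);
            false
        }
    }
}
```

A caller would invoke `notify_guest_paused(...)` just before sending the resumed response and ignore the return value, so a host without kvmclock support still resumes the vCPU.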
