
Conversation

davidkleymann

Summary of the PR

Add support for dirty page tracking via the KVM dirty ring interface. Adds a KvmDirtyLogRing structure that keeps track of the indices and the base pointer of the shared memory buffer, and implements iteration over dirty pages, thereby harvesting them. Implements reset_dirty_rings on VmFd to trigger recycling of dirty ring buffer elements by the kernel after processing, and adds the dirty_log_ring field to VcpuFd.

This is a draft that still needs review and improvement. I'm asking for suggestions on how to address the following remaining weaknesses:

  1. mmap_from_fd will succeed even if the supplied size is smaller than the full size of the shared memory buffer.
  2. Send and Sync are probably not safe on KvmDirtyLogRing because accesses are not stateless, due to the state we need to keep track of in user space (next_dirty).
  3. If any vCPU is running while dirty pages are being harvested, the user of KvmDirtyLogRing needs to call reset_dirty_rings before reading the contents of dirty pages. This is currently the user's responsibility, which allows reset_dirty_rings to be called after all dirty pages have been read by the VMM when all vCPUs are halted, but it may be a confusing interface for other cases.
  4. KvmDirtyLogRing is currently pub; perhaps it can be hidden behind an interface on VcpuFd.
  5. No unit tests.

More info on the interface:
https://www.kernel.org/doc/html/latest/virt/kvm/api.html#kvm-cap-dirty-log-ring-kvm-cap-dirty-log-ring-acq-rel
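For orientation, here is a rough usage sketch of the draft API as it stands in this PR (map_dirty_log_ring, dirty_log_ring_iter, reset_dirty_rings), written in the doc-example style used elsewhere in the crate. It assumes that KvmDirtyLogRing implements Iterator yielding (slot, offset) pairs, that KVM_CAP_DIRTY_LOG_RING has already been enabled on the VM via enable_cap, and that the ring size below is a placeholder:

```rust
use kvm_ioctls::Kvm;

let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
// Placeholder ring size; it must describe a power-of-two number of
// kvm_dirty_gfn entries and match what was passed to enable_cap.
let ring_bytes = 4096;
let mut vcpu = vm.create_vcpu(0).unwrap();
vcpu.map_dirty_log_ring(ring_bytes).unwrap();

// ... run the vCPU and let the guest dirty some pages ...

// Harvest the dirty guest frame numbers recorded for this vCPU.
if let Some(ring) = vcpu.dirty_log_ring_iter() {
    for (slot, offset) in ring {
        println!("dirty page: slot {slot}, offset {offset}");
    }
}
// Tell the kernel the harvested entries may be recycled and the pages
// reprotected (with running vCPUs this must happen before the dirty page
// contents are read; see weakness 3 above).
let _reset_count = vm.reset_dirty_rings().unwrap();
```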

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

David Kleymann and others added 10 commits August 26, 2025 14:05
The capability is used by the KVM dirty ring interface to track dirtied
pages.

Signed-off-by: David Kleymann <[email protected]>
Adds the KVM_RESET_DIRTY_RINGS ioctl and the function reset_dirty_rings
in impl VmFd to wrap it.

Signed-off-by: David Kleymann <[email protected]>
Adds vCPU functions to mmap the dirty ring and iterate over dirty
pages. Also adds return values to reset_dirty_rings and flush_dirty_gfns.

More info: https://www.kernel.org/doc/html/latest/virt/kvm/api.html#kvm-cap-dirty-log-ring-kvm-cap-dirty-log-ring-acq-rel

Signed-off-by: David Kleymann <[email protected]>
… of map_dirty_log_ring

Signed-off-by: David Kleymann <[email protected]>
Comments on the safety of the operations used to mmap the shared memory
buffer of kvm_dirty_gfn entries.

Signed-off-by: David Kleymann <[email protected]>

@roypat (Member) left a comment


Please squash your commits so that later commits are not fixups for earlier ones.

@@ -2,6 +2,8 @@

## Upcoming Release

- Plumb through KVM_CAP_DIRTY_LOG_RING as DirtyLogRing cap.

This should go into an ### Added section

Comment on lines +90 to +101
        unsafe {
            let gfn = self.gfns.add(i as usize).as_mut();
            if (*gfn).flags & KVM_DIRTY_GFN_F_DIRTY == 0 {
                // next_dirty stays the same, it will become the next dirty element
                return None;
            } else {
                self.next_dirty += 1;
                (*gfn).flags ^= KVM_DIRTY_GFN_F_RESET;
                return Some(((*gfn).slot, (*gfn).offset));
            }
        }
    }

Here we should probably do just a single read_volatile of the struct kvm_dirty_gfn, to avoid data races, and write the updated kvm_dirty_gfn flags via write_volatile (or, on weakly ordered architectures such as arm64, an atomic read with acquire ordering, and write with release ordering. Which means we'll probably also want to support checking KVM_CAP_DIRTY_LOG_RING_ACQ_REL)
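A minimal sketch of what that could look like, applied to the next() body quoted above (assuming self.gfns is a NonNull<kvm_dirty_gfn>, as the existing .add()/.as_mut() usage suggests; on architectures needing KVM_CAP_DIRTY_LOG_RING_ACQ_REL the volatile accesses would become acquire/release atomics instead):

```rust
unsafe {
    let slot_ptr: *mut kvm_dirty_gfn = self.gfns.add(i as usize).as_ptr();
    // One volatile read of the whole entry gives a consistent snapshot of
    // flags/slot/offset instead of several independent non-volatile reads.
    let mut gfn = std::ptr::read_volatile(slot_ptr);
    if gfn.flags & KVM_DIRTY_GFN_F_DIRTY == 0 {
        // next_dirty stays the same, it will become the next dirty element.
        return None;
    }
    self.next_dirty += 1;
    gfn.flags ^= KVM_DIRTY_GFN_F_RESET;
    // Publish only the updated flags back to the slot shared with the kernel.
    std::ptr::write_volatile(std::ptr::addr_of_mut!((*slot_ptr).flags), gfn.flags);
    Some((gfn.slot, gfn.offset))
}
```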

/// }
/// }
/// ```
pub fn dirty_log_ring_iter(&mut self) -> Option<&mut KvmDirtyLogRing> {

We can hide the actual iterator type here by just returning impl Iterator<Item = blablabla>. Then maybe the entire struct doesn't need to be exported?
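A sketch of that shape, assuming the iterator yields the (u32, u64) slot/offset pairs produced by the next() body above:

```rust
pub fn dirty_log_ring_iter(&mut self) -> Option<impl Iterator<Item = (u32, u64)> + '_> {
    // `&mut KvmDirtyLogRing` is itself an Iterator whenever KvmDirtyLogRing
    // implements Iterator, so the concrete type no longer has to be `pub`.
    self.dirty_log_ring.as_mut()
}
```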

@@ -1930,14 +1930,14 @@ impl VmFd {
/// }
/// ```
///
pub fn reset_dirty_rings(&self) -> Result<()> {
pub fn reset_dirty_rings(&self) -> Result<c_int> {

this should go into the previous commit

Comment on lines +1918 to +1919
/// Resets all vCPU's dirty log rings. This notifies the kernel that pages have been harvested
/// from the dirty ring and the corresponding pages can be reprotected.

this should also just go into the commit that introduced the function

@@ -2106,8 +2106,7 @@ impl VcpuFd {
}
}

/// Maps the coalesced MMIO ring page. This allows reading entries from
/// the ring via [`coalesced_mmio_read()`](VcpuFd::coalesced_mmio_read).
/// Maps the KVM dirty log ring.

this should go into the commit that introduced the function


and all the changes to this file should go into the commit that introduced struct KvmDirtyLogRing

@@ -3,6 +3,7 @@
## Upcoming Release

- Plumb through KVM_CAP_DIRTY_LOG_RING as DirtyLogRing cap.
- Added support for dirty log ring interface introducing `VcpuFd::reset_dirty_rings`, `KvmDirtyLogRing`

Just add the changelog entries in the commit that introduces the respective functions.

        };

        let offset = page_size * KVM_DIRTY_LOG_PAGE_OFFSET as usize;

        if bytes % std::mem::size_of::<kvm_dirty_gfn>() != 0 {
            // Size of dirty ring in bytes must be multiples of slot size
            return Err(errno::Error::new(libc::EINVAL));
        }
        let slots = bytes / std::mem::size_of::<kvm_dirty_gfn>();
        if slots & (slots - 1) != 0 {

slots.is_power_of_two()
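That is, replacing the manual bit trick with the standard-library check, which also cleanly rejects a ring with zero slots:

```rust
if !slots.is_power_of_two() {
    return Err(errno::Error::new(libc::EINVAL));
}
```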

Comment on lines 2121 to 2136
    /// # extern crate kvm_ioctls;
    /// # use kvm_ioctls::{Cap, Kvm};
    /// let kvm = Kvm::new().unwrap();
    /// let vm = kvm.create_vm().unwrap();
    /// let mut vcpu = vm.create_vcpu(0).unwrap();
    /// if kvm.check_extension(Cap::DirtyLogRing) {
    ///     vcpu.coalesced_mmio_ring().unwrap();
    /// }
    /// ```
    pub fn map_dirty_log_ring(&mut self, bytes: usize) -> Result<()> {
        if self.dirty_log_ring.is_none() {
            let ring = KvmDirtyLogRing::mmap_from_fd(&self.vcpu, bytes)?;
            self.dirty_log_ring = Some(ring);
        }
        Ok(())
    }
@roypat, Sep 9, 2025


the dirty log ring interface is enabled via KVM_ENABLE_CAP (which we expose as enable_cap() in this crate), which is where the size of the ring buffer is specified as well. So I would propose to make the API here a wrapper on top of enable_cap(), say enable_dirty_log_ring(bytes: usize), which does the VmFd::enable_cap() call and then stores the actual size of the ring buffer directly in the VmFd struct. Then on VcpuFd creation, we check if the dirty ring capability was ever enabled on the VmFd, and if so, just mmap the ring with the size stored in the VmFd at vcpu creation time (maybe disallow calling enable_dirty_log_ring() if vcpus were already created previously, although KVM also already checks this). That way the issue of "what is the correct size to mmap" goes away.
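A rough sketch of that proposal; the enable_dirty_log_ring name comes from the comment above, while the dirty_log_ring_bytes field is an illustrative placeholder rather than anything in the crate today:

```rust
use kvm_bindings::{kvm_enable_cap, KVM_CAP_DIRTY_LOG_RING};

impl VmFd {
    /// Enables KVM_CAP_DIRTY_LOG_RING with `bytes` as the per-vCPU ring size
    /// and remembers that size so each vCPU can mmap its ring at creation time.
    pub fn enable_dirty_log_ring(&mut self, bytes: u32) -> Result<()> {
        let mut cap = kvm_enable_cap {
            cap: KVM_CAP_DIRTY_LOG_RING,
            ..Default::default()
        };
        cap.args[0] = u64::from(bytes);
        self.enable_cap(&cap)?;
        // Hypothetical field on VmFd; create_vcpu() would consult it and, if
        // set, mmap the dirty ring for the new vCPU with exactly this size.
        self.dirty_log_ring_bytes = Some(bytes as usize);
        Ok(())
    }
}
```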


roypat commented Sep 9, 2025

Also, do we need to do anything about KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP?


roypat commented Sep 9, 2025

Send and Sync are probably not safe on KvmDirtyLogRing because accesses are not stateless, due to the state we need to keep track of in user space (next_dirty)

As for this, next_dirty doesn't imply that it cannot be Send/Sync, because the traits don't mean that multiple threads can access the struct at the same time (that's always unsound in Rust's memory model), just that you can transfer ownership between threads, and that read-only references can be shared between threads. A problem would only arise if we can also Clone this structure, because then the iterator can cause data races inside the mmap'd area from different Rust threads. But as long as for each vCPU we only allow creating a single instance of the iterator (using safe code, that is), there are no problems.

Now, whether Send and Sync will actually be useful is a different matter, because I'm assuming you want them to be able to harvest the dirty ring while the vCPU is still running, but with the current API in this PR, you can only get a &mut reference to the ring buffer structure, which you cannot send to another thread anyway. So in this case, you'd need some API that lets you take ownership of the KvmDirtyLogRing structure (maybe store the one that gets created at vCPU creation time in an Option, and then have a function that just wraps .take() on that Option, keeping in mind that we must never allow safe code to get two owned versions of it for the same vCPU).
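A minimal sketch of that ownership-transfer idea (the method name is hypothetical):

```rust
impl VcpuFd {
    /// Hands this vCPU's dirty ring to the caller. Subsequent calls return
    /// None, so safe code can never obtain two owned rings for the same vCPU
    /// and race on the shared mmap'd area.
    pub fn take_dirty_log_ring(&mut self) -> Option<KvmDirtyLogRing> {
        self.dirty_log_ring.take()
    }
}
```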


roypat commented Sep 9, 2025

Also sorry for the late response, we were all preparing for / traveling to KVM Forum last week!
