Skip to content
This repository was archived by the owner on Nov 6, 2024. It is now read-only.

[RFC] Add serde [de]serialization for KVM bindings #4

Closed
wants to merge 1 commit into from

Conversation

alxiord
Copy link
Member

@alxiord alxiord commented Jul 18, 2019

This RFC PR adds #[derive(Serialize, Deserialize)] to kvm-bindings structs where possible. A Python script prepends the macro to the struct definitions, leaving out several "blacklisted" structs and all unions. For these, Serialize and Deserialize are implemented by leveraging the byte representation of the FFI-safe objects (i.e. their bytes are serialized as a &[u8]).

The serialization code is autogenerated at build time and kernel/arch-agnostic. It is based on a template and a list of "special" data structures (__IncompleteArrayField, for instance) that have a blanket serialization format and depend on the consumer to add a sane implementation.

Serialization for bindings has a part to play in snapshotting Firecracker and other VMMs that use this crate. It might also prove useful in live migration of rust-vmm components.

Known missing items from this PR:

  • tests and any form of documentation

Fixes: #5

@@ -1,4 +1,61 @@
[[package]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was a discussion whether the Cargo.lock file should be included and the conclusion is:

  1. for library crate, the Cargo.lock should be excluded.
  2. for bin/app crate, the Cargo.lock should be included.
    So how about remove the Cargo.lock file?

@jiangliu
Copy link
Member

jiangliu commented Aug 5, 2019

Really great work, thanks @aghecenco !
We have discussed to relax dependency on serde traits by rust features, but seems it's a little hard to do that.
Any suggestions?

@alxiord
Copy link
Member Author

alxiord commented Aug 5, 2019

I’m going to work on that, haven’t had the chance to yet. I’m meaning to change the implementation itself too, with macros. I’ll add a build feature that optionally pulls in serde and adds the serialization functionality.

@jiangliu
Copy link
Member

jiangliu commented Aug 5, 2019

I’m going to work on that, haven’t had the chance to yet. I’m meaning to change the implementation itself too, with macros. I’ll add a build feature that optionally pulls in serde and adds the serialization functionality.

It's a common issue to control serde by features, so a common solution is welcomed!

@jiangliu
Copy link
Member

jiangliu commented Aug 5, 2019

How about adding a wrapper crate to feature control serde, typetag etc?

@andreeaflorescu
Copy link
Member

@jiangliu what do you mean by wrapper crate?

I am asking because I think it's similar to the proposal I had in the sync call which is: create another crate that uses kvm-bindings and re-exports the structures with added serde. Is that also what you had in mind?

@jiangliu
Copy link
Member

jiangliu commented Aug 5, 2019

@jiangliu what do you mean by wrapper crate?

I am asking because I think it's similar to the proposal I had in the sync call which is: create another crate that uses kvm-bindings and re-exports the structures with added serde. Is that also what you had in mind?

Seems we have different IDs.
The serde/serde_derive crates provide several proc_macro_derive macros, such as
#[derive(Serialize, Deserialize)].

What I'm suggesting is to implement serde_fake/serde_fake_derive, which provides blank implementation for #[derive(Serialize, Deserialize)] macros.

Then we could easily switch to the fake serde crates when serde feature is disabled.

@alxiord
Copy link
Member Author

alxiord commented Aug 5, 2019

The serde/serde_derive crates provide several proc_macro_derive macros, such as
#[derive(Serialize, Deserialize)].

What I'm suggesting is to implement serde_fake/serde_fake_derive, which provides blank implementation for #[derive(Serialize, Deserialize)] macros.

Then we could easily switch to the fake serde crates when serde feature is disabled.

@alexandruag had a suggestion too - if I understood it correctly it was something in the lines of adding a new macro to kvm-bindings structs. Let's say we called it #[derive(MaybeSerialize, MaybeDeserialize]. With conditional compilation, if we wanted serde, this new macro would implement the Deserialize and Serialize functions from serde. Without it, the MaybeSerialize / MaybeDeserialize macro would implement empty functions

@alexandruag please correct me if I got it wrong.
@jiangliu what are your thoughts on this?

@jiangliu
Copy link
Member

jiangliu commented Aug 5, 2019

The serde/serde_derive crates provide several proc_macro_derive macros, such as
#[derive(Serialize, Deserialize)].
What I'm suggesting is to implement serde_fake/serde_fake_derive, which provides blank implementation for #[derive(Serialize, Deserialize)] macros.
Then we could easily switch to the fake serde crates when serde feature is disabled.

@alexandruag had a suggestion too - if I understood it correctly it was something in the lines of adding a new macro to kvm-bindings structs. Let's say we called it #[derive(MaybeSerialize, MaybeDeserialize]. With conditional compilation, if we wanted serde, this new macro would implement the Deserialize and Serialize functions from serde. Without it, the MaybeSerialize / MaybeDeserialize macro would implement empty functions

@alexandruag please correct me if I got it wrong.
@jiangliu what are your thoughts on this?

I feel the only difference is that: @alexandruag suggests to introduce some new derive macros, my suggestion is to redefine derive macros of serde/serde_derive crates.

@alxiord
Copy link
Member Author

alxiord commented Aug 5, 2019

I feel the only difference is that: @alexandruag suggests to introduce some new derive macros, my suggestion is to redefine derive macros of serde/serde_derive crates.

@jiangliu if there are no name clashes because of using the same names as serde for macros, I like your solution better because it keeps naming consistency. I'll try it out and update here.

@alxiord
Copy link
Member Author

alxiord commented Aug 7, 2019

Another observation to consider - all data structures defined here are FFI safe so they can all be serialized as a raw byte dump. We could have a generic serialize/deserialize function to apply to all of them. Automating this with a script, there would be no more need to manually implement the macros for newly added structs in newer kernel versions.

(This isn't actual code, it's just for demonstrative purposes)

fn serialize_ffi(object: T) -> &[u8] {
    let serialized_object: Vec<u8> = vec![];
    serialized_object.copy_from_slice(slice::from_raw_parts(&object as *const T as *const u8, mem::size_of::<T>()));
    serialized_object
}

fn deserialize_ffi(serialized_object: &[u8]) -> T {
    ptr::read::<T>(serialized_object as *const u8 as *const T, mem::size_of::<T>())
}

Then if we want serde -

#[cfg(use_serde)]
use serde::{Serialize, Deserialize}];

impl Serialize for PLACEHOLDER {
    fn serialize<S>(&self, serializer: S) whereS: Serializer {
        serializer.serialize_bytes(serialize_ffi::<PLACEHOLDER>(self), mem::size_of::<PLACEHOLDER>())
    }
}

impl Deserialize for PLACEHOLDER {
    fn deserialize<D>(deserializer: D) where D: Deserializer<'de> {
        struct PLACEHOLDER_visitor;

        impl<'de> Visitor<'de> for PLACEHOLDER_visitor {
            type Value = PLACEHOLDER_visitor;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("PLACEHOLDER_visitor")
            }

            fn visit_seq<V>(self, mut seq: V) where V: SeqAccess<'de> {
                let bytes: Vec<u8> = seq.next_element().unwrap();
                Ok(deserialize_ffi::<PLACEHOLDER>(bytes))
            }
        }

        const FIELDS: &'static [&'static str] = &["bytes"];
        deserializer.deserialize_struct("PLACEHOLDER", FIELDS, PLACEHOLDER_visitor)
    }
}

@alexandruag
Copy link
Contributor

Hi, really sorry for being this late to the discussion. I think there are a couple of aspects we have to consider. Some of them have what appears to be a straightforward/widely accepted approach already, but I'm going to try and mention them all just to make sure we're on the same page.

  • We've chosen the serde interfaces (Serialize and Deserialize) to express the serialization functionality. This has the advantage of being able to leverage other crates/functionality which is built on top of them. At the same time, and it seems relatively straightforward switching to something like custom defined traits in case we ever need more expressivity.

  • Regarding gating the presence of the serialization functionality behind something, I like the feature-based approach most. From a high level perspective I see this as having another kvm-bindings feature ("serialize" or something like that), and then scattering conditional derives around, for example #[cfg_attr(feature = "serialize", derive(Serialize, Deserialize))].

  • We also have to consider the semantics of the serialization. For structures (even unions) that don't have flexible array members things are pretty straightforward: the data we want to serialize are the bytes which make up the structs, since we're dealing with "plain old data" structures. However, serialization for entities that also have a FAM can mean two things: only serialize the structure up to the FAM member, or serialize the FAM data as well. On one hand the latter is more natural, but it's also more awkward to implement, and the deserialization part is very difficult to safely define in Rust. That's why I think we should go with the first option, be very clear about it, and recommend ppl to use something like https://github.com/rust-vmm/vmm-sys-util/blob/master/src/fam.rs (which should also have a "serialize" feature) for higher level handling.

  • Then, there's the question of how to implement the serialization logic. I think doing it by hand for those structs where it's not possible to derive automatically via serde_derive is quickly becoming cumbersome and error-prone, especially given that we should add the serialization logic for all architectures supported in kvm-bindings. Something like the solution that @aghecenco just mentioned looks quite promising. By looking at each struct as a sequence of bytes we lose some expressivity in the serialized form, but we gain a lot of flexibility in the generic implementation that can easily be attached to any object. I think it's easiest to have a trait along the lines of (just some pseudocode very similar to Alexandra's example):

trait Foo: Sized + Copy {
   fn from_byte_slice(bytes: &[u8]) -> Self {
      assert_eq!(mem::size_of::<Self>, bytes.len());
      unsafe { ptr::read_unaligned(bytes as *const u8 as *const Self) }
   }

   fn as_byte_slice(&self) -> &[u8] {
      unsafe { slice::from_raw_parts(self as *const Self as *const u8, mem::size_of::<Self>() }
   }	
}

The advantage of using a trait is that we can add the implementations

impl<T: Foo> Serialize for T {
	...
}

impl<T: Foo> Deserialize for T {
	...
}

And then, since all the Foo methods have a default implementation, we can simply tag all types we'd like to have the serialization implemented for with impl Foo for TypeName {} (this can be gated by #[cfg_attr(feature = "serialize"], etc).

  • There are some concerns regarding the overall versioning aspect for serialized data, but they look addressable while still being generic. I guess this should turn into a separate conversation at some point.

Serde is pulled in as a conditional dependency, activated
by the with_serde feature. If active, this feature will also
generate sources that implement serde::Serialize and
serde::Deserialize for all generated bindings. The implementation
(de)serializes each struct/union's raw bytes, as all of them
are FFI-safe.

Signed-off-by: Alexandra Iordache <[email protected]>
@alxiord
Copy link
Member Author

alxiord commented Aug 28, 2019

@jiangliu @alexandruag please take another look. I changed...well, everything.

  • instead of deriving Serialize and Deserialize, each struct and union explicitly impls them, serializing and deserializing its inner byte representation as a &[u8].
  • instead of a Python script that adds the #[derive]s, all the code is autogenerated at build time from build.rs.
  • all kernel versions and architectures are autodetected and handled.
  • serde and serde_bytes (TLDR efficient byte array (de)serialization support) are conditionally compiled in, activated by the with_serde feature.

@jiangliu
Copy link
Member

jiangliu commented Sep 9, 2019

Hi all,
Sorry for late reply. I have tried another way to tweak the #[derive(Serialize, Deserialize)] macros, and the result seems good enough for discussion, it seems lighter than the build.rs solution:)
Please take a look at my personal repo at: https://github.com/jiangliu/vmm-serde
And the change to the latest implementation from aghecenco(https://github.com/aghecenco/kvm-bindings/tree/serde) is small as:

diff --git a/Cargo.toml b/Cargo.toml
index 02f8c5d..64b6b96 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -12,9 +12,9 @@ build = "build.rs"
 [features]
 kvm-v4_14_0 = []
 kvm-v4_20_0 = []
-with_serde = ["libc", "serde", "serde_bytes"]
+with_serde = ["libc", "vmm-serde", "serde_bytes"]

 [dependencies]
 libc =  { version = ">=0.2.39", optional = true }
-serde = { version = ">=1.0.27", optional = true }
+vmm-serde = { path = "./vmm-serde", features = ["serde_derive"], optional = true }
 serde_bytes = { version = ">=0.11.2", optional = true }
diff --git a/serialization/header.rs.txt b/serialization/header.rs.txt
index b664e90..77624f6 100644
--- a/serialization/header.rs.txt
+++ b/serialization/header.rs.txt
@@ -4,8 +4,8 @@
 use std::mem;
 use std::ptr;

-use serde::de::{Deserialize, Deserializer};
-use serde::{Serialize, Serializer};
+use vmm_serde::de::{Deserialize, Deserializer};
+use vmm_serde::{Serialize, Serializer};
 use serde_bytes::ByteBuf;

 fn serialize_ffi<T>(something: &T) -> ByteBuf {
diff --git a/src/lib.rs b/src/lib.rs
index c77bae3..ed21c0c 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -8,7 +8,7 @@
 #[cfg(feature = "with_serde")]
 extern crate libc;
 #[cfg(feature = "with_serde")]
-extern crate serde;
+extern crate vmm_serde;
 #[cfg(feature = "with_serde")]
 extern crate serde_bytes;

@jiangliu
Copy link
Member

jiangliu commented Sep 9, 2019

It seems we could use proc_macro_derive() to achieve the same goal with build_serde.rs, it may be simpler and reused by crates. Should we have a try?

@jiangliu
Copy link
Member

@aghecenco @alexandruag
Sorry for late reply:)

  • I like the idea of adding "derive(Serialize, Deserialize)" by script from the first version, it clearly states what traits each data struct implements.
  • I like the serde_bytes based implementation from the second version too.
  • And I also want to build a serde infrastructure for the rust-vmm project. It may be used crates other than kvm-bindings, and helps to implement snapshot, live migration and live upgrade.
    So I have combined your first and second versions and implement derive(SerializeFfi, DeserializeFfi, DeserializeFfiFam) by proc_macro_derive.

I have push the new vmm-serde crate to my personal repo at: https://github.com/jiangliu/vmm-serde
and crate addition proposal at: rust-vmm/community#76
and a minor enhancement to vmm-sys-util::fam: rust-vmm/vmm-sys-util#42

With all this ready, it becomes easy to serialize/deserialize data structs generated by bindgen.
An example as below:

    #[cfg(feature = "serde_derive_ffi")]
    #[test]
    fn ffi_test_ffi_fam_struct() {
        #[repr(C)]
        #[derive(Default, Debug, SerializeFfi, DeserializeFfi)]
        pub struct __IncompleteArrayField<T>(::std::marker::PhantomData<T>, [T; 0]);
        impl<T> __IncompleteArrayField<T> {
            #[inline]
            pub fn new() -> Self {
                __IncompleteArrayField(::std::marker::PhantomData, [])
            }
        }

        #[repr(C)]
        #[derive(Debug, Default, SerializeFfi, DeserializeFfiFam)]
        pub struct kvm_msrs {
            pub nmsrs: u32,
            pub pad: u32,
            pub entries: __IncompleteArrayField<u64>,
        }

        impl SizeofFamStruct for kvm_msrs {
            fn size_of(&self) -> usize {
                self.nmsrs as usize * std::mem::size_of::<u64>() + std::mem::size_of::<Self>()
            }
        }

        let data = vec![
            kvm_msrs {
                nmsrs: 1,
                pad: 0,
                entries: __IncompleteArrayField::new(),
            },
            kvm_msrs {
                nmsrs: 0x1,
                pad: 0x2,
                entries: __IncompleteArrayField::new(),
            },
        ];
        let ser = serde_json::to_string(&data[0]).unwrap();
        let mut deserializer = serde_json::Deserializer::from_str(&ser);
        let content: Vec<kvm_msrs> = kvm_msrs::deserialize(&mut deserializer).unwrap();
        // let decoded: FamStructWrapper<kvm_msrs> = content.into();

        assert_eq!(content[0].nmsrs, 1);
        assert_eq!(content[0].pad, 0);
    }

And the Cargo.toml look like:

[features]
kvm-v4_14_0 = []
kvm-v4_20_0 = []
with_serde = ["serde", "vmm-serde/serde_derive_ffi"]

[dependencies]
serde = { version = ">=1.0.27", optional = true }
vmm-serde = { git = "https://github.com/jiangliu/vmm-serde", branch = "v1" }

@alxiord
Copy link
Member Author

alxiord commented Nov 18, 2019

Closing in favor of #11.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scope serde serialization support for bindings structs
4 participants