Skip to content

Conversation

@Freax13
Copy link
Contributor

@Freax13 Freax13 commented May 16, 2021

This pr adds nbt::from_slice to enable zero copy deserialization for strings. The implementation is inspired by serde_json which uses a trait to abstract over multiple sources.

All of the read_* functions in raw.rs got moved into a trait. This trait is implemented for std::io::Read and &[u8].

The read_bare_string function was modified to also take a scratch buffer and return a borrowed, copied or owned string. In a lot of cases this enables reading a string without allocating. This is useful beyond zero copy as it's also used for identifiers in structs. This optimization is not used for Value and Blob as they always allocate anyways.

Even though zero support is added Cow<'de, str> has to be used because modified utf-8 does always allow zero copy.

Example:

#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct Section<'a> {
    #[serde(serialize_with = "nbt::i64_array", default)]
    pub block_states: Option<Vec<i64>>,
    #[serde(borrow)]
    pub palette: Option<Vec<Block<'a>>>,
    pub y: i8,
}

#[derive(Deserialize, Hash)]
#[serde(rename_all = "PascalCase")]
pub struct Block<'a> {
    #[serde(borrow)]
    pub name: Cow<'a, str>,
    #[serde(default, borrow)]
    pub properties: Option<BTreeMap<Cow<'a, str>, Cow<'a, str>>>,
}

I had some trouble getting reliable results from the benchmarks, but there usually were only small regressions of up to 5% and improvements of up to 10-20%. (The benchmarks were not modified, I suspect all the speed up was from prevent allocations for identifiers in structs). I got varying but mostly positive results in my personal project for parsing anvil chunks with zero copy.

@atheriel
Copy link
Collaborator

I'm not sure I'm 100% following the changes, but it looks like internally you use a Cursor to wrap the byte slice. Since Cursor implements io::Read, wouldn't it be possible to implement from_slice() by wrapping the byte slice in a Cursor at the top level and passing that to from_reader()? That could greatly simplify the changes needed. Or am I missing something?

@Freax13
Copy link
Contributor Author

Freax13 commented May 19, 2021

They are very similar but not quite the same: read_bare_string is different because it needs access to the slice to borrow from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants