V2 + Deflate serialization support with flate2's stream wrappers #43
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,11 @@ | ||
use super::V2_COOKIE; | ||
use super::{V2_COOKIE, V2_COMPRESSED_COOKIE}; | ||
use super::super::{Counter, Histogram, RestatState}; | ||
use super::super::num::ToPrimitive; | ||
use std::io::{self, Cursor, ErrorKind, Read}; | ||
use std::marker::PhantomData; | ||
use std; | ||
use super::byteorder::{BigEndian, ReadBytesExt}; | ||
use super::flate2::read::DeflateDecoder; | ||
|
||
/// Errors that can happen during deserialization. | ||
#[derive(Debug, PartialEq, Eq, Clone, Copy)] | ||
|
@@ -57,10 +58,29 @@ impl Deserializer { | |
-> Result<Histogram<T>, DeserializeError> { | ||
let cookie = reader.read_u32::<BigEndian>()?; | ||
|
||
if cookie != V2_COOKIE { | ||
return match cookie { | ||
V2_COOKIE => self.deser_v2(reader), | ||
V2_COMPRESSED_COOKIE => self.deser_v2_compressed(reader), | ||
_ => Err(DeserializeError::InvalidCookie) | ||
} | ||
} | ||
|
||
fn deser_v2_compressed<T: Counter, R: Read>(&mut self, reader: &mut R) -> Result<Histogram<T>, DeserializeError> { | ||
let payload_len = reader.read_u32::<BigEndian>()?.to_usize() | ||
.ok_or(DeserializeError::UsizeTypeTooSmall)?; | ||
|
||
// TODO reuse deflate buf, or switch to lower-level flate2::Decompress | ||
let mut deflate_reader = DeflateDecoder::new(reader.take(payload_len as u64)); | ||
Review comment: allocation :'(
Reply: Is it the allocation of the
Reply: Yeah, but mostly for intellectual vanity reasons (and a little bit because I want to keep the way to no-heap open) I want serialization and deserialization to be allocation-free. :)
|
||
let inner_cookie = deflate_reader.read_u32::<BigEndian>()?; | ||
if inner_cookie != V2_COOKIE { | ||
return Err(DeserializeError::InvalidCookie); | ||
} | ||
|
||
self.deser_v2(&mut deflate_reader) | ||
} | ||
|
||
|
||
fn deser_v2<T: Counter, R: Read>(&mut self, reader: &mut R) -> Result<Histogram<T>, DeserializeError> { | ||
let payload_len = reader.read_u32::<BigEndian>()?.to_usize() | ||
.ok_or(DeserializeError::UsizeTypeTooSmall)?; | ||
let normalizing_offset = reader.read_u32::<BigEndian>()?; | ||
|
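A minimal sketch of the cookie dispatch and length-prefixed framing that `deser_v2_compressed` relies on, written against the standard library only. The `Format` enum and the helpers `read_u32_be`, `read_format`, and `read_compressed_payload` are hypothetical names, not part of this PR; the cookie values are copied from the constants the PR defines. The PR streams the payload through `DeflateDecoder`, whereas this sketch simply copies the still-compressed bytes into a `Vec` to show how `take` bounds the read.

```rust
use std::io::{self, Read};

// Cookie values as defined by the constants in this PR.
const V2_COOKIE: u32 = 0x1c849303 | 0x10;
const V2_COMPRESSED_COOKIE: u32 = 0x1c849304 | 0x10;

/// Hypothetical tag for the format selected by the leading cookie.
#[derive(Debug, PartialEq)]
enum Format {
    V2,
    V2Compressed,
}

/// Reads one big-endian u32 without any external crate.
fn read_u32_be<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

/// Reads the cookie and maps it to a format, rejecting unknown values.
fn read_format<R: Read>(reader: &mut R) -> io::Result<Format> {
    match read_u32_be(reader)? {
        V2_COOKIE => Ok(Format::V2),
        V2_COMPRESSED_COOKIE => Ok(Format::V2Compressed),
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "invalid cookie")),
    }
}

/// Reads the u32 length that follows the compressed cookie, then exactly that many bytes.
fn read_compressed_payload<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let payload_len = u64::from(read_u32_be(reader)?);
    let mut payload = Vec::new();
    // Reborrow so `take` limits the read without consuming `reader`;
    // bytes after the payload stay available to the caller.
    (&mut *reader).take(payload_len).read_to_end(&mut payload)?;
    if payload.len() as u64 != payload_len {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "payload truncated"));
    }
    Ok(payload)
}
```

Because the read is bounded by `take`, anything following the compressed payload is left untouched in the underlying reader, which is what lets several serialized histograms sit back to back in one stream.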
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,12 @@ | ||
//! # Serialization/deserialization | ||
//! | ||
//! The upstream Java project has established several different types of serialization. We have | ||
//! currently implemented one (the "V2" format, following the names used by the Java | ||
//! implementation), and will add others as time goes on. These formats are compact binary | ||
//! representations of the state of the histogram. They are intended to be used | ||
//! for archival or transmission to other systems for further analysis. A typical use case would be | ||
//! to periodically serialize a histogram, save it somewhere, and reset the histogram. | ||
//! currently implemented V2 and V2 + DEFLATE (following the names used by the Java implementation). | ||
//! | ||
//! These formats are compact binary representations of the state of the histogram. They are | ||
//! intended to be used for archival or transmission to other systems for further analysis. A | ||
//! typical use case would be to periodically serialize a histogram, save it somewhere, and reset | ||
//! the histogram. | ||
//! | ||
//! Histograms are designed to be added, subtracted, and otherwise manipulated, and an efficient | ||
//! storage format facilitates this. As an example, you might be capturing histograms once a minute | ||
|
@@ -18,22 +19,27 @@ | |
//! | ||
//! # Performance concerns | ||
//! | ||
//! Serialization is quite fast; serializing a histogram that represents 1 to `u64::max_value()` | ||
//! with 3 digits of precision with tens of thousands of recorded counts takes about 40 | ||
//! microseconds on an E5-1650v3 Xeon. Deserialization is about 3x slower, but that will improve as | ||
//! there are still some optimizations to perform. | ||
//! Serialization is quite fast; serializing a histogram in V2 format that represents 1 to | ||
//! `u64::max_value()` with 3 digits of precision with tens of thousands of recorded counts takes | ||
//! about 40 microseconds on an E5-1650v3 Xeon. Deserialization is about 3x slower, but that will | ||
//! improve as there are still some optimizations to perform. | ||
//! | ||
//! For the V2 format, the space used for a histogram will depend mainly on precision since higher | ||
//! precision will reduce the extent to which different values are grouped into the same bucket. | ||
//! Having a large value range (e.g. 1 to `u64::max_value()`) will not directly impact the size if | ||
//! there are many zero counts as zeros are compressed away. | ||
//! | ||
//! V2 + DEFLATE is significantly slower to serialize (around 10x) but only a little bit slower to | ||
//! deserialize (less than 2x). YMMV depending on the compressibility of your histogram data, the | ||
//! speed of the underlying storage medium, etc. Naturally, you can always compress at a later time: | ||
//! there's no reason why you couldn't serialize as V2 and then later re-serialize it as V2 + | ||
//! DEFLATE on another system (perhaps as a batch job) for better archival storage density. | ||
//! | ||
//! # API | ||
//! | ||
//! Each serialization format has its own serializer struct, but since each format is reliably | ||
//! distinguishable from each other, there is only one `Deserializer` struct that will work for | ||
//! any of the formats this library implements. For now there is only one serializer | ||
//! (`V2Serializer`) but more will be added. | ||
//! any of the formats this library implements. | ||
//! | ||
//! Serializers and deserializers are intended to be re-used for many histograms. You can use them | ||
//! for one histogram and throw them away; it will just be less efficient as the cost of their | ||
|
@@ -84,9 +90,11 @@ | |
//! | ||
//! impl Serialize for V2HistogramWrapper { | ||
//! fn serialize<S: Serializer>(&self, serializer: S) -> Result<(), ()> { | ||
//! // not optimal to not re-use the vec and serializer, but it'll work | ||
//! // Not optimal to not re-use the vec and serializer, but it'll work | ||
//! let mut vec = Vec::new(); | ||
//! // map errors as appropriate for your use case | ||
//! // Pick the serialization format you want to use. Here, we use plain V2, but V2 + | ||
//! // DEFLATE is also available. | ||
//! // Map errors as appropriate for your use case. | ||
//! V2Serializer::new().serialize(&self.histogram, &mut vec) | ||
//! .map_err(|_| ())?; | ||
//! serializer.serialize_bytes(&vec)?; | ||
|
@@ -163,6 +171,7 @@ | |
//! | ||
|
||
extern crate byteorder; | ||
extern crate flate2; | ||
|
||
#[path = "tests.rs"] | ||
#[cfg(test)] | ||
|
@@ -176,13 +185,19 @@ mod benchmarks; | |
mod v2_serializer; | ||
pub use self::v2_serializer::{V2Serializer, V2SerializeError}; | ||
|
||
#[path = "v2_deflate_serializer.rs"] | ||
Review comment: You know that Rust, when confronted with
Reply: You'd think that, and yet without this...
Reply: Hmm.. Reading issue rust-lang/rust#37872, it seems like this is caused by us using
|
||
mod v2_deflate_serializer; | ||
pub use self::v2_deflate_serializer::{V2DeflateSerializer, V2DeflateSerializeError}; | ||
|
||
#[path = "deserializer.rs"] | ||
mod deserializer; | ||
pub use self::deserializer::{Deserializer, DeserializeError}; | ||
|
||
const V2_COOKIE_BASE: u32 = 0x1c849303; | ||
const V2_COMPRESSED_COOKIE_BASE: u32 = 0x1c849304; | ||
|
||
const V2_COOKIE: u32 = V2_COOKIE_BASE | 0x10; | ||
const V2_COMPRESSED_COOKIE: u32 = V2_COMPRESSED_COOKIE_BASE | 0x10; | ||
|
||
const V2_HEADER_SIZE: usize = 40; | ||
|
Review comment: This, and its cousin in the private test module, are regrettable, but I still have a handful of TODOs to explore in serialization so I don't want to commit to anything just yet. Plus, I suspect it will not be common that users want to polymorphically choose which serializer they use (if for no other reason than that this feature is so new...).
Reply: It'd make me happy if we could at least somehow share this trait and its `impl`s between the benchmarks and the tests though.
Reply: Yeah, I would like that as well but I couldn't think of a way to do it without making it `pub`. Thoughts?
Reply: I think you may be able to (ab)use the `#[path]` syntax to include a file with this trait + `impl`s as a module in both...
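A single-file sketch of the visibility idea behind that last suggestion, under stated assumptions: `PlainSerializer`, `CompressingSerializer`, `TestSerializer`, `serializer_support`, `tests_like`, and `benches_like` are all hypothetical names, not taken from the PR. In the multi-file layout discussed in the thread, the helper module would live in its own file and each consumer would declare it with a `#[path = "..."]` attribute; inline modules are used here only so the example compiles on its own.

```rust
// Hypothetical stand-ins for the two serializers added in this PR.
struct PlainSerializer;
struct CompressingSerializer;

// Private helper module. With separate files this would be declared as
// `#[path = "..."] mod serializer_support;` by whichever modules need it.
mod serializer_support {
    use super::{CompressingSerializer, PlainSerializer};

    // `pub` makes the trait nameable by sibling modules; because
    // `serializer_support` itself is private, nothing reaches the public API.
    pub trait TestSerializer {
        fn format_name(&self) -> &'static str;
    }

    impl TestSerializer for PlainSerializer {
        fn format_name(&self) -> &'static str {
            "V2"
        }
    }

    impl TestSerializer for CompressingSerializer {
        fn format_name(&self) -> &'static str {
            "V2 + DEFLATE"
        }
    }
}

// Stand-in for the test module: generic over the shared trait.
mod tests_like {
    use super::serializer_support::TestSerializer;

    pub fn label<S: TestSerializer>(serializer: &S) -> String {
        format!("test {}", serializer.format_name())
    }
}

// Stand-in for the benchmark module: reuses the same trait and impls.
mod benches_like {
    use super::serializer_support::TestSerializer;

    pub fn label<S: TestSerializer>(serializer: &S) -> String {
        format!("bench {}", serializer.format_name())
    }
}
```

One difference from the `#[path]` approach is worth noting: declaring the same file as a module in two places compiles it twice, producing two distinct traits, whereas the single private module above is compiled once and shared.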