using this for small JSON's #198


Closed
whyCPPgofast opened this issue May 20, 2021 · 13 comments
Labels
perf Performance

Comments

@whyCPPgofast

Hi, I was benchmarking this against a very simple small JSON

{
    "id": "60a6965e5e47ef8456878326",
    "index": 0,
    "guid": "cfce331d-07f3-40d3-b3d9-0672f651c26d",
    "isActive": true,
    "picture": "http://placehold.it/32x32",
    "age": 22
  }

Now my use case is: parse a small JSON as fast as possible just ONCE.

The results for me were (1 parse):
serde_json = 3 microseconds
simd_json = 10 microseconds

I was wondering: is it normal for serde_json to be faster on smaller JSONs, or am I getting incorrect results?

@whyCPPgofast
Author

whyCPPgofast commented May 20, 2021

Here is a very bad bench but the differences are big enough...

#![allow(warnings)]

use std::time::Instant;
use serde::Deserialize;
use serde_json;
use simd_json;

#[derive(Deserialize)]
struct Person {
    id: String,
    index: i32,
    guid: String,
    isActive: bool,
    picture: String,
    age: u32
}

fn main() {

    let json_bytes = br#"{
        "id": "60a6965e5e47ef8456878326",
        "index": 0,
        "guid": "cfce331d-07f3-40d3-b3d9-0672f651c26d",
        "isActive": true,
        "picture": "http://placehold.it/32x32",
        "age": 22
    }"#.to_vec();



    let mut json_bytes_1 = json_bytes.clone();
    let now_1 = Instant::now();
    for _ in 0..100 {
        let p: Person = serde_json::from_slice(&json_bytes_1).unwrap();
    }
    println!("serde {:?}", now_1.elapsed());



    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..100 {
        let p2: simd_json::OwnedValue = simd_json::to_owned_value(&mut json_bytes_2).unwrap();
    }
    println!("simd_json {:?}", now_2.elapsed());

}
Cargo.toml:

[dependencies]
serde = { version = "*", features = ["derive"] }
serde_json = "*"
simd-json = { version = "*", features = ["allow-non-simd"]}

@Licenser
Member

That's a bit complicated to answer - one of those "it depends" situations 😭

simd gets 'better' for medium and larger files; for extremely small ones (i.e. smaller than the SIMD registers) it's quite bad. I don't think that's the case for you, but there is some overhead.

So first of all, for small data serde-json can absolutely be faster than simd-json!

That said there are a few things:

The biggest issue in the benchmark is that it's comparing struct deserialization against DOM deserialization. DOM deserialization is quite a bit slower. To make a fair comparison, and one that makes sense for users, you have to compare either DOM deserialization for both or struct deserialization for both.

For benchmarks like that it's usually good to use a benchmark library, as the compiler sometimes optimizes work away when it notices the result isn't used. The black_box function in criterion is one way to prevent that. (Not sure if that applies here, but for a good measurement it's a nice tool.)
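To illustrate the point, here is a std-only sketch (not this thread's benchmark): std::hint::black_box, stable since Rust 1.66, serves the same purpose as criterion's black_box and keeps the optimizer from deleting results it can see are unused.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Times `iters` summations over `data`. Without the black_box call the
// optimizer may notice the sum is never used and remove the whole loop,
// turning the measurement into noise.
fn timed_sum(data: &[u64], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        let sum: u64 = data.iter().sum();
        black_box(sum); // keep the computation alive
    }
    start.elapsed()
}

fn main() {
    let data = vec![1u64; 100_000];
    println!("summed in {:?}", timed_sum(&data, 100));
}
```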

The third thing that will make a difference is using simd-json-derive for the deserialization via simd-json instead of the serde compatibility layer. Serde's deserialization logic is slower since it has to be more generic (and it does a darn good job at that); with simd-json-derive it is possible to optimize for exactly one format, which is quite a bit faster. It is as simple as:

#[derive(Deserialize)] -> #[derive(Deserialize, simd_json_derive::Deserialize)]
let p2: simd_json::OwnedValue = simd_json::to_owned_value(&mut json_bytes_2).unwrap(); -> let p2 = Person::from_slice(&mut json_bytes_2).unwrap();

Next, and this depends a bit on your use case: you can optimize this by pre-allocating and re-using buffers. If your program starts, reads a small JSON, and exits, it won't help, but if it is long-running this might do you good:

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    let mut string_buffer = Vec::with_capacity(2048);
    let mut input_buffer = simd_json::AlignedBuf::with_capacity(1024);
    for _ in 0..100 {
        let p2= Person::from_slice_with_buffers(&mut json_bytes_2, &mut input_buffer, &mut string_buffer).unwrap();
    }

Last but not least, and again this depends on your use case: you could avoid allocating strings, as simd-json is quite good at borrowing when deserializing structs. (This works with serde too, I think, so I'll add the serde-related attributes in this example after all; got to compare apples and apples :) !)

#[derive(Deserialize)]
struct Person<'ser> {
    #[serde(borrow)]
    id: &'ser str,
    index: i32,
    #[serde(borrow)]
    guid: &'ser str,
    isActive: bool,
    #[serde(borrow)]
    picture: &'ser str,
    age: u32
}

@Licenser
Member

Also, I noticed you're using allow-non-simd, which will always be slower than serde as it disables all the SIMD optimisations.

@Licenser
Member

So I updated your benchmark a bit:

#![allow(warnings)]

use std::time::Instant;
use serde::Deserialize;
use simd_json_derive::Deserialize as SimdDeserialize;
use serde_json;
use simd_json;

#[derive(Deserialize, SimdDeserialize)]
struct Person {
    id: String,
    index: i32,
    guid: String,
    isActive: bool,
    picture: String,
    age: u32
}

#[derive(Deserialize, SimdDeserialize)]
struct PersonBorrowed<'ser> {
    #[serde(borrow)]
    id: &'ser str,
    index: i32,
    #[serde(borrow)]
    guid: &'ser str,
    isActive: bool,
    #[serde(borrow)]
    picture: &'ser str,
    age: u32
}

const N: usize = 100000;

fn main() {

    let json_bytes = br#"{
        "id": "60a6965e5e47ef8456878326",
        "index": 0,
        "guid": "cfce331d-07f3-40d3-b3d9-0672f651c26d",
        "isActive": true,
        "picture": "http://placehold.it/32x32",
        "age": 22
    }"#.to_vec();

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2: simd_json::OwnedValue = simd_json::to_owned_value(&mut json_bytes_2).unwrap();
    }
    println!("simd_json {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2: Person = simd_json::serde::from_slice(&mut json_bytes_2).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (struct) {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2 = Person::from_slice(&mut json_bytes_2).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (simd-struct) {:?}", now_2.elapsed());

    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    for _ in 0..N {
        let p2 = PersonBorrowed::from_slice(&mut json_bytes_2).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (simd-struct borrowed) {:?}", now_2.elapsed());


    let mut json_bytes_2 = json_bytes.clone();
    let now_2 = Instant::now();
    let mut string_buffer = Vec::with_capacity(2048);
    let mut input_buffer = simd_json::AlignedBuf::with_capacity(1024);
    for _ in 0..N {
        let p2 = PersonBorrowed::from_slice_with_buffers(&mut json_bytes_2, &mut input_buffer, &mut string_buffer).unwrap();
        criterion::black_box(p2);
    }
    println!("simd_json (simd-struct borrowed buffered) {:?}", now_2.elapsed());


    let mut json_bytes_1 = json_bytes.clone();
    let now_1 = Instant::now();
    for _ in 0..N {
        let p: Person = serde_json::from_slice(&json_bytes_1).unwrap();
        criterion::black_box(p);
    }
    println!("serde {:?}", now_1.elapsed());

    let mut json_bytes_1 = json_bytes.clone();
    let now_1 = Instant::now();
    for _ in 0..N {
        let p: PersonBorrowed = serde_json::from_slice(&json_bytes_1).unwrap();
        criterion::black_box(p);
    }
    println!("serde (borrowed) {:?}", now_1.elapsed());

}
Cargo.toml:

[package]
name = "simd-bench-why"
version = "0.1.0"
authors = ["Heinz N. Gies <[email protected]>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde = { version = "*", features = ["derive"] }
serde_json = "*"
simd-json = { version = "*" }
simd-json-derive = "*"
criterion = "*"

I would recommend running that on your local system, but here are the results I get on a laptop (so variance is quite high); serde is consistently faster:

simd_json 283.399304ms
simd_json (struct) 169.342152ms
simd_json (simd-struct) 168.756464ms
simd_json (simd-struct borrowed) 134.981265ms
simd_json (simd-struct borrowed buffered) 107.723584ms
serde 80.380321ms
serde (borrowed) 42.684127ms

@Licenser Licenser added the perf Performance label May 20, 2021
@whyCPPgofast
Author

wow, thanks for the detailed response!

I ran your updated benchmark:

simd_json 102.0061ms
simd_json (struct) 62.7426ms
simd_json (simd-struct) 61.5463ms
simd_json (simd-struct borrowed) 48.9708ms
simd_json (simd-struct borrowed buffered) 40.3499ms
serde 42.1693ms
serde (borrowed) 27.4827ms

I actually had no idea you could use str slices for string fields with serde; that could definitely speed up my program. Thanks!

@Licenser
Member

👍 So the bottom line is that this looks like a case where serde is faster :) Just for giggles, I'd recommend giving it a spin in the app; switching between simd / serde behind a feature flag is fairly simple given that they both have derive mechanics. I don't expect the result to change, but I'm still curious :D
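A feature-flag setup like the one suggested could look something like this (a hypothetical Cargo.toml sketch; the feature names are illustrative, and the `dep:` syntax needs a reasonably recent Cargo):

```toml
# Hypothetical sketch: pick the parser at build time with
#   cargo build --no-default-features --features simd-parse
[features]
default = ["serde-parse"]
serde-parse = ["dep:serde_json"]
simd-parse = ["dep:simd-json", "dep:simd-json-derive"]

[dependencies]
serde = { version = "*", features = ["derive"] }
serde_json = { version = "*", optional = true }
simd-json = { version = "*", optional = true }
simd-json-derive = { version = "*", optional = true }
```

The application code would then gate its parse function on `#[cfg(feature = "simd-parse")]` / `#[cfg(feature = "serde-parse")]`.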

Also, if you're looking at processing newline-delimited JSON, #194 might be something to keep an eye on; if we get to implementing that, the negative effects of small JSON documents on newline-delimited readers will be negated.
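In the meantime, the outer loop for newline-delimited JSON is easy to write yourself. A minimal std-only sketch (the per-record parse is whatever parser you pick, e.g. serde_json::from_str or the derived Person::from_slice):

```rust
// Minimal sketch: invoke a callback once per non-empty line of an
// NDJSON input. The callback is where the actual JSON parse would go.
fn for_each_record(input: &str, mut handle: impl FnMut(&str)) {
    for line in input.lines() {
        let record = line.trim();
        if !record.is_empty() {
            handle(record);
        }
    }
}

fn main() {
    let ndjson = "{\"age\": 22}\n{\"age\": 23}\n";
    let mut count = 0;
    for_each_record(ndjson, |record| {
        // e.g. let p: Person = serde_json::from_str(record).unwrap();
        count += 1;
        println!("record {}: {}", count, record);
    });
}
```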

@romanstingler

romanstingler commented Dec 12, 2022

On modern CPUs simd-json seems faster when using structs and borrowed data:

  Model name:            Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
    CPU family:          6
    Model:               165
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            2
    CPU(s) scaling MHz:  78%
    CPU max MHz:         5000.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5202.65
    Flags:               (truncated in the original; notably include sse4_1, sse4_2, avx, avx2)
With N = 10M instead of 100k:

simd_json 4.543799873s
simd_json (struct) 3.315843013s
simd_json (simd-struct) 3.113119353s
simd_json (simd-struct borrowed) 2.739245207s
simd_json (simd-struct borrowed buffered) 2.505824834s
serde 3.59520201s
serde (borrowed) 3.151620592s

@Licenser
Member

That's a cool insight thank you!

@Licenser
Member

There have been a number of performance updates to simd-json; with 0.13, simd-json is now significantly faster when taking full advantage of it:

simd_json 72.086096ms
simd_json (struct) 60.104834ms
simd_json (simd-struct) 58.718949ms
simd_json (simd-struct borrowed) 53.066635ms
simd_json (simd-struct borrowed buffered) 23.85362ms
serde 41.25703ms
serde (borrowed) 32.312223ms

I'll close this for now

@spdfnet

spdfnet commented Aug 20, 2024

In case other people stumble into this, Zen4 performs like this:

simd_json 866.298192ms
simd_json (struct) 868.855126ms
simd_json (simd-struct) 863.212549ms
simd_json (simd-struct borrowed) 807.45798ms
simd_json (simd-struct borrowed buffered) 764.193746ms
serde 279.927102ms
serde (borrowed) 248.89467ms

@Licenser
Member

Are you running that with release builds and native CPU compilation? Those numbers are all surprisingly high.
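For reference, both are standard Cargo/rustc options, nothing simd-json-specific (a sketch; adjust paths to your project layout):

```toml
# .cargo/config.toml — compile for the host CPU so simd-json can use
# the widest SIMD instructions available (AVX2, etc.)
[build]
rustflags = ["-C", "target-cpu=native"]
```

Then build and run with `cargo run --release`.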

@spdfnet

spdfnet commented Aug 20, 2024

Right, with --release:

simd_json 48.04138ms
simd_json (struct) 39.693793ms
simd_json (simd-struct) 37.509766ms
simd_json (simd-struct borrowed) 34.854422ms
simd_json (simd-struct borrowed buffered) 18.973076ms
serde 21.062731ms
serde (borrowed) 17.707169ms

@Licenser
Member

Ah yes, that looks much more in line with what people have seen before :)
