Commit 217017e

Update benchmark notes
And add a new activitypub (mastodon) benchmark.
1 parent 7ea4bdb commit 217017e

3 files changed: +31 -11 lines changed
benchmark/data/activitypub.json

+1
Large diffs are not rendered by default.

benchmark/encoder.rb

+19-8
@@ -55,22 +55,31 @@ def benchmark_encoding(benchmark_name, ruby_obj, check_expected: true, except: [
   puts
 end
 
-# On the first two micro benchmarks, the limiting factor is that we have to create a Generator::State object for every
-# call to `JSON.dump`, so we cause 2 allocations per call where alternatives only do one allocation.
-# The performance difference is mostly more time spent in GC because of this extra pressure.
-# If we re-use the same `JSON::State` instance, we're faster than Oj on the array benchmark, and much closer
-# on the Hash one.
+# NB: Notes are based on ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
+
+# On the first two micro benchmarks, the limiting factor is the fixed cost of initializing the
+# generator state. Since `JSON.generate` now lazily allocates the `State` object, we're now ~10% faster
+# than `Oj.dump`.
 benchmark_encoding "small mixed", [1, "string", { a: 1, b: 2 }, [3, 4, 5]]
 benchmark_encoding "small nested array", [[1,2,3,4,5]]*10
+
+# On small hash specifically, we're just on par with `Oj.dump`. It would be worth investigating why
+# Hash serialization doesn't perform as well as other types.
 benchmark_encoding "small hash", { "username" => "jhawthorn", "id" => 123, "event" => "wrote json serializer" }
 
-# On these benchmarks we perform well. Either on par or very closely faster/slower
-benchmark_encoding "integers", (1_000_000..1_001_000).to_a, except: %i(json_state)
+# On string encoding we're ~20% faster when dealing with mostly ASCII, but ~10% slower when dealing
+# with mostly multi-byte characters. This is a tradeoff.
 benchmark_encoding "mixed utf8", ([("a" * 5000) + "€" + ("a" * 5000)] * 500), except: %i(json_state)
 benchmark_encoding "mostly utf8", ([("€" * 3333)] * 500), except: %i(json_state)
-benchmark_encoding "twitter.json", JSON.load_file("#{__dir__}/data/twitter.json"), except: %i(json_state)
+
+# On these benchmarks we perform well: we're on par or better.
+benchmark_encoding "integers", (1_000_000..1_001_000).to_a, except: %i(json_state)
+benchmark_encoding "activitypub.json", JSON.load_file("#{__dir__}/data/activitypub.json"), except: %i(json_state)
 benchmark_encoding "citm_catalog.json", JSON.load_file("#{__dir__}/data/citm_catalog.json"), except: %i(json_state)
 
+# On twitter.json we're still about 10% slower; this is worth investigating.
+benchmark_encoding "twitter.json", JSON.load_file("#{__dir__}/data/twitter.json"), except: %i(json_state)
+
 # This benchmark spends the overwhelming majority of its time in `ruby_dtoa`. We rely on Ruby's implementation
 # which uses a relatively old version of dtoa.c from David M. Gay.
 # Oj in `compat` mode is ~10% slower than `json`, but in its default mode is noticeably faster here because
@@ -82,4 +91,6 @@ def benchmark_encoding(benchmark_name, ruby_obj, check_expected: true, except: [
 # Oj speed without losing precision.
 benchmark_encoding "canada.json", JSON.load_file("#{__dir__}/data/canada.json"), check_expected: false, except: %i(json_state)
 
+# We're about 10% faster when `to_json` calls are involved, but this wasn't particularly optimized, so there might be
+# opportunities here.
 benchmark_encoding "many #to_json calls", [{object: Object.new, int: 12, float: 54.3, class: Float, time: Time.now, date: Date.today}] * 20, except: %i(json_state)
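
The encoder notes attribute the cost on the micro benchmarks to setting up generator state on every call. The following is a minimal sketch of how one might observe that effect outside the benchmark suite. It assumes the `json` gem exposes `JSON::State` with a `generate` method, as the older comment about re-using a `JSON::State` instance suggests. The iteration count and variable names are illustrative only, and the timings will vary by Ruby and `json` version.

```ruby
require "json"
require "benchmark"

# Same payload as the "small mixed" benchmark.
payload = [1, "string", { a: 1, b: 2 }, [3, 4, 5]]
iterations = 50_000 # arbitrary, chosen only to keep the run short

# One State object shared across all calls (assumed API: JSON::State#generate).
reused_state = JSON::State.new

# Time the regular entry point, which sets up its own state as needed.
per_call = Benchmark.realtime { iterations.times { JSON.generate(payload) } }

# Time generation through the shared State instance.
reused = Benchmark.realtime { iterations.times { reused_state.generate(payload) } }

puts format("JSON.generate: %.3fs", per_call)
puts format("reused State:  %.3fs", reused)
```

Both paths should produce the same JSON string, so any timing gap reflects per-call setup rather than different output.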

benchmark/parser.rb

+11-3
@@ -26,24 +26,32 @@ def benchmark_parsing(name, json_output)
   puts
 end
 
+# NB: Notes are based on ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
+
 # Oj::Parser is very significantly faster (2.70x) on the nested array benchmark
 # thanks to its stack implementation that saves resizing arrays.
+# But we're on par with `Oj.load`.
 benchmark_parsing "small nested array", JSON.dump([[1,2,3,4,5]]*10)
 
-# Oj::Parser is significantly faster (~1.5x) on the next 4 benchmarks
-# in large part thanks to its string caching.
+# Oj::Parser is significantly faster (~1.5x) on the next 4 benchmarks, in large part thanks to its string caching.
+
 # Other than that we're either a bit slower or a bit faster than regular `Oj.load`.
 benchmark_parsing "small hash", JSON.dump({ "username" => "jhawthorn", "id" => 123, "event" => "wrote json serializer" })
 
 benchmark_parsing "test from oj", <<JSON
 {"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]}
 JSON
 
+# On these more realistic benchmarks, we're still significantly slower than alternatives.
+# Caching of keys is likely required to be able to match performance.
+# On the twitter and activitypub payloads the difference isn't that big (~10%),
+# but on citm_catalog it's up to a 50% difference.
+benchmark_parsing "activitypub.json", File.read("#{__dir__}/data/activitypub.json")
 benchmark_parsing "twitter.json", File.read("#{__dir__}/data/twitter.json")
 benchmark_parsing "citm_catalog.json", File.read("#{__dir__}/data/citm_catalog.json")
 
 # rapidjson is 8x faster thanks to its much more performant float parser.
 # Unfortunately, there aren't many existing fast float parsers in pure C,
 # and including C++ is problematic.
-# Aside from that, we're faster than other alternatives here.
+# Aside from that, we're much faster than other alternatives here.
 benchmark_parsing "float parsing", File.read("#{__dir__}/data/canada.json")
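
The parser notes name key caching as the likely missing piece on the realistic payloads. As a rough illustration of the idea only, here is a sketch of a hypothetical `KeyCache` helper that hands back one shared frozen String per distinct key. It is not how `json` or Oj implement caching, and the class and method names are invented for this example.

```ruby
# Hypothetical helper, not part of json or Oj: returns a single shared,
# frozen String for each distinct key content, so that repeated keys in a
# large document do not each allocate a new String.
class KeyCache
  def initialize
    @entries = {}
  end

  # Look up by content; on a miss, store a frozen copy and return it.
  def fetch(raw)
    @entries[raw] ||= raw.dup.freeze
  end
end

cache = KeyCache.new
first = cache.fetch("username")
second = cache.fetch(String.new("username")) # same content, different object

# Both lookups yield the very same object.
puts first.equal?(second) # prints true
```

Payloads such as citm_catalog repeat the same keys many times, which is where this kind of reuse would be expected to matter most.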
