Decoding performance in comparison to vanilla protoc #153
This kind of issue is being worked on; #144 should help slightly with this, and I was also planning to work on Rust acceleration for it.
@Gobot1234 Asking out of curiosity, why Rust?
I wanted an excuse to learn Rust, and I also think it would be faster.
I think extracting the parsing logic to a Cython-compiled module would be a good first step. Writing an extension in Rust might give better performance, but it isn't as good a fit for this project.
Thank you for looking into the matter! Some further experiments indicate that the performance decrease is most severe for messages containing repeated fields. Would one expect issue #144 to help with that?
I just linked that issue because it is a change that brings performance improvements; it might speed up repeated fields too, but I haven't tested that.
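For context on why repeated fields are the stress case: a packed repeated scalar field arrives on the wire as a single length-delimited blob whose elements must then be decoded one by one in Python. The following is an illustrative stdlib-only encoder for such a field, following the documented protobuf wire format (it is not betterproto code):

```python
def encode_varint(value: int) -> bytes:
    # Protobuf base-128 varint: 7 payload bits per byte,
    # high bit set on every byte except the last.
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)
        else:
            out.append(bits)
            return bytes(out)


def encode_packed_int32(field_number: int, values: list) -> bytes:
    # Packed repeated scalar: one tag with wire type 2 (length-delimited),
    # then a length prefix, then all element varints concatenated.
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


blob = encode_packed_int32(3, [1, 2, 300])
# Field 3 with wire type 2 gives tag byte 0x1A; 300 encodes as AC 02.
assert blob == bytes([0x1A, 0x04, 0x01, 0x02, 0xAC, 0x02])
```

A million-element field is therefore one field header plus a million back-to-back varints, which is exactly the loop the profiler output below spends its time in.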
I did some further benchmarks, and serialization/deserialization performance decreases significantly with repeated fields. Benchmark code:

```python
import betterproto
from dataclasses import dataclass
from typing import List


@dataclass
class TestMessage(betterproto.Message):
    foo: int = betterproto.uint32_field(0)
    bar: str = betterproto.string_field(1)
    baz: float = betterproto.float_field(2)
    some_ints: List[int] = betterproto.int32_field(3)
    some_floats: List[float] = betterproto.float_field(4)


class BenchMessage:
    """Test creation and usage of a proto message."""

    def setup(self):
        self.cls = TestMessage
        self.instance = TestMessage()
        self.instance_filled = TestMessage(
            0,
            "test",
            0.0,
            [x for x in range(0, 1024 * 1024)],
            [x for x in range(0, 1024 * 1024)],
        )
        self.instance_filled_serialized = bytes(self.instance_filled)

    def time_overhead(self):
        """Overhead in class definition."""

        @dataclass
        class Message(betterproto.Message):
            foo: int = betterproto.uint32_field(0)
            bar: str = betterproto.string_field(1)
            baz: float = betterproto.float_field(2)
            some_ints: List[int] = betterproto.int32_field(3)
            some_floats: List[float] = betterproto.float_field(4)

    def time_instantiation(self):
        """Time instantiation."""
        self.cls()

    def time_attribute_access(self):
        """Time to access attributes."""
        self.instance.foo
        self.instance.bar
        self.instance.baz

    def time_init_with_values(self):
        """Time initialization with values."""
        self.cls(
            0,
            "test",
            0.0,
            [x for x in range(0, 1024 * 1024)],
            [x for x in range(0, 1024 * 1024)],
        )

    def time_attribute_setting(self):
        """Time to set attributes."""
        self.instance.foo = 0
        self.instance.bar = "test"
        self.instance.baz = 0.0

    def time_serialize(self):
        """Time serializing a message to wire."""
        bytes(self.instance_filled)

    def time_deserialize(self):
        """Time deserializing a message from wire."""
        message = self.cls()
        message.parse(self.instance_filled_serialized)
```

asv results:

```
$ poetry run asv dev
· Discovering benchmarks
· Running 8 total benchmarks (1 commits * 1 environments * 8 benchmarks)
[  0.00%] ·· Benchmarking existing-pyc__python-betterproto_.venv_scripts_python.exe
[  6.25%] ··· benchmarks.BenchMessage.time_attribute_access    19.1±0μs
[ 12.50%] ··· benchmarks.BenchMessage.time_attribute_setting   18.8±0μs
[ 18.75%] ··· benchmarks.BenchMessage.time_deserialize          4.03±0s
[ 25.00%] ··· benchmarks.BenchMessage.time_init_with_values     113±0ms
[ 31.25%] ··· benchmarks.BenchMessage.time_instantiation       34.2±0μs
[ 37.50%] ··· benchmarks.BenchMessage.time_overhead             492±0μs
[ 43.75%] ··· benchmarks.BenchMessage.time_serialize            2.01±0s
[ 50.00%] ··· benchmarks.MemSuite.mem_instance                      208
```

line_profiler output:

```
$ poetry run python -m line_profiler benchmarks.py.lprof
Timer unit: 1e-07 s

Total time: 20.2788 s
File: C:\python-betterproto\src\betterproto\__init__.py
Function: parse at line 827

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   827                                           @profile
   828                                           def parse(self: T, data: bytes) -> T:
   829                                               """
   830                                               Parse the binary encoded Protobuf into this message instance. This
   831                                               returns the instance itself and is therefore assignable and chainable.
   832
   833                                               Parameters
   834                                               -----------
   835                                               data: :class:`bytes`
   836                                                   The data to parse the protobuf from.
   837
   838                                               Returns
   839                                               --------
   840                                               :class:`Message`
   841                                                   The initialized message.
   842                                               """
   843                                               # Got some data over the wire
   844         1        216.0    216.0      0.0      self._serialized_on_wire = True
   845         1         31.0     31.0      0.0      proto_meta = self._betterproto
   846         4      54199.0  13549.8      0.0      for parsed in parse_fields(data):
   847         3         69.0     23.0      0.0          field_name = proto_meta.field_name_by_number.get(parsed.number)
   848         3         23.0      7.7      0.0          if not field_name:
   849                                                       self._unknown_fields += parsed.raw
   850                                                       continue
   851
   852         3         28.0      9.3      0.0          meta = proto_meta.meta_by_field_name[field_name]
   853
   854                                                   value: Any
   855         3         73.0     24.3      0.0          if parsed.wire_type == WIRE_LEN_DELIM and meta.proto_type in PACKED_TYPES:
   856                                                       # This is a packed repeated field.
   857         2          9.0      4.5      0.0              pos = 0
   858         2         11.0      5.5      0.0              value = []
   859   2097154   12457570.0      5.9      6.1              while pos < len(parsed.value):
   860   2097152   12891589.0      6.1      6.4                  if meta.proto_type in [TYPE_FLOAT, TYPE_FIXED32, TYPE_SFIXED32]:
   861   1048576    7039734.0      6.7      3.5                      decoded, pos = parsed.value[pos : pos + 4], pos + 4
   862   1048576    4927477.0      4.7      2.4                      wire_type = WIRE_FIXED_32
   863   1048576    6356895.0      6.1      3.1                  elif meta.proto_type in [TYPE_DOUBLE, TYPE_FIXED64, TYPE_SFIXED64]:
   864                                                               decoded, pos = parsed.value[pos : pos + 8], pos + 8
   865                                                               wire_type = WIRE_FIXED_64
   866                                                           else:
   867   1048576   45026476.0     42.9     22.2                      decoded, pos = decode_varint(parsed.value, pos)
   868   1048576    5425998.0      5.2      2.7                      wire_type = WIRE_VARINT
   869   2097152   28162473.0     13.4     13.9                  decoded = self._postprocess_single(
   870   2097152   68894603.0     32.9     34.0                      wire_type, meta, field_name, decoded
   871                                                           )
   872   2097152   11550145.0      5.5      5.7                  value.append(decoded)
   873                                                   else:
   874         1         19.0     19.0      0.0              value = self._postprocess_single(
   875         1         67.0     67.0      0.0                  parsed.wire_type, meta, field_name, parsed.value
   876                                                       )
   877
   878         3        379.0    126.3      0.0          current = getattr(self, field_name)
   879         3         21.0      7.0      0.0          if meta.proto_type == TYPE_MAP:
   880                                                       # Value represents a single key/value pair entry in the map.
   881                                                       current[value.key] = value.value
   882         3         43.0     14.3      0.0          elif isinstance(current, list) and not isinstance(value, list):
   883                                                       current.append(value)
   884                                                   else:
   885         3        278.0     92.7      0.0              setattr(self, field_name, value)
   886
   887         1          6.0      6.0      0.0      return self
```

From that it looks as if the bottleneck is the per-element decoding, i.e. the `decode_varint` and `_postprocess_single` calls.
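In a typical pure-Python implementation, `decode_varint` (the hot call in the profile above) is a per-byte loop, so every element of a packed field pays Python-level call and loop overhead. A minimal sketch of such a decoder, for illustration only (not betterproto's exact implementation):

```python
def decode_varint(buffer: bytes, pos: int):
    # Accumulate 7 payload bits per byte until a byte without the
    # continuation (high) bit is seen; return (value, new position).
    result = 0
    shift = 0
    while True:
        b = buffer[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7


# 0xAC 0x02 is the varint encoding of 300.
assert decode_varint(bytes([0xAC, 0x02]), 0) == (300, 2)
```

At roughly a million calls per benchmark run, even a few microseconds of interpreter overhead per call dominates; this is the kind of loop that a compiled (Cython/Rust) codec amortizes away.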
Yes, I have some further ideas for improving speed. I'll see how it goes.
Is there any update on this? Protobufs with repeated fields (we are passing around dataframes) seem to serialize 50x faster with vanilla protoc compared to betterproto.
See #520
I see! I got really excited about this package because it allows really intuitive conversions between dataframes and protobufs via dictionaries. But given how slow serialization is, I will have to wait for that to improve and mature before we adopt it.
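The dictionary round-trip mentioned here amounts to reshaping a columnar dict (the shape `DataFrame.to_dict(orient="list")` produces) into row dicts suitable for a message's `from_dict`, and back. A minimal stdlib sketch of that reshaping; the pandas and betterproto calls themselves are omitted to keep it self-contained:

```python
def columns_to_rows(cols: dict) -> list:
    # {"foo": [1, 2], "bar": ["a", "b"]} -> [{"foo": 1, "bar": "a"}, ...]
    keys = list(cols)
    return [dict(zip(keys, vals)) for vals in zip(*cols.values())]


def rows_to_columns(rows: list) -> dict:
    # Inverse transform; assumes all rows share the same keys.
    return {k: [r[k] for r in rows] for k in (rows[0] if rows else {})}


cols = {"foo": [1, 2], "bar": ["a", "b"]}
rows = columns_to_rows(cols)
assert rows == [{"foo": 1, "bar": "a"}, {"foo": 2, "bar": "b"}]
assert rows_to_columns(rows) == cols
```

With message classes in hand, each row dict would be fed to the generated message's `from_dict` (and recovered with `to_dict`), which is what makes the dataframe workflow so convenient.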
Just to clarify, #545 was merged, which means we can use the Rust codec.

I'm not feeling particularly great about installing another package that is hosted by a different person entirely as an optional dependency. Not to be too picky, but this isn't the wild west.
Yes, that is the aim at some point, but until the tooling gets better there was no way to have it in the repo as an optional dependency or similar. Have a look through the Slack if you want more info about it.
Tackling the performance issues would be of great use for real-world usage. Here are some running times I can share, for a 14 MB JSON file loaded into a betterproto object and round-tripped to bytes and JSON on an M1 MacBook Pro:
While the rust-codec is much faster, especially for serializing to bytes, parsing is still slower than betterproto was at the last preview release. And somehow the JSON string serializing speed got worse during the last year.
Hmm, might be useful to bisect these; not quite sure how it could have got so much slower.
Hi,
I switched to betterproto from the vanilla Google protoc plugin for Python. So far I am really happy with the generated API and the benefits that come with it.
However, I observed that the decoding performance of binary protobuf messages decreased significantly with the switch to betterproto. In my tests the vanilla protoc plugin was approx. 250-300x faster.
Is this a known problem? I have seen that there has been work on serialize/deserialize performance (e.g. #46) and that benchmarks have been added to the betterproto codebase. However, as far as I can see, these check for bottlenecks introduced by betterproto commits and do not compare the performance of betterproto to vanilla protoc.
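A direct cross-library comparison like the one requested can be scripted with `timeit`. Below is a minimal harness sketch; the stdlib `json` and `pickle` codecs stand in for the betterproto and google-protobuf encode/decode paths, which would require generated message classes to run here:

```python
import json
import pickle
import timeit


def compare_codecs(codecs: dict, sample, number: int = 50) -> dict:
    # codecs maps a name to an (encode, decode) pair.
    # Encode once up front, then time only the decode path.
    results = {}
    for name, (encode, decode) in codecs.items():
        wire = encode(sample)
        results[name] = timeit.timeit(lambda: decode(wire), number=number)
    return results


# Stand-in payload resembling a message with large repeated fields.
sample = {"some_ints": list(range(10_000)), "bar": "test"}
times = compare_codecs(
    {
        "json": (lambda o: json.dumps(o).encode(), lambda b: json.loads(b)),
        "pickle": (pickle.dumps, pickle.loads),
    },
    sample,
)
assert set(times) == {"json", "pickle"}
assert all(t > 0 for t in times.values())
```

Swapping in `(bytes, TestMessage().parse)` for betterproto and the equivalent `SerializeToString`/`ParseFromString` pair for the google implementation would give the apples-to-apples numbers this issue asks for.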
I would be more than happy if someone could comment on this. I would really like to continue using betterproto, but this performance bottleneck is currently a blocker.
Many thanks in advance!
Jan