Performance of POST body processing speed is 10x slower in Cowboy 2.10.0 compared to 1.1.2 #1611
You may want to tweak the `read_body` options or the HTTP/1.1 protocol options.
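As a concrete illustration of the first suggestion, reading the body in larger chunks via the `read_body` options might look like the following sketch (the values are illustrative, not recommendations; `read_full_body/2` is a hypothetical helper, not Cowboy API):

```erlang
%% Accumulate the full request body, raising `length` (max bytes
%% returned per call) and `period` (ms to wait for data) so each
%% call to read_body pulls in more data at once.
read_full_body(Req0, Acc) ->
    case cowboy_req:read_body(Req0, #{length => 1000000, period => 15000}) of
        {ok, Data, Req} -> {ok, <<Acc/binary, Data/binary>>, Req};
        {more, Data, Req} -> read_full_body(Req, <<Acc/binary, Data/binary>>)
    end.
```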
Thanks for the suggestions. We tweaked various options and found that changing the buffer size had the largest effect. The detailed micro-benchmark code, results, and a summary of the tweaks are noted here, see Test 3: https://github.com/AoiMoe/cowboy_post_bench. While the performance regression in Cowboy 2 was somewhat mitigated by increasing the buffer size, the micro-benchmark was performed on a loopback device rather than going through a real Internet route, so it's not a real-world scenario; even so, we still think a 10-20% performance regression is too much to risk the upgrade. We also think the default behaviour should be sane. Is there anything we can do to completely fix the performance regression introduced in Cowboy 2?
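The kind of transport-level tweak being discussed can be sketched as follows (listener name and port are illustrative; `Dispatch` is assumed to be compiled routes; 131072 is the buffer size mentioned later in the thread):

```erlang
%% Start an HTTP listener with a larger `buffer` socket option,
%% passed through Ranch's socket_opts to the underlying socket.
{ok, _} = cowboy:start_clear(my_http_listener,
    #{socket_opts => [{port, 8080}, {buffer, 131072}]},
    #{env => #{dispatch => Dispatch}}).
```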
The changes that result in a performance drop are related to the support for HTTP/2, which performs better than HTTP/1.1 in real use cases. In the future Cowboy will also support HTTP/3, which performs even better. There's likely room for improvement for HTTP/1.1 still; I'll take a look when time allows. But right now my priority is HTTP/3. There's not much point measuring performance using …
I've started looking into this in detail. One interesting bit is that when moving to the new approach I had to move from a sync recv to an async recv. At the time …
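The sync/async distinction mentioned here can be illustrated in plain `gen_tcp` terms (a generic sketch, not Cowboy's actual code):

```erlang
%% Sync (passive) receive: gen_tcp:recv/2 blocks the process
%% until data arrives on the socket.
sync_read(Socket, Length) ->
    {ok, Data} = gen_tcp:recv(Socket, Length),
    Data.

%% Async (active-once) receive: data is delivered as an Erlang
%% message, so the process can handle other messages while waiting.
async_read(Socket) ->
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data} -> Data;
        {tcp_closed, Socket} -> closed
    end.
```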
OK, I verified the configuration of the buffer. With a large buffer, the numbers are above the 10x difference reported in the ticket. These requests are all 1MB in size and there are 10000 of them, so 10GB is transferred in total, in about 7s, so around 1.4GB per second. On localhost of course. It remains to be seen whether body reading is the only thing requiring this change or if it's good to have for requests too.
I opened erlang/otp#9355 to question the default in OTP. I don't think we can set an appropriate default in Cowboy because Cowboy can't know in what environment it will run (constrained or not). But we can definitely provide guidance in the documentation on what should be configured for high performance, as well as ship a better default like the one I recommended OTP change to. If OTP doesn't change its default, Cowboy can set its own default to that value and recommend a higher value in the documentation.
A large buffer is not universally better:

- For HTTP/1.1, a large buffer (131072) makes requests without bodies a little slower, but not significantly, and requests with large bodies a lot faster.
- For HTTP/2, a large buffer (131072) is just as bad as the default; a better value is around 32768. But that's only true with the default HTTP/2 protocol options: if you tweak those, then 131072 becomes better and 32768 worse. This is because by default the HTTP/2 frames are smaller, so increasing their size makes the larger buffer more appropriate.
- For Websocket, a large buffer (131072) is a clear improvement, at least when a lot of data goes through (the larger the frames, the better the improvement; conversely, the smaller the frames, the worse it gets).

In other words there's no real way to set a single buffer value that works in all cases. On the other hand, I am looking at a couple of improvements, among them resizing the buffer dynamically based on traffic.
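The HTTP/2 interplay described above could be explored with something like the following sketch (option names are from `cowboy_http2`; the values, listener name, and port are illustrative, and `Dispatch` is assumed to be compiled routes):

```erlang
%% Larger HTTP/2 frames pair better with a larger socket buffer:
%% raise the frame size limits along with the `buffer` option.
{ok, _} = cowboy:start_clear(h2_listener,
    #{socket_opts => [{port, 8080}, {buffer, 131072}]},
    #{env => #{dispatch => Dispatch},
      max_frame_size_received => 65536,  %% default is 16384
      max_frame_size_sent => 65536}).
```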
I believe the branch at #1666 solves the issue. The solution ended up a little different from what was described above; we only consider incoming packet sizes for resizing the buffer.
We're porting an Erlang application that depends on the now-deprecated Cowboy 1.1.2 to the recent Cowboy 2.10.0.
During the porting process, we found that when processing POST bodies, Cowboy 2.10.0 performs 10x slower in terms of bandwidth than Cowboy 1.1.2 without JIT enabled. It's still 8.4x slower with JIT enabled.
This performance regression prevents us from upgrading Cowboy in our software.
Here is the minimal benchmark code to reproduce the issue, along with a summary of the benchmark results:
https://github.com/AoiMoe/cowboy_post_bench
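For reference, a minimal Cowboy 2 POST handler in the spirit of such a benchmark (a sketch only; see the linked repository for the actual code) might look like:

```erlang
-module(post_bench_handler).
-export([init/2]).

%% Drain the request body and reply with the number of bytes received.
init(Req0, State) ->
    {Size, Req1} = drain(Req0, 0),
    Req2 = cowboy_req:reply(200,
        #{<<"content-type">> => <<"text/plain">>},
        integer_to_binary(Size), Req1),
    {ok, Req2, State}.

drain(Req0, N) ->
    case cowboy_req:read_body(Req0) of
        {ok, Data, Req} -> {N + byte_size(Data), Req};
        {more, Data, Req} -> drain(Req, N + byte_size(Data))
    end.
```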