consider dropping wsaccel dependency #127
Comments
Amazing results! wsproto is already py3 only, so I would suggest using "Solution 3" with this snippet:

import sys

native_byteorder = sys.byteorder

def mask3(mask, data):
    datalen = len(data)
    # Treat the whole payload as one big integer
    data = int.from_bytes(data, native_byteorder)
    # Repeat the 4-byte mask to cover the payload, including any partial tail
    mask = int.from_bytes(mask * (datalen // 4) + mask[: datalen % 4], native_byteorder)
    return (data ^ mask).to_bytes(datalen, native_byteorder)

PRs are welcome! |
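As a quick sanity check on the snippet above, the sketch below compares it against a naive byte-by-byte XOR. `mask_naive` and `check` are hypothetical helpers introduced here for illustration; they are not part of wsproto.

```python
import os
import sys

native_byteorder = sys.byteorder


def mask3(mask, data):
    # Same approach as the snippet above: XOR the payload as one big integer.
    datalen = len(data)
    data_int = int.from_bytes(data, native_byteorder)
    mask_int = int.from_bytes(
        mask * (datalen // 4) + mask[: datalen % 4], native_byteorder
    )
    return (data_int ^ mask_int).to_bytes(datalen, native_byteorder)


def mask_naive(mask, data):
    # Hypothetical reference implementation: XOR each byte with mask[i % 4].
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))


def check(sizes=(0, 1, 3, 4, 5, 1023)):
    # Sizes are chosen to cover empty payloads and lengths around multiples of 4.
    mask = os.urandom(4)
    for size in sizes:
        payload = os.urandom(size)
        masked = mask3(mask, payload)
        assert masked == mask_naive(mask, payload)
        # XOR masking is its own inverse, so masking twice restores the payload.
        assert mask3(mask, masked) == payload
    return True
```

Calling `check()` raises `AssertionError` if the two implementations ever disagree.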
It's a little confusing from the article, but solution 4 (using |
I measured the https://nbviewer.jupyter.org/gist/belm0/2b610cc405dd3dac977f34650cb187cc |
The So I'm also leaning towards "solution 3". It's still like 50x faster than what we have right now, and doesn't have the risk of creating some pathological behavior in constrained environments. |
Fair point about memory and startup time. Though "what we have now" is wsaccel, which is 2x faster than solution 3. So perhaps stay with wsaccel then. |
Note that this discussion started over here: python-trio/trio-websocket#119 See in particular python-trio/trio-websocket#119 (comment) and below, which discuss the up- and down-sides of implementing our own accelerator. (Basically: upside is that it's pretty easy to get 10x faster than wsaccel or the pure python stuff above; downside is then you have a C accelerator module to maintain.) |
...though, ugh, I guess wsaccel doesn't even have wheels, so right now by depending on it, trio-websocket forces everyone to have a compiler installed. (This is particularly problematic on windows.) |
The
import tracemalloc
mask = b'\0\1\2\3'
n = 10 * 2**20
data = b'\1' * n
tracemalloc.start()
mask3(mask, data)
print(tracemalloc.get_traced_memory()[1] // 2**20, 'MiB')
The optimal result is 0 MiB, i.e. doing the masking in-place, which should be allowed given BTW, the blog post uses |
BTW2, the entire masking business is unfortunate. It is meant to protect against bad proxies, but probably 99% of WebSocket traffic in the open internet is over TLS, which makes it unnecessary. If I had control over all WebSocket clients in the world, I would just make them use |
You could save the startup time by pregenerating that lookup table, and inserting a literal... |
Doesn't the article say that aiohttp uses solution 3? |
It was solution 3 at the time the article was written, but now it is 4 thanks to the author of the article. |
Something to consider with the translate method is that it has to convert to and from a bytearray if the input and output is The XOR part will still take the majority of the time, but if the payload data were a bytearray to begin with, it can avoid some extra copying. It does become more significant with smaller payload sizes. |
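A minimal sketch of that trade-off, assuming "the translate method" means the lookup-table approach built on `bytes.translate` from the linked article. The names `_XOR_TABLE`, `mask_translate_inplace` and `mask_translate` are illustrative, not wsproto's or aiohttp's actual API.

```python
# One 256-byte translation table per possible mask byte, built at import time.
# _XOR_TABLE[m][x] == x ^ m for every byte value x.
_XOR_TABLE = [bytes(x ^ m for x in range(256)) for m in range(256)]


def mask_translate_inplace(mask, data):
    """XOR a bytearray with a 4-byte mask, modifying the bytearray in place."""
    a, b, c, d = (_XOR_TABLE[m] for m in mask)
    # Every fourth byte shares the same mask byte, so each stride gets one table.
    data[0::4] = data[0::4].translate(a)
    data[1::4] = data[1::4].translate(b)
    data[2::4] = data[2::4].translate(c)
    data[3::4] = data[3::4].translate(d)


def mask_translate(mask, data):
    """Same operation for immutable bytes: costs one copy in and one copy out."""
    buf = bytearray(data)
    mask_translate_inplace(mask, buf)
    return bytes(buf)
```

If the payload already arrives as a `bytearray`, `mask_translate_inplace` skips both conversions in `mask_translate`; the strided slices still allocate temporaries, each about a quarter of the payload size.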
Copying small payloads is very cheap though :-). |
True. I was thinking the copying to XORing ratio would be more significant with small payloads. But yeah, small payloads are always going to be fast. |
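One rough way to see how per-call cost changes with payload size is to time the masking function at a few sizes. This is only a sketch: `time_per_call`, `report`, and the chosen sizes are hypothetical, and absolute numbers will vary by machine and Python version.

```python
import sys
import timeit

native_byteorder = sys.byteorder


def mask3(mask, data):
    # "Solution 3" from earlier in the thread: XOR the payload as one big integer.
    datalen = len(data)
    data_int = int.from_bytes(data, native_byteorder)
    mask_int = int.from_bytes(
        mask * (datalen // 4) + mask[: datalen % 4], native_byteorder
    )
    return (data_int ^ mask_int).to_bytes(datalen, native_byteorder)


def time_per_call(size, number=1000):
    # Average seconds per mask3 call for a payload of `size` bytes.
    mask = b"\x00\x01\x02\x03"
    payload = b"\x01" * size
    total = timeit.timeit(lambda: mask3(mask, payload), number=number)
    return total / number


def report(sizes=(16, 1024, 65536)):
    # Map each payload size to its average per-call time in seconds.
    return {size: time_per_call(size) for size in sizes}


if __name__ == "__main__":
    for size, seconds in report().items():
        print(f"{size:>6} bytes: {seconds * 1e6:.2f} us/call, "
              f"{seconds / size * 1e9:.2f} ns/byte")
```

Comparing the ns/byte column across sizes gives a feel for how much fixed overhead dominates small messages.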
wsaccel has been dropped, see 89f442b |
For what it's worth, I did have a measurable regression in my app upon upgrading wsproto to >= 0.15.0. It was a 1% increase in CPU cycles, which is significant given that little of the time is spent on websockets. I'm speculating that an app using wsproto heavily might see a 5 or 10% regression. A simple benchmark was added recently (#150), so it is perhaps worth a comparison with a backport to 0.14.1. |
I couldn't reproduce the regression with whatever bench/connection.py is measuring; it shows 0.14.1 as significantly slower. Using 500 iterations and a few hacks for 0.14 compatibility:
|
Well, it's because my app has small message sizes. Filed #152 |
wsaccel doesn't appear to be maintained, and its build is broken under PyPy.
This article describes a pure Python mask implementation which is near the performance of wsaccel:
https://www.willmcgugan.com/blog/tech/post/speeding-up-websockets-60x/