I was recently migrating some code from requests to httpx and ran into some pitfalls around streaming large request bodies. I need to send `BinaryIO` objects raw over a connection, and critically I need to set a `Content-Length` on the messages I send.
In requests, this is possible to hack by (ab)using the `super_len` function, which tries to infer the length of the stream by checking `len`, `fileno`, etc. httpx does something similar in its `peek_filelike_length` function, except that it only tries `fileno`, not `len`.
Like requests, httpx falls back on `tell` and `seek` if that fails, which is where my issue happens. Not all streams are harmlessly seekable, so if you naively pass a stream that isn't an actual file handle to the `files` argument, you risk loading the entire stream, which in my case could be hundreds of gigabytes of data.
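To make the failure mode concrete, here is a rough sketch of the "try `fileno`, then fall back on `tell`/`seek`" pattern described above. `guess_length` and `BufferingStream` are hypothetical names made up for illustration; this approximates the idea and is not the actual code of either library.

```python
import io
import os
from typing import BinaryIO, Optional


def guess_length(stream: BinaryIO) -> Optional[int]:
    """Hypothetical helper approximating the fileno-then-seek length guess."""
    try:
        # Real file handles: ask the OS for the size, nothing is read.
        return os.fstat(stream.fileno()).st_size
    except (AttributeError, OSError):
        pass
    try:
        # Fallback: jump to the end to measure, then restore the position.
        offset = stream.tell()
        length = stream.seek(0, os.SEEK_END)
        stream.seek(offset)
        return length
    except (AttributeError, OSError):
        return None


class BufferingStream(io.RawIOBase):
    """Hypothetical stream over a one-shot source.

    It claims to be seekable, but seeking to the end forces it to pull
    and buffer every remaining chunk of the source.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = bytearray()
        self._pos = 0
        self.bytes_pulled = 0  # how much data has been loaded into memory

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_END:
            # The expensive part: drain the whole source to find the end.
            for chunk in self._chunks:
                self._buffer += chunk
                self.bytes_pulled += len(chunk)
            self._pos = len(self._buffer) + offset
        elif whence == os.SEEK_CUR:
            self._pos += offset
        else:
            self._pos = offset
        return self._pos
```

Calling `guess_length` on a `BufferingStream` returns the right number, but only after the whole source has been pulled into memory, which is exactly what I want to avoid for very large streams.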
I ended up creating a `Request` manually and using the `stream` argument, which isn't exposed anywhere else from what I can tell. The ergonomics of this are also not great, since I then have to set all headers manually.
Personally I really dislike the whole "try to guess the file length" business, but it is what it is. A few things could be done to greatly improve ergonomics here:
- Do not set or try to guess `Content-Length` or `Transfer-Encoding` if either is specified by the user. This one seems like a no-brainer to me, especially for the `AsyncIterable` case in `encode_content`. If the user knows the content length, there is no real reason to set `Transfer-Encoding: chunked` or to try to guess the length.
- Create a protocol type, or something else you can pass as `RequestContent` or elsewhere, that encapsulates a "stream with length".
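As a rough idea of what a "stream with length" could look like, here is a minimal sketch. All names (`SizedByteStream`, `SizedReader`, `framing_headers`) are hypothetical and not part of httpx; this only illustrates the shape of the proposal.

```python
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Protocol, runtime_checkable


@runtime_checkable
class SizedByteStream(Protocol):
    """Hypothetical protocol: byte chunks plus a caller-declared length."""

    @property
    def content_length(self) -> int: ...

    def __iter__(self) -> Iterator[bytes]: ...


@dataclass
class SizedReader:
    """Hypothetical implementation wrapping a BinaryIO; never seeks."""

    source: BinaryIO
    content_length: int
    chunk_size: int = 64 * 1024

    def __iter__(self) -> Iterator[bytes]:
        remaining = self.content_length
        while remaining > 0:
            chunk = self.source.read(min(self.chunk_size, remaining))
            if not chunk:
                raise ValueError(
                    f"stream ended {remaining} bytes short of declared length"
                )
            remaining -= len(chunk)
            yield chunk


def framing_headers(body) -> dict:
    """Pick framing headers without ever guessing the length."""
    if isinstance(body, SizedByteStream):
        # The caller stated the length, so trust it.
        return {"Content-Length": str(body.content_length)}
    # Unknown length: chunked transfer is the only safe option.
    return {"Transfer-Encoding": "chunked"}
```

With something like this, the library would never need to call `tell` or `seek` on the stream: a declared length yields `Content-Length`, and anything else falls back to chunked encoding.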