GzipFile.readinto reads full file before copying into the provided buffer #128646
To expand on the above in hopes of saving some effort for a reviewer, here is my trace of calls:
These are the Lines 694 to 732 in ec91e1c
cpython/Modules/_io/bufferedio.c Lines 48 to 109 in ec91e1c
These call Lines 206 to 210 in ec797d1
Lines 321 to 339 in ec797d1
Therefore, we are forcing Lines 1144 to 1190 in ec91e1c
cpython/Modules/_io/bufferedio.c Lines 1076 to 1184 in ec91e1c
This implementation avoids growing the internal buffer beyond an individual chunk.
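The fallback at the top of that trace can be reproduced in isolation. The sketch below uses a hypothetical class, ReadOnlyStream (not part of CPython), that defines only read() and inherits readinto() from io.BufferedIOBase. The inherited method calls read(len(buffer)), which builds a temporary bytes object, and then copies it into the caller's buffer.

```python
import io


class ReadOnlyStream(io.BufferedIOBase):
    """Hypothetical stream that overrides read() but inherits readinto()."""

    def __init__(self, data: bytes):
        super().__init__()
        self._source = io.BytesIO(data)
        self.read_sizes = []  # size argument of every read() call

    def readable(self) -> bool:
        return True

    def read(self, size=-1):
        # Record how readinto() calls us, then return a new bytes object.
        self.read_sizes.append(size)
        return self._source.read(size)


stream = ReadOnlyStream(b"abcdefgh")
buf = bytearray(4)
n = stream.readinto(buf)  # inherited BufferedIOBase.readinto
print(n, bytes(buf), stream.read_sizes)  # prints: 4 b'abcd' [4]
```

For a stream like GzipFile, this means that the temporary object is as large as the buffer passed in, so reading a whole file through readinto() briefly holds a second copy of the data.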
Following the advice in https://devguide.python.org/getting-started/pull-request-lifecycle/, I'm pinging this issue after about a month to see if any reviewers have time to look at this. I'm not sure if this is enough of a bug to get backported, so I'd like to make sure that it gets resolved in 3.14.
The new readinto() and readinto1() methods simply delegate to the underlying buffer, much like the existing GzipFile.read() and GzipFile.read1() methods. This avoids the extra allocations caused by the BufferedIOBase.readinto() implementation used previously. This commit also factors out a common readability check rather than copying it two more times.
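The delegation pattern described above can be sketched with a small wrapper. This is not the actual gzip.py change. The class DelegatingReader and its _check_read() helper are hypothetical names, and an io.BytesIO object stands in for the decompressing raw stream. Every read method forwards to an io.BufferedReader, so readinto() lets that reader fill the caller's buffer instead of going through read() and a copy.

```python
import io


class DelegatingReader(io.BufferedIOBase):
    """Hypothetical wrapper that forwards every read to an io.BufferedReader."""

    def __init__(self, raw):
        super().__init__()
        self._buffer = io.BufferedReader(raw)

    def readable(self) -> bool:
        return True

    def _check_read(self) -> None:
        # One shared readability check instead of a copy in each method.
        if self.closed:
            raise ValueError("I/O operation on closed file.")

    def read(self, size=-1):
        self._check_read()
        return self._buffer.read(size)

    def read1(self, size=-1):
        self._check_read()
        return self._buffer.read1(size)

    def readinto(self, b):
        # Let BufferedReader fill b itself rather than read() + copy.
        self._check_read()
        return self._buffer.readinto(b)

    def readinto1(self, b):
        self._check_read()
        return self._buffer.readinto1(b)

    def close(self):
        if not self.closed:
            self._buffer.close()
        super().close()


reader = DelegatingReader(io.BytesIO(b"hello world"))
buf = bytearray(5)
n = reader.readinto(buf)
rest = reader.read()
reader.close()
```

Overriding readinto() and readinto1() explicitly matters because io.BufferedIOBase would otherwise supply its generic versions, which route through read() and read1().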
Closed by #128647.
Bug report
Bug description:
gzip.GzipFile uses the BufferedIOBase implementation of .readinto(), which simply calls .read() with the buffer's length and copies the result into the buffer. This negates the purpose of using .readinto() at all. This may be considered more a missed optimization than a bug, but it is being reported in downstream tools and I've traced it back to CPython.
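A typical reason to call .readinto() is to stream a file through one preallocated buffer. The helper below, crc_of_gzip_stream, is hypothetical and only illustrates that calling pattern. It produces the same result on every version. The bug affects only what happens inside each call: the inherited implementation builds an intermediate bytes object per call, while a delegating implementation fills buf directly.

```python
import gzip
import io
import zlib


def crc_of_gzip_stream(fileobj, chunk_size=64 * 1024):
    """Decompress gzip data into one reusable buffer; return (size, CRC-32).

    Hypothetical helper for illustration only.
    """
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    total = 0
    crc = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            n = gz.readinto(buf)
            if n == 0:  # EOF
                break
            total += n
            # Checksum only the bytes filled by this call.
            crc = zlib.crc32(view[:n], crc)
    return total, crc


payload = bytes(range(256)) * 1000
result = crc_of_gzip_stream(io.BytesIO(gzip.compress(payload)))
```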
Current memory profile
Duration: 0:00:01.821000
Total number of allocations: 5064
Total number of frames seen: 85
Peak memory usage: 116.3 MiB
Python allocator: pymalloc
Patched memory profile
Duration: 0:00:01.828000
Total number of allocations: 3317
Total number of frames seen: 79
Peak memory usage: 66.2 MiB
Python allocator: pymalloc
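The savings can be computed directly from the two profiles above. This is plain arithmetic on the reported figures. The run durations are essentially unchanged, so the improvement shows up as lower peak memory and fewer allocations.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute reduction, fractional reduction) from before to after."""
    saved = before - after
    return saved, saved / before


peak_saved, peak_frac = reduction(116.3, 66.2)   # peak memory usage, MiB
alloc_saved, alloc_frac = reduction(5064, 3317)  # total number of allocations

print(f"peak: {peak_saved:.1f} MiB less ({peak_frac:.1%})")
print(f"allocations: {alloc_saved} fewer ({alloc_frac:.1%})")
```

The peak drops by about 50.1 MiB (roughly 43%), and the allocation count drops by 1747 (roughly 34.5%).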
Patch
I believe this should be an uncontroversial patch, so I will open a PR immediately.
cc @psadil
CPython versions tested on:
3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch
Operating systems tested on:
Linux
Linked PRs