Bug report
Bug description:
gzip.GzipFile uses the BufferedIOBase implementation of .readinto(), which simply calls .read() and copies the result into the buffer. This negates the purpose of using .readinto() at all.
This may be considered more a missed optimization than a bug, but it is being reported in downstream tools and I've traced it back to CPython.
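To make the fallback concrete, here is a minimal sketch of the inherited behavior. `CountingReader` is a hypothetical class invented for illustration (it is not part of gzip); it implements only `read()` and relies on `io.BufferedIOBase` for `readinto()`, so the call counter shows the delegation:

```python
import io


class CountingReader(io.BufferedIOBase):
    """Hypothetical reader that implements read() only (illustration, not gzip code)."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0
        self.read_calls = 0

    def readable(self):
        return True

    def read(self, size=-1):
        # Every call allocates a new bytes object for the requested chunk.
        self.read_calls += 1
        if size is None or size < 0:
            size = len(self._payload) - self._pos
        chunk = self._payload[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


reader = CountingReader(b"hello world")
buf = bytearray(5)

# readinto() is inherited from BufferedIOBase: it calls read(len(buf))
# and then copies the returned bytes into buf.
n = reader.readinto(buf)
print(n, bytes(buf), reader.read_calls)
```

With a 50 MiB target buffer, that intermediate bytes object is also about 50 MiB, which would be consistent with the gap between the two peak figures reported in this issue.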
import os
from gzip import GzipFile

n_mbs = 50

with GzipFile('test.gz', mode='wb') as fobj:
    for _ in range(n_mbs):
        fobj.write(os.urandom(2**20))

buffer = bytearray(n_mbs * 2**20)

with GzipFile('test.gz', mode='rb') as fobj:
    fobj.readinto(buffer)
memray run load_file.py
memray flamegraph memray-*.bin && rm memray-*.bin
Current memory profile
Duration: 0:00:01.821000
Total number of allocations: 5064
Total number of frames seen: 85
Peak memory usage: 116.3 MiB
Python allocator: pymalloc
Patched memory profile
Duration: 0:00:01.828000
Total number of allocations: 3317
Total number of frames seen: 79
Peak memory usage: 66.2 MiB
Python allocator: pymalloc
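For a rough cross-check without memray, the standard-library tracemalloc module can report the peak of Python-level allocations made during the readinto() call. This is only a sketch under stated assumptions: the helper name `peak_extra_mib` and the smaller 4 MiB payload are my own choices, tracemalloc sees only allocations made through Python's allocators, and the number it reports will depend on whether the interpreter already includes the patch.

```python
import gzip
import os
import tempfile
import tracemalloc


def peak_extra_mib(n_mbs=4):
    """Return (bytes_read, peak MiB traced while readinto() runs).

    Hypothetical helper for illustration; not part of the original report.
    """
    size = n_mbs * 2**20
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.gz")
        with gzip.GzipFile(path, mode="wb") as fobj:
            fobj.write(os.urandom(size))

        # Allocate the destination before tracing starts, so the peak
        # reflects only what readinto() allocates on top of it.
        buffer = bytearray(size)

        with gzip.GzipFile(path, mode="rb") as fobj:
            tracemalloc.start()
            try:
                n_read = fobj.readinto(buffer)
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
    return n_read, peak / 2**20


n_read, peak = peak_extra_mib()
print(f"read {n_read} bytes, traced peak {peak:.1f} MiB")
```

If readinto() goes through the read-and-copy fallback, the traced peak should be at least roughly the payload size; with a direct readinto() it should be much smaller.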
Patch
diff --git a/Lib/gzip.py b/Lib/gzip.py
index 1a3c82ce7e0..21bb4b085fd 100644
--- a/Lib/gzip.py
+++ b/Lib/gzip.py
@@ -338,6 +338,20 @@ def read1(self, size=-1):
             size = io.DEFAULT_BUFFER_SIZE
         return self._buffer.read1(size)
 
+    def readinto(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto() on write-only GzipFile object")
+        return self._buffer.readinto(b)
+
+    def readinto1(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto1() on write-only GzipFile object")
+        return self._buffer.readinto1(b)
+
     def peek(self, n):
         self._check_not_closed()
         if self.mode != READ:
I believe this should be an uncontroversial patch, so I will open a PR immediately.
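For downstream tools on interpreter versions without the fix, the same idea can be sketched as a subclass. This is an assumption-laden workaround rather than the patch itself: `ReadintoGzipFile` is a hypothetical name, it uses the public `closed` and `readable()` checks in place of the internal ones, and it still depends on the private `_buffer` attribute that GzipFile sets up in read mode.

```python
import errno
import gzip
import io


class ReadintoGzipFile(gzip.GzipFile):
    """Hypothetical subclass that forwards readinto() to the internal buffer.

    Relies on the private attribute `_buffer` (an io.BufferedReader in
    read mode), which is an implementation detail and may change.
    """

    def readinto(self, b):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if not self.readable():
            raise OSError(errno.EBADF, "readinto() on write-only GzipFile object")
        # BufferedReader.readinto() fills b without building one large
        # intermediate bytes object for the whole request.
        return self._buffer.readinto(b)


# Round trip through an in-memory gzip stream.
payload = b"spam" * 1024
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as fobj:
    fobj.write(payload)

raw.seek(0)
buf = bytearray(len(payload))
with ReadintoGzipFile(fileobj=raw, mode="rb") as fobj:
    n = fobj.readinto(buf)
```

On an interpreter that already includes the patch, the subclass should be redundant but harmless.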
cc @psadil
CPython versions tested on:
3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch
Operating systems tested on:
Linux