GzipFile.readinto reads full file before copying into the provided buffer #128646

Closed
@effigies


Bug report

Bug description:

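gzip.GzipFile inherits the BufferedIOBase implementation of .readinto(), which simply calls .read() for the full length of the target and copies the result into the provided buffer. The decompressed data is therefore held in a temporary bytes object first, which negates the purpose of using .readinto() at all.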

This may be considered more a missed optimization than a bug, but it is being reported in downstream tools and I've traced it back to CPython.
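The fallback behaviour can be observed without gzip at all. The sketch below uses a hypothetical `RecordingReader` class (not part of CPython) that only implements `.read()` and records the size it is asked for; the inherited `.readinto()` then requests the whole target length in a single `.read()` call.

```python
import io


class RecordingReader(io.BufferedIOBase):
    """Hypothetical reader that only implements read() and logs requested sizes."""

    def __init__(self, payload):
        self._raw = io.BytesIO(payload)
        self.read_sizes = []

    def readable(self):
        return True

    def read(self, size=-1):
        # Every call allocates a new bytes object of up to `size` bytes.
        self.read_sizes.append(size)
        return self._raw.read(size)


payload = b"x" * 1024
reader = RecordingReader(payload)
target = bytearray(len(payload))

# readinto() is inherited from BufferedIOBase: it calls read(len(target))
# and then copies the returned bytes into `target`.
n = reader.readinto(target)
print(n, reader.read_sizes)
```

With a 50 MiB target, the same mechanism implies a 50 MiB temporary bytes object in addition to the caller's buffer.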

import os
from gzip import GzipFile

n_mbs = 50

with GzipFile('test.gz', mode='wb') as fobj:
    for _ in range(n_mbs):
        fobj.write(os.urandom(2**20))

buffer = bytearray(n_mbs * 2**20)

with GzipFile('test.gz', mode='rb') as fobj:
    fobj.readinto(buffer)

Profiling the script above (saved as load_file.py) with memray:

memray run load_file.py
memray flamegraph memray-*.bin && rm memray-*.bin

Current memory profile

Duration: 0:00:01.821000
Total number of allocations: 5064
Total number of frames seen: 85
Peak memory usage: 116.3 MiB
Python allocator: pymalloc

Patched memory profile

Duration: 0:00:01.828000
Total number of allocations: 3317
Total number of frames seen: 79
Peak memory usage: 66.2 MiB
Python allocator: pymalloc
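
As a rough sanity check on the two profiles, the difference in peak memory can be compared with the size of the decompressed payload. The figures are copied from the reports above; reading the saving as "one avoided temporary copy" is an interpretation, not something memray states directly.

```python
MIB = 2**20

n_mbs = 50                           # from the reproduction script
payload_mib = n_mbs * MIB / MIB      # decompressed payload size, in MiB

peak_current_mib = 116.3             # "Current memory profile"
peak_patched_mib = 66.2              # "Patched memory profile"

saved_mib = peak_current_mib - peak_patched_mib
print(f"saved {saved_mib:.1f} MiB; payload is {payload_mib:.1f} MiB")
```

The saving of about 50 MiB matches the payload size, which is consistent with the patch removing one full-size intermediate bytes object.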

Patch

diff --git a/Lib/gzip.py b/Lib/gzip.py
index 1a3c82ce7e0..21bb4b085fd 100644
--- a/Lib/gzip.py
+++ b/Lib/gzip.py
@@ -338,6 +338,20 @@ def read1(self, size=-1):
             size = io.DEFAULT_BUFFER_SIZE
         return self._buffer.read1(size)
 
+    def readinto(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto() on write-only GzipFile object")
+        return self._buffer.readinto(b)
+
+    def readinto1(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto1() on write-only GzipFile object")
+        return self._buffer.readinto1(b)
+
     def peek(self, n):
         self._check_not_closed()
         if self.mode != READ:
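
For interpreters that do not include this change, the same delegation could be applied from user code. This is a minimal sketch, not the patch itself: `ReadIntoGzipFile` and `fill_buffer` are hypothetical names, the checks use the public `closed` and `readable()` instead of the private helpers in the diff, and it still relies on the private `_buffer` attribute that the patch uses.

```python
import gzip
import io
import os
import tempfile


class ReadIntoGzipFile(gzip.GzipFile):
    """Hypothetical subclass mirroring the delegation in the patch above."""

    def readinto(self, b):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if not self.readable():
            raise io.UnsupportedOperation("readinto() on write-only GzipFile object")
        # _buffer is the private io.BufferedReader that GzipFile creates in read mode.
        return self._buffer.readinto(b)


def fill_buffer(fobj, buffer):
    """Call readinto() until `buffer` is full or the stream is exhausted."""
    view = memoryview(buffer)
    total = 0
    while total < len(view):
        n = fobj.readinto(view[total:])
        if not n:
            break
        total += n
    return total


payload = os.urandom(256 * 1024)
target = bytearray(len(payload))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.gz")
    with gzip.GzipFile(path, mode="wb") as fobj:
        fobj.write(payload)
    with ReadIntoGzipFile(path, mode="rb") as fobj:
        n_read = fill_buffer(fobj, target)

print(n_read == len(payload), bytes(target) == payload)
```

The loop in `fill_buffer` is defensive: it does not assume a single `readinto()` call fills the whole buffer.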

I believe this should be an uncontroversial patch, so I will open a PR immediately.

cc @psadil

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux
