FSSpecStore: unclosed aiohttp resources when using gcsfs #2674
Comments
It may be useful to turn on the "gcsfs" logger, to see if multiple filesystem instances are being made or similar, and what exact calls are happening. |
how would I do this in that code example? |
I usually do |
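For readers following along, here is a minimal sketch of one way to turn on the "gcsfs" logger using only the standard library. It is not necessarily the snippet the commenter had in mind; the log format string is an arbitrary choice. The script later in this thread also carries a commented-out `fsspec.utils.setup_logging(logger_name="gcsfs")` call, which appears to serve the same purpose.

```python
import logging

# Attach a stderr handler to the root logger; the format string is an
# arbitrary choice for this sketch.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Lower the threshold of the "gcsfs" logger only, so other libraries stay quiet.
gcsfs_logger = logging.getLogger("gcsfs")
gcsfs_logger.setLevel(logging.DEBUG)
```

Placing these lines at the top of the reproduction script, before any store is opened, should make each logged call visible.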
here's a gist with the logging output, mostly i'm seeing a huge pile of |
these apparently identical requests at the top are a bit odd, not sure if that's a clue.
|
Probably these are file listings; they should be cached unless they have a limit on the number of items to return. Perhaps they are launched from multiple threads at the same time, which defeats the caching. So these are, I suppose, checks for "directory"-ness, which for a bucket is always true. That's a bug causing redundant work, but it is not the cause of the unclosed connections. So, some thoughts:
|
I dropped a breakpoint in |
digging a bit more into it, I think the logic of In my example,
def close_session(loop, session):
    if loop is not None and session is not None:  # <---- If this conjunction is not true, then this function does nothing
        if loop.is_running():
            try:
                current_loop = asyncio.get_running_loop()
                current_loop.create_task(session.close())
                return
            except RuntimeError:
                pass
            try:
                asyn.sync(loop, session.close, timeout=0.1)
            except fsspec.FSTimeoutError:
                pass
        else:
            pass
|
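The effect of the guard quoted above can be illustrated with a small self-contained sketch. `FakeSession` and `close_session_sketch` are hypothetical stand-ins written for this illustration, not gcsfs or aiohttp code; the sketch only mirrors the outer `if` and closes the session on a fresh event loop when the guard passes.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an aiohttp session; records whether close() ran."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


def close_session_sketch(loop, session):
    """Simplified stand-in that mirrors only the outer guard of the quoted function."""
    if loop is not None and session is not None:
        loop.run_until_complete(session.close())
        return True
    # Either argument is None: nothing is closed.
    return False


session = FakeSession()

# Passing (None, None) fails the guard, so the live session is never touched.
ran_with_none = close_session_sketch(None, None)
closed_after_none = session.closed

# Passing a real loop and the session satisfies the guard and closes it.
loop = asyncio.new_event_loop()
try:
    ran_with_args = close_session_sketch(loop, session)
finally:
    loop.close()
```

Under these assumptions, calling the function with `None` for either argument is a no-op, which is consistent with the warnings persisting after such a call.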
Perhaps you can try with fsspec/gcsfs#657? |
this works like a charm, thanks @martindurant
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "fsspec",
#   "gcsfs",
# ]
#
# [tool.uv.sources]
# fsspec = { git = "https://github.com/martindurant/gcsfs.git@better_shutdown" }
# ///
from zarr import open_group
from time import time
from zarr import config as zarr_config

# fsspec.utils.setup_logging(logger_name="gcsfs")


def test_list_members() -> None:
    z = open_group(
        'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3',
        mode='r',
        storage_options={'token': 'anon'},
        use_consolidated=False)
    t = time()
    with zarr_config.set({'async.concurrency': 16}):
        members = z.members()
    elapsed = time() - t
    print(f'Discovered {len(members)} members in {elapsed:0.2f}s')


if __name__ == '__main__':
    test_list_members()
|
this should be fixed by fsspec/gcsfs#657, which will be part of a |
Having a similar issue, but not sure if it's related (since this is using running this script:
import zarr
URL = "https://s3.embl.de/i2k-2020/ngff-example-data/v0.4/tczyx.ome.zarr"
zarr_arr = zarr.open(URL, mode="r")
gives this on exit:
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x101948200>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x1072e3d10>, 1066999.872162625), (<aiohttp.client_proto.ResponseHandler object at 0x107554050>, 1066999.872625166), (<aiohttp.client_proto.ResponseHandler object at 0x1072e3ef0>, 1066999.87292075), (<aiohttp.client_proto.ResponseHandler object at 0x1075547d0>, 1067000.058575958)])']
connector: <aiohttp.connector.TCPConnector object at 0x1072d8050>
(which in pytest, with warnings as errors, raises a very long list of ResourceWarning)
should I open an independent issue? Or is this a usage error now (I didn't see any mention of a context manager I should use in the migration guide) |
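As background on why these messages appear at exit, here is a minimal sketch of the general pattern: an object that warns from its finalizer if it was never closed. `LeakyResource` and `count_resource_warnings` are hypothetical names for this illustration, not aiohttp or zarr code, and the sketch relies on CPython finalizing an object as soon as its last reference is dropped.

```python
import gc
import warnings


class LeakyResource:
    """Hypothetical resource that warns on finalization if close() was never called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        if not self.closed:
            warnings.warn(f"Unclosed {type(self).__name__}", ResourceWarning)


def count_resource_warnings(close_first):
    """Create a resource, optionally close it, drop it, and count ResourceWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default outside test runners.
        warnings.simplefilter("always")
        resource = LeakyResource()
        if close_first:
            resource.close()
        del resource
        gc.collect()  # harmless on CPython; helps on other implementations
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


leaked = count_resource_warnings(close_first=False)
clean = count_resource_warnings(close_first=True)
```

With a filter that turns warnings into errors, the same unclosed object would surface as a failure instead of a message, which matches the pytest behaviour described above.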
I wonder if the fix @martindurant put together in fsspec/gcsfs#657 also needs to be applied to HTTPFileSystem? |
Yes, I would say the cleanup behaviour for HTTP should be the same as for gcsfs |
invoking this script:
produces the following output. I truncated the list of unclosed connections, because it's rather long:
Since zarr-python doesn't import anything from aiohttp, I'm pretty sure this is down to gcsfs. The GCSFileSystem.close_session method caught my eye, but invoking it, e.g. via store.fs.close_session(None, None), didn't resolve the unclosed connections warnings. @martindurant any recommendations for making these warnings go away (and ideally also closing the connections)?