CI failures caused by ValueError: fd <XYZ> added twice #1659

Open
hendrikmakait opened this issue Feb 3, 2025 · 0 comments

https://cloud.coiled.io/clusters/749266/account/dask-benchmarks/information?workspace=dask-benchmarks (Q15)

(10.0.31.66)    2025-02-02 00:40:21.215000 tornado.application - ERROR - Error in connection callback
  Traceback (most recent call last):
    File "/opt/coiled/env/lib/python3.10/site-packages/tornado/tcpserver.py", line 372, in _handle_connection
      stream = SSLIOStream(
    File "/opt/coiled/env/lib/python3.10/site-packages/tornado/iostream.py", line 1354, in __init__
      self._add_io_state(self.io_loop.WRITE)
    File "/opt/coiled/env/lib/python3.10/site-packages/tornado/iostream.py", line 1039, in _add_io_state
      self.io_loop.add_handler(self.fileno(), self._handle_events, self._state)
    File "/opt/coiled/env/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 160, in add_handler
      raise ValueError("fd %s added twice" % fd)
  ValueError: fd 29 added twice
(10.0.18.234)   2025-02-02 00:40:21.453000 distributed.worker - ERROR - Compute Failed
(10.0.18.234)   2025-02-02 00:40:21.454000 Key:       ('shuffle-transfer-15e68a406d069397d66b386bfb7cb75f', 442)
  State:     executing
  Task:  <Task ('shuffle-transfer-15e68a406d069397d66b386bfb7cb75f', 442) _shuffle_transfer(...)>
  Exception: "RuntimeError('P2P 15e68a406d069397d66b386bfb7cb75f failed during transfer phase')"
  Traceback: '  File "/opt/coiled/env/lib/python3.10/site-packages/dask/dataframe/dask_expr/_shuffle.py", line 548, in _shuffle_transfer\n    return
shuffle_transfer(\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/shuffle/_shuffle.py", line 56, in shuffle_transfer\n    with
handle_transfer_errors(id):\n  File "/opt/coiled/env/lib/python3.10/contextlib.py", line 153, in __exit__\n    self.gen.throw(typ, value, traceback)\n  File
"/opt/coiled/env/lib/python3.10/site-packages/distributed/shuffle/_core.py", line 531, in handle_transfer_errors\n    raise RuntimeError(f"P2P {id} failed
during transfer phase") from e\n'
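
For readers unfamiliar with the message: the `raise ValueError("fd %s added twice" % fd)` line in the traceback above is a duplicate-registration guard in the event loop's `add_handler`. Below is a minimal sketch of that kind of guard, assuming a plain dict keyed by file descriptor number. `HandlerRegistry` and `register_twice` are hypothetical names made up for illustration; this is not tornado's implementation, and it does not establish why the fd is registered twice in these CI runs.

```python
class HandlerRegistry:
    """Hypothetical stand-in for an event loop's fd -> handler table."""

    def __init__(self):
        self.handlers = {}

    def add_handler(self, fd, handler):
        # Refuse a second registration for an fd that is still in the table,
        # using the same message format as the traceback above.
        if fd in self.handlers:
            raise ValueError("fd %s added twice" % fd)
        self.handlers[fd] = handler

    def remove_handler(self, fd):
        # Dropping the entry makes the fd number available for reuse.
        self.handlers.pop(fd, None)


def register_twice(registry, fd):
    """Register the same fd twice and return the resulting error message."""
    registry.add_handler(fd, lambda *args: None)
    try:
        registry.add_handler(fd, lambda *args: None)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # 29 is only an illustrative value taken from the first log excerpt.
    print(register_twice(HandlerRegistry(), 29))
```

In this model, the error goes away only if the old entry is removed before the same fd number is registered again. That points at one thing worth checking (an entry outliving its socket while the fd number gets reused), but the logs here do not confirm that this is the cause.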

https://cloud.coiled.io/clusters/731091/account/dask-benchmarks/information?workspace=dask-benchmarks (Q05)

(10.0.31.252)   2025-01-19 00:30:27.331000 distributed.worker - ERROR - Compute Failed
  Key:       ('hashjoinp2p-a4b4b932a0fe827d8a58fc1c089051e8', 614)
  State:     executing
  Task:  <Task ('hashjoinp2p-a4b4b932a0fe827d8a58fc1c089051e8', 614) merge_unpack(...)>
  Exception: "ValueError('fd 40 added twice')"
  Traceback: '  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/shuffle/_merge.py", line 53, in merge_unpack\n    left =
ext.get_output_partition(shuffle_id_left, barrier_left, output_partition)\n  File
"/opt/coiled/env/lib/python3.10/site-packages/distributed/shuffle/_worker_plugin.py", line 432, in get_output_partition\n    return
shuffle_run.get_output_partition(\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/shuffle/_core.py", line 381, in get_output_partition\n
sync(self._loop, self._ensure_output_worker, partition_id, key)\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/utils.py", line 439, in
sync\n    raise error\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/utils.py", line 413, in f\n    result = yield future\n  File
"/opt/coiled/env/lib/python3.10/site-packages/tornado/gen.py", line 766, in run\n    value = future.result()\n  File
"/opt/coiled/env/lib/python3.10/site-packages/distributed/shuffle/_core.py", line 338, in _ensure_output_worker\n    result = await
self.scheduler.shuffle_restrict_task(\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/core.py", line 1256, in send_recv_from_rpc\n    comm
= await self.pool.connect(self.addr)\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/core.py", line 1539, in connect\n    return
connect_attempt.result()\n  File "/opt/coiled/env/lib/python3.10/site-packages/distributed/core.py", line 1429, in _connect\n    comm = await connect(\n
File "/opt/coiled/env/lib/python3.10/site-packages/distributed/comm/core.py", line 342, in connect\n    comm = await wait_for(\n  File
"/opt/coiled/env/lib/python3.10/site-packages/distributed/utils.py", line 1915, in wait_for\n    return await asyncio.wait_for(fut, timeout)\n  File
"/opt/coiled/env/lib/python3.10/asyncio/tasks.py", line 445, in wait_for\n    return fut.result()\n  File
"/opt/coiled/env/lib/python3.10/site-packages/distributed/comm/tcp.py", line 547, in connect\n    stream = await self.client.connect(\n  File
"/opt/coiled/env/lib/python3.10/site-packages/tornado/tcpclient.py", line 279, in connect\n    af, addr, stream = await
connector.start(connect_timeout=timeout)\n  File "/opt/coiled/env/lib/python3.10/site-packages/tornado/tcpclient.py", line 109, in start\n
self.try_connect(iter(self.primary_addrs))\n  File "/opt/coiled/env/lib/python3.10/site-packages/tornado/tcpclient.py", line 127, in try_connect\n
stream, future = self.connect(af, addr)\n  File "/opt/coiled/env/lib/python3.10/site-packages/tornado/tcpclient.py", line 332, in _create_stream\n    return
stream, stream.connect(addr)\n  File "/opt/coiled/env/lib/python3.10/site-packages/tornado/iostream.py", line 1195, in connect\n
self._add_io_state(self.io_loop.WRITE)\n  File "/opt/coiled/env/lib/python3.10/site-packages/tornado/iostream.py", line 1039, in _add_io_state\n
self.io_loop.add_handler(self.fileno(), self._handle_events, self._state)\n  File
"/opt/coiled/env/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 160, in add_handler\n    raise ValueError("fd %s added twice" % fd)\n'