Description
Environment
- asyncpg version: 0.31.0 (also reproduced on 0.30.x)
- PostgreSQL version: 16
- Python version: 3.11.14
- Platform: Linux (Kubernetes)
- pgbouncer: No
- SQLAlchemy: 2.0.23
Summary
When an asyncpg operation is cancelled via asyncio.CancelledError while mid-query, the cancellation mechanism in connect_utils._cancel can raise a built-in ConnectionError that escapes to the caller. This is problematic because:
- Callers (e.g. SQLAlchemy) expect asyncpg-specific exception types and don't handle built-in ConnectionError
- The cancel operation is inherently best-effort: if the cancel connection fails, the error should be suppressed or wrapped, not propagated
This is related to #1211 but occurs on non-direct_tls connections via the cancel request code path.
Reproduction flow
- An asyncpg connection is executing a query (e.g. inside SQLAlchemy's session.execute())
- The asyncio task is cancelled (task.cancel())
- CancelledError propagates into protocol.query() / bind_execute
- asyncpg's cancellation handler tries to send a PostgreSQL cancel request by opening a new SSL connection via connect_utils._cancel → _create_ssl_connection
- The new connection fails (server already closed the original, or network issue)
- TLSUpgradeProto.connection_lost() raises built-in ConnectionError('unexpected connection_lost() call')
- This escapes through connect_utils._cancel (which has no error handling around _create_ssl_connection)
- Caller receives ConnectionError instead of CancelledError
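The flow above can be imitated without a database. The following is a minimal sketch using only asyncio; `failing_cancel_request` and `run_query` are hypothetical stand-ins for `connect_utils._cancel` and `protocol.query()`, not asyncpg code, and the sketch assumes the cancel request is attempted while the `CancelledError` is being handled.

```python
import asyncio


async def failing_cancel_request():
    # Hypothetical stand-in for connect_utils._cancel: opening the
    # cancel connection fails with a built-in ConnectionError.
    raise ConnectionError("unexpected connection_lost() call")


async def run_query():
    # Hypothetical stand-in for protocol.query(): wait on the server and,
    # on cancellation, try to send a cancel request before re-raising.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        await failing_cancel_request()  # raises and replaces CancelledError
        raise


async def main():
    task = asyncio.create_task(run_query())
    await asyncio.sleep(0)  # let the task start and block in sleep()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "CancelledError"
    except ConnectionError as exc:
        return f"ConnectionError (context: {type(exc.__context__).__name__})"


outcome = asyncio.run(main())
print(outcome)
```

In this sketch the awaiting code sees the `ConnectionError`, and the original `CancelledError` survives only as the exception's context, which matches the shape of the traceback in this report.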
Traceback
asyncio.exceptions.CancelledError (original exception)

During handling of the above exception, another exception occurred:

  File "asyncpg/transaction.py", line 206, in __rollback
    await self._connection.execute(query)
  File "asyncpg/connection.py", line 350, in execute
    result = await self._protocol.query(query, timeout)
  File "asyncpg/connection.py", line 1584, in _cancel
    await connect_utils._cancel(
  File "asyncpg/connect_utils.py", line 1040, in _cancel
    tr, pr = await _create_ssl_connection(
  File "asyncpg/connect_utils.py", line 752, in _create_ssl_connection
    do_ssl_upgrade = await pr.on_data
                     ^^^^^^^^^^^^^^^^
ConnectionError: unexpected connection_lost() call
Root cause
Two issues in connect_utils.py:
1. _cancel() has no error handling around _create_ssl_connection
    async def _cancel(*, loop, addr, params, backend_pid, backend_secret):
        ...
        if params.ssl and params.sslmode != SSLMode.allow:
            tr, pr = await _create_ssl_connection(...)  # ← no try/except!
        ...

The cancel request is best-effort (we're telling PostgreSQL to cancel a query on a connection that may already be dead). If opening the cancel connection fails, the error should be suppressed or wrapped in asyncpg.InterfaceError, not propagated as a raw ConnectionError.
2. TLSUpgradeProto.connection_lost() raises built-in ConnectionError
    def connection_lost(self, exc):
        if not self.on_data.done():
            if exc is None:
                exc = ConnectionError('unexpected connection_lost() call')
            self.on_data.set_exception(exc)

This raises a built-in Python ConnectionError, not an asyncpg exception type. Callers like SQLAlchemy check for asyncpg.InterfaceError or asyncpg.PostgresError to detect disconnects. A built-in ConnectionError bypasses all those checks, which means:
- SQLAlchemy's is_disconnect() doesn't recognize it
- SQLAlchemy's pool pre-ping handler (_do_ping_w_event) only catches self.loaded_dbapi.Error, so ConnectionError escapes
- The pool's retry logic (which would create a fresh connection) never triggers
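The reason the checks are bypassed comes down to class hierarchy: built-in ConnectionError derives from OSError, so a handler written against a driver's own exception base never matches it. The sketch below illustrates this with hypothetical stand-in classes (`DriverError`, `DriverInterfaceError`); they are not asyncpg's or SQLAlchemy's real types.

```python
# Hypothetical stand-ins for a driver's exception hierarchy; these are
# not asyncpg's or SQLAlchemy's real classes.
class DriverError(Exception):
    """Base class a caller treats as 'the driver failed'."""


class DriverInterfaceError(DriverError):
    """Stand-in for a driver-specific interface error."""


def classify(exc: BaseException) -> str:
    """Mimic a caller that only recognises driver-specific exceptions."""
    try:
        raise exc
    except DriverError:
        return "handled as driver error"
    except Exception:
        return "unhandled: escapes the driver-specific handler"


builtin = ConnectionError("unexpected connection_lost() call")
wrapped = DriverInterfaceError("unexpected connection_lost() call")

print(issubclass(ConnectionError, OSError))
print(classify(builtin))
print(classify(wrapped))
```

Only the wrapped exception reaches the driver-specific branch. The built-in one falls through to the generic handler, which in a real caller may not exist at all.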
Suggested fix
Option A (minimal): Catch OSError (parent of ConnectionError) in connect_utils._cancel() and suppress it — cancel is best-effort:
    async def _cancel(*, loop, addr, params, backend_pid, backend_secret):
        ...
        try:
            if params.ssl and params.sslmode != SSLMode.allow:
                tr, pr = await _create_ssl_connection(...)
            ...
        except OSError:
            # Cancel is best-effort. If we can't reach the server, the
            # connection is dead anyway.
            return

Option B (comprehensive): Also change TLSUpgradeProto.connection_lost() to raise asyncpg.InterfaceError instead of built-in ConnectionError, so callers can handle it consistently:
    def connection_lost(self, exc):
        if not self.on_data.done():
            if exc is None:
                exc = InterfaceError('unexpected connection_lost() call')
            self.on_data.set_exception(exc)

Impact
This causes process crashes in production services. When a task is cancelled during a DB query, the ConnectionError escapes all exception handlers (which expect either CancelledError or asyncpg-specific exceptions) and terminates the process.
This is 100% correlated with CancelledError in our logs — every ConnectionError: unexpected connection_lost() we've seen is triggered by task cancellation.
Additional context
We use Google CloudSQL with SSL connections. The PostgreSQL server is accessed over SSL (non-direct_tls), which means the cancel code path goes through _create_ssl_connection to establish a new SSL connection for sending the cancel request.
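To illustrate the behaviour Option A is aiming for, here is a minimal asyncio-only sketch. `open_cancel_connection`, `best_effort_cancel` and `run_query` are hypothetical stand-ins, not asyncpg functions. The assumption is that suppressing OSError inside the cancel helper is enough for the original CancelledError to reach the caller.

```python
import asyncio


async def open_cancel_connection():
    # Hypothetical stand-in for _create_ssl_connection failing.
    raise ConnectionError("unexpected connection_lost() call")


async def best_effort_cancel() -> bool:
    # Mirrors Option A: swallow OSError (the parent of ConnectionError)
    # because the cancel request is best-effort.
    try:
        await open_cancel_connection()
    except OSError:
        return False  # the cancel request could not be delivered
    return True


async def run_query():
    # Hypothetical stand-in for protocol.query().
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        await best_effort_cancel()
        raise  # the original CancelledError still reaches the caller


async def main():
    task = asyncio.create_task(run_query())
    await asyncio.sleep(0)  # let the task start and block in sleep()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "CancelledError"
    except ConnectionError:
        return "ConnectionError"


result = asyncio.run(main())
print(result)
```

With the failure suppressed inside the helper, the awaiting code observes a normal cancellation, which is what exception handlers around cancelled tasks are generally written to expect.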