Description
Observed in https://github.com/scylladb/scylla-enterprise/pull/4634#issuecomment-2333883650.
The test runs with force_gossip_topology_changes: true, so auth is not managed via Raft and auth data is stored in the system_auth keyspace with the default replication factor of 1. The test fails roughly once every several runs.
The test performs a rolling upgrade, but sometimes the driver is not connected to some of the nodes after the upgrade has finished (all nodes are up).
Reproducer:
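import time
import pytest
from test.pylib.manager_client import ManagerClient
from test.pylib.util import wait_for_cql_and_get_hosts

@pytest.mark.asyncio
async def test_rolling_restart_with_auth(manager: ManagerClient):
    # Gossip-based topology keeps auth data in system_auth (RF=1).
    config = {
        'force_gossip_topology_changes': True,
    }
    servers = [await manager.server_add(config=config) for _ in range(3)]
    cql = manager.get_cql()
    # Wait until the driver sees all nodes before restarting them.
    hosts = await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
    await manager.rolling_restart(servers)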
I ran the reproducer in the test/auth_cluster suite (authentication enabled): https://github.com/scylladb/scylladb/blob/master/test/auth_cluster/suite.yaml
During the upgrade, the driver cannot authenticate if the replica that owns the part of the token ring holding the user data is down (system_auth has RF=1). However, the driver does not reconnect after that node comes back up.
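
To illustrate why RF=1 makes authentication depend on a single node, here is a toy model of replica placement. It is only a sketch under simplifying assumptions: one token per node, an MD5-based token function instead of Scylla's real partitioner, and a lookup that succeeds when at least one replica of the role row is up. The helper names (`token`, `build_ring`, `replicas`, `can_authenticate`) and the node names are made up for this example and are not part of Scylla or the driver.

```python
from bisect import bisect_left
from hashlib import md5


def token(key: str) -> int:
    """Map a key to a position on a toy 32-bit ring (not Scylla's partitioner)."""
    return int(md5(key.encode()).hexdigest(), 16) % 2**32


def build_ring(nodes):
    """One token per node, sorted by token (real clusters use many more)."""
    return sorted((token(n), n) for n in nodes)


def replicas(ring, key, rf):
    """Return the first `rf` nodes clockwise from the key's token."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def can_authenticate(ring, role, rf, down):
    """Assumption: the role lookup works if any replica of the role row is up."""
    return any(node not in down for node in replicas(ring, role, rf))


ring = build_ring(["node1", "node2", "node3"])
owner = replicas(ring, "cassandra", rf=1)[0]

# With RF=1, restarting the single owner makes the role row unreadable.
print(can_authenticate(ring, "cassandra", rf=1, down={owner}))  # False
# With RF=3, the same restart leaves two live replicas.
print(can_authenticate(ring, "cassandra", rf=3, down={owner}))  # True
```

In this model, each step of a rolling restart has a chance of taking down the only replica of the role row, which matches the authentication failures seen during the restart. It does not explain why the driver stays disconnected after the node is back up; that part still needs investigation.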