[c-s][3.x] driver shutdown stuck #258

Open
@fruch

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Toward the end of the stress command, one node (10.0.1.17) went away, and the driver shutdown appears to be stuck:

total,       4425734,   19327,   19327,   19327,     4.1,     3.9,     6.9,     9.9,    24.0,    58.8,  325.0,  0.03100,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
total,       4508016,   16456,   16456,   16456,     4.8,     4.0,     9.6,    19.7,    91.6,   218.0,  330.0,  0.03065,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Channel has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Channel has been closed
total,       4583006,   14998,   14998,   14998,     5.3,     4.5,    11.2,    18.1,    31.7,    62.0,  335.0,  0.03018,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Channel has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
total,       4677544,   18908,   18908,   18908,     4.2,     4.0,     6.4,     9.0,    14.4,    20.2,  340.0,  0.03007,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
total,       4780430,   20577,   20577,   20577,     3.9,     3.7,     6.6,     8.5,    13.4,    16.5,  345.0,  0.03022,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
total,       4882713,   20457,   20457,   20457,     3.9,     3.7,     6.6,     9.1,    14.4,    26.8,  350.0,  0.03032,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
total,       4985711,   20600,   20600,   20600,     3.9,     3.7,     6.6,     9.0,    13.9,    23.9,  355.0,  0.03040,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
total,       5048570,   20374,   20374,   20374,     3.9,     3.7,     6.7,     9.2,    19.2,    24.3,  358.1,  0.03041,      0,      0,       0,       0,       0,       0
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing

com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing

com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Error writing
Results:
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Op rate                   :   14,099 op/s  [WRITE: 14,099 op/s]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Partition rate            :   14,099 pk/s  [WRITE: 14,099 pk/s]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Row rate                  :   14,099 row/s [WRITE: 14,099 row/s]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Latency mean              :    5.6 ms [WRITE: 5.6 ms]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Latency median            :    4.8 ms [WRITE: 4.8 ms]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Latency 95th percentile   :   11.6 ms [WRITE: 11.6 ms]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Latency 99th percentile   :   20.7 ms [WRITE: 20.7 ms]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Latency 99.9th percentile :   41.6 ms [WRITE: 41.6 ms]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Latency max               :  365.7 ms [WRITE: 365.7 ms]
com.datastax.driver.core.exceptions.TransportException: [ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042] Connection has been closed
Total partitions          :  5,048,570 [WRITE: 5,048,570]
WARN  00:44:50,156 Error creating netty channel to ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042
Total errors              :          0 [WRITE: 0]
com.datastax.shaded.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042
Total GC count            : 0
Caused by: java.net.ConnectException: Connection refused
Total GC memory           : 0.000 KiB
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
Total GC time             :    0.0 seconds
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
Avg GC time               :    NaN ms
	at com.datastax.shaded.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
StdDev GC time            :    0.0 ms
	at com.datastax.shaded.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
Total operation time      : 00:05:58
	at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)

	at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
END
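The output above interleaves the stress tool's stats rows and results summary with driver exceptions, which makes it hard to read. Below is a minimal sketch of a triage helper that separates the three kinds of lines and counts driver exceptions per host and message. The function name `split_stress_output` and the regular expression are assumptions made for illustration; they are not part of cassandra-stress or the driver, and the pattern only covers the `[host/ip:port] message` form seen in this log.

```python
import re
from collections import Counter

# Matches driver exception lines of the form seen in this log:
#   <fully.qualified.SomeException>: [hostname/ip:port] <message>
# (hypothetical pattern, written for this log only)
EXC_RE = re.compile(
    r"^(?P<cls>[\w.$]+Exception): \[(?P<host>[^\]]+)\] (?P<msg>.+)$"
)


def split_stress_output(lines):
    """Split interleaved output into stats rows, exception counts, and the rest."""
    stats, other = [], []
    errors = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        m = EXC_RE.match(line)
        if m:
            # Key on "ip:port" (the part after the slash) plus the message text.
            errors[(m["host"].split("/")[-1], m["msg"])] += 1
        elif line.startswith("total,"):
            # Periodic stats row: comma-separated fields, padded with spaces.
            stats.append([field.strip() for field in line.split(",")])
        else:
            other.append(line)
    return stats, errors, other


# Usage on a few lines copied from the log above.
HOST = "ip-10-0-1-17.eu-north-1.compute.internal/10.0.1.17:9042"
EXC = "com.datastax.driver.core.exceptions.TransportException"
sample = [
    "total,       4425734,   19327,   19327,   19327,     4.1",
    f"{EXC}: [{HOST}] Connection has been closed",
    f"{EXC}: [{HOST}] Error writing",
    f"{EXC}: [{HOST}] Error writing",
    "Results:",
    "Total errors              :          0 [WRITE: 0]",
]
stats, errors, other = split_stress_output(sample)
for (host, msg), count in errors.most_common():
    print(f"{count:5d}  {host}  {msg}")
```

Run against the full log, the `other` list keeps the results summary and the stack trace lines in their original order, so the summary can be read without the exception noise.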

Impact

Stress command is getting stuck

How frequently does it reproduce?

Seen once so far in the nightly runs.

Installation details

Kernel Version: 5.15.0-1049-aws
Scylla version (or git commit hash): 5.5.0~dev-20231113.7b08886e8dd8 with build-id 7548c48606c5b58f282fc2b596019226de0df6ed

Cluster size: 3 nodes (i4i.large)

Scylla Nodes used in this run:

  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-9 (13.49.80.230 | 10.0.1.175) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-8 (51.20.109.226 | 10.0.3.74) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-7 (16.16.201.232 | 10.0.1.202) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-6 (13.51.48.183 | 10.0.1.141) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-5 (51.20.123.34 | 10.0.2.227) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-4 (13.51.178.61 | 10.0.1.211) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-3 (16.171.236.237 | 10.0.1.17) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-2 (51.20.144.28 | 10.0.1.61) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-17 (16.171.239.81 | 10.0.2.188) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-16 (16.171.54.15 | 10.0.1.41) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-15 (51.20.143.39 | 10.0.3.169) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-14 (16.16.64.126 | 10.0.0.194) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-13 (16.170.157.28 | 10.0.2.96) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-12 (51.20.66.232 | 10.0.2.150) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-11 (13.48.26.181 | 10.0.1.121) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-10 (51.20.106.233 | 10.0.1.211) (shards: 2)
  • longevity-5gb-1h-NodeTerminateAndRe-db-node-dfb19859-1 (16.171.110.198 | 10.0.2.74) (shards: 2)

OS / Image: ami-02ea62bee686ef03b (aws: undefined_region)

Test: longevity-5gb-1h-NodeTerminateAndReplace-aws-test
Test id: dfb19859-9fbb-4408-a2be-a42c7400f7cb
Test name: scylla-master/nemesis/longevity-5gb-1h-NodeTerminateAndReplace-aws-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor dfb19859-9fbb-4408-a2be-a42c7400f7cb
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs dfb19859-9fbb-4408-a2be-a42c7400f7cb

Logs:

Jenkins job URL
Argus

Labels

bug (Something isn't working)
