Add shard connection backoff policy #473
Conversation
Force-pushed from 0b80886 to f62dfa3
Force-pushed from dbb3ad1 to cbb4719
Shouldn't we have some warning / info level log when backoff is taking place?
I would rather not do it; it is not useful and could pollute the log.
Do you know what caused the test failure?
It is a unit test that at first glance should be fully deterministic, so the failure is unexpected.
It is a known issue; the conversion goes wrong somewhere.
Force-pushed from a43ccd1 to b0fd069
Force-pushed from f47313f to 9dfd9ec
General comment: integration tests for new policies are definitely needed here.
Force-pushed from aebc540 to 61668de
The patchset lacks documentation, which would have helped to understand the feature and when/how to use it. Is the documentation a separate repo / commit?
@Lorak-mmk, done, all comments addressed; please take a look.
Force-pushed from 40dc7b6 to 3d97ecd
It looks much better now, especially documentation-wise!
It would be good to describe this new policy in docs/ if we want people to use it.
Before merging it would be great to run some real-world scenario and see if the new policy can help with cluster overload. Is that something that could be done with SCT?
Note: I have not yet read `LimitedConcurrencyShardConnectionBackoffPolicy`. I'll have a few more comments there.
    @abstractmethod
    def schedule(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> bool:
What will be shard_id
for C* clusters? Will it be set to 0, or will be (contrary to type hint), a None?
Could you point me to the place in the code responsible for this?
This API works only for Scylla, when sharding information is present; in the rest of the cases it is not used.
List of places where it is called:
python-driver/cassandra/pool.py, lines 488 to 489 in a83038c:
    self._session.shard_connection_backoff_scheduler.schedule(
        self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))
python-driver/cassandra/pool.py, lines 499 to 500 in a83038c:
    self._session.shard_connection_backoff_scheduler.schedule(
        self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))
python-driver/cassandra/pool.py, lines 610 to 611 in a83038c:
    self._session.shard_connection_backoff_scheduler.schedule(
        self.host.host_id, connection.features.shard_id, partial(self._open_connection_to_missing_shard, connection.features.shard_id))
python-driver/cassandra/pool.py, lines 853 to 854 in a83038c:
    self._session.shard_connection_backoff_scheduler.schedule(
        self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))
Huh ok, I was not aware that opening shard-aware vs non-shard-aware connections is so different in the driver.
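For what it's worth, a minimal sketch of something satisfying the `schedule` contract quoted above could look like this; the class name and the interpretation of the returned bool are my assumptions, not part of the PR.

```python
from typing import Callable

# Hedged sketch, not code from this PR: a trivial scheduler that satisfies the
# schedule() contract by running the callback immediately.
class ImmediateShardConnectionScheduler:
    def schedule(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> bool:
        # shard_id is only meaningful for Scylla; for Cassandra this path
        # is not exercised, as noted in the discussion above.
        method()
        return True  # assumed meaning: the request was accepted
```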
Force-pushed from 3d97ecd to 06f19e3
This commit introduces two abstract classes: 1. `ShardConnectionBackoffPolicy` - a base class for policies that control the pace of shard connection creation; 2. the auxiliary `ShardConnectionScheduler` - a scheduler that is instantiated by `ShardConnectionBackoffPolicy` at session initialization.
This policy is an implementation of `ShardConnectionBackoffPolicy`. It implements the same behavior the driver currently has: 1. no delay between creating shard connections; 2. it avoids creating multiple connections to the same (host_id, shard_id).
This is required by the upcoming `LimitedConcurrencyShardConnectionBackoffPolicy`.
There is no reason to accept schedule requests when the cluster is shutting down.
Add code that integrates `ShardConnectionBackoffPolicy` into: 1. Cluster 2. Session 3. HostConnection. The main idea is to put `ShardConnectionBackoffPolicy` in control of the shard connection creation process, removing the duplicate logic in `HostConnection` that tracks pending connection creation requests.
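As a rough, hedged illustration of the no-delay behavior described in the commit messages above (immediate execution, at most one pending request per (host_id, shard_id)), a scheduler along these lines would do; the actual implementation in this PR may be structured differently.

```python
import threading
from typing import Callable

# Sketch only: immediate scheduling with per-(host, shard) deduplication.
class ImmediateDedupScheduler:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()  # (host_id, shard_id) pairs with a request in flight

    def schedule(self, host_id: str, shard_id: int, method: Callable[[], None]) -> bool:
        key = (host_id, shard_id)
        with self._lock:
            if key in self._pending:
                return False  # duplicate request for this host/shard is ignored
            self._pending.add(key)
        try:
            method()  # no delay: open the connection right away
        finally:
            with self._lock:
                self._pending.discard(key)
        return True
```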
Force-pushed from 06f19e3 to f71e7c9
Done, added section to
There is no Python loader there, but we can emulate this issue locally; no need to run it in the cloud. The only difference is that to overload a real cluster you need far more connections.
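For instance, a hedged way to emulate such a storm locally is simply to open many sessions concurrently against a local node; the contact point and counts here are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from cassandra.cluster import Cluster

# Rough local emulation of a connection storm: many clients connecting at once.
def open_session(_):
    cluster = Cluster(contact_points=["127.0.0.1"])
    session = cluster.connect()
    return cluster, session

with ThreadPoolExecutor(max_workers=16) as pool:
    clients = list(pool.map(open_session, range(50)))

for cluster, _ in clients:
    cluster.shutdown()
```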
Force-pushed from 41b5ea8 to 088053b
cassandra/policies.py (outdated)
        return schedule, next(schedule)
    except StopIteration:
        # A bit of trickery to avoid having lock around self.schedule
        schedule = self.backoff_policy.new_schedule()
        delay = next(schedule)
        self.schedule = schedule
        return schedule, delay
What is `self.schedule`? I see no field like this declared in the class, and it doesn't make conceptual sense (the function takes the schedule as an argument, but in case of error sets it on a field).
Forgot to clean it up, thanks.
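Presumably the cleaned-up helper reduces to something like this sketch, with the stray `self.schedule` assignment gone (written here as a free function over an explicit `backoff_policy` purely for illustration):

```python
from typing import Iterator, Tuple

def get_delay(schedule: Iterator[float], backoff_policy) -> Tuple[Iterator[float], float]:
    # Advance the current schedule; when it is exhausted, start a fresh one
    # from the backoff policy and take its first delay. No shared field is
    # mutated, so no lock around the schedule is needed.
    try:
        return schedule, next(schedule)
    except StopIteration:
        schedule = backoff_policy.new_schedule()
        return schedule, next(schedule)
```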
cassandra/policies.py (outdated)
class LimitedConcurrencyShardConnectionBackoffPolicy(ShardConnectionBackoffPolicy):
    """
    A shard connection backoff policy that allows only `max_concurrent` concurrent connections per `host_id`.

    For backoff calculation, it requires either a `cassandra.policies.ShardConnectionBackoffSchedule` or
    a `cassandra.policies.ReconnectionPolicy`, as both expose the same API.

    It spawns threads when there are pending requests, maximum number of threads is `max_concurrent` multiplied by nodes in the cluster.
    When thread is spawn it initiates backoff schedule, which is local for this thread.
    If there are no remaining requests for that `host_id`, thread is killed.

    This policy also prevents multiple pending or scheduled connections for the same (host, shard) pair;
    any duplicate attempts to schedule a connection are silently ignored.
    """
So this docstring talks about concurrent connections and spawning threads. As far as I can tell, neither of those things is happening here.
The scheduler we are using here for opening connections has one thread, so there is no concurrency happening.
The class does not spawn threads anywhere, so I don't know where this comment comes from.
Got it, it is a bit confusing; I changed "thread" to "worker" and added more information to `_ScopeBucket`. `_Scheduler` has one thread, but it does not run the scheduled code itself; it uses `cluster.executor` for that, which has 2 threads.
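To make the intended usage concrete, something like the sketch below is what I'd expect; the constructor argument names are inferred from the docstring and discussion and may differ in the final API.

```python
from cassandra.policies import ExponentialReconnectionPolicy
# Assuming the new policy lives in cassandra.policies, as the diff suggests.
from cassandra.policies import LimitedConcurrencyShardConnectionBackoffPolicy

# Argument names (backoff_policy, max_concurrent) are assumptions for this sketch.
policy = LimitedConcurrencyShardConnectionBackoffPolicy(
    backoff_policy=ExponentialReconnectionPolicy(base_delay=0.1, max_delay=10),
    max_concurrent=1,  # at most one pending shard connection per host at a time
)
```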
cassandra/policies.py (outdated)
Actually it is a bit worrying to me that we are now using the executor thread for opening new connections.
It already has non-negligible work: handling events, control connections, schema fetches. This also causes all connection opening to be done serially.
How was it done before this PR? Was there a thread per connection? A thread per host? Just a single thread for everything?
In this regard it used to be done in exactly the same way: all connection creation requests were handled by `cluster.executor`. The only difference is that before, items were submitted to the executor right away; now they wait in the `Scheduler` queue according to the schedule.
cassandra/policies.py (outdated)
    def _run(self, schedule: Iterator[float]):
        if self.is_shutdown:
            return

        with self.lock:
            try:
                request = self.items.pop(0)
            except IndexError:
                # Just in case
                if self.currently_pending > 0:
                    self.currently_pending -= 1
                # When items are exhausted reset schedule to ensure that new items going to get another schedule
                # It is important for exponential policy
                return

        try:
            request()
        finally:
            schedule, delay = self._get_delay(schedule)
            self.scheduler.schedule(delay, self._run, schedule)

    def schedule_new_connection(self, cb: Callable[[], None]):
        with self.lock:
            if self.is_shutdown:
                return
            self.items.append(cb)
            if self.currently_pending < self.max_concurrent:
                self.currently_pending += 1
                schedule = self.backoff_policy.new_schedule()
                delay = next(schedule)
                self.scheduler.schedule(delay, self._run, schedule)
OK, so if I understand correctly, the "concurrency" here is how many pending `scheduler.schedule` calls there can be. As far as I can tell, it doesn't do anything, since the executor is single-threaded.
Not exactly: it is how many `_run` instances are running or scheduled. I have just renamed it to `_worker_body`. The executor has 2 threads by default. But even with a single-threaded executor: while a connection is being created, yes, it could not run anything else; once the connection is created and it is waiting, another instance of `_worker_body` can be handled, since it does not block the executor.
OK, I understand those semantics, but I don't really understand how they are useful; what is the intended use case for this? This concurrency mostly means that the sleep times will be different (because there are many "workers"), which is more difficult to reason about than a different backoff_policy.
@dkropachev, please share test results with and without this feature. Sidenote: let's make sure we're focusing on the important things.
This policy is an implementation of `ShardConnectionBackoffPolicy`. Its primary purpose is to prevent connection storms by imposing restrictions on the number of concurrent pending connections per host and the backoff time between connection attempts.
Force-pushed from 088053b to 9482d67
Tests cover: 1. LimitedConcurrencyShardConnectionBackoffPolicy 2. NoDelayShardConnectionBackoffPolicy, for both the Scylla and Cassandra backends.
The sole goal of `ShardConnectionBackoffPolicy`'s existence is to fight connection storms, so this commit adds a connection storms section to `docs/faq.rst`.
Force-pushed from 9482d67 to 0db57a7
from tests.integration import use_cluster, get_cluster, get_node, TestCluster


def setup_module():
    os.environ['SCYLLA_EXT_OPTS'] = "--smp 8"
This is going to be problematic on GitHub Actions; I don't know if you'll have enough resources.
Even if it doesn't flat-out fail, it might make the tests in this module unstable.
Additionally, the test does not revert this change after finishing, so it will affect other tests that run afterwards. It should save the previous value of this env variable and restore it later.
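A hedged sketch of saving and restoring the variable around the module, assuming the usual setup_module/teardown_module hooks:

```python
import os

_previous_scylla_ext_opts = None

def setup_module():
    # Remember whatever was set before so other test modules are unaffected.
    global _previous_scylla_ext_opts
    _previous_scylla_ext_opts = os.environ.get('SCYLLA_EXT_OPTS')
    os.environ['SCYLLA_EXT_OPTS'] = "--smp 8"

def teardown_module():
    # Restore the previous value, or remove the variable if it was unset.
    if _previous_scylla_ext_opts is None:
        os.environ.pop('SCYLLA_EXT_OPTS', None)
    else:
        os.environ['SCYLLA_EXT_OPTS'] = _previous_scylla_ext_opts
```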
        return _LimitedConcurrencyShardConnectionScheduler(scheduler, self.backoff_policy, self.max_concurrent)


class _ScopeBucket:
Nitpick: the underscore kind of suggests we might use `__all__` to show the classes that are public from this module, as https://peps.python.org/pep-0008/#public-and-internal-interfaces indicates.
LGTM. The only concern is the smp=8 on integration tests, which might introduce test instability.
Introduce `ShardConnectionBackoffPolicy` and its implementations:
- `NoDelayShardConnectionBackoffPolicy`: no delay or concurrency limit; ensures at most one pending connection per host+shard.
- `LimitedConcurrencyShardConnectionBackoffPolicy`: limits pending concurrent connections to `max_concurrent` per host, with backoff between shard connections.

The idea of this PR is to shift the responsibility for scheduling `HostConnection._open_connection_to_missing_shard` from `HostConnection` to `ShardConnectionBackoffPolicy`, which gives `ShardConnectionBackoffPolicy` control over the process of opening connections. This feature enables finer control over the process of creating per-shard connections, helping to prevent connection storms.

Fixes: #483
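As a usage sketch, something like the following is the expected shape; the `shard_connection_backoff_policy` keyword argument name and the constructor signature are inferred from this PR's discussion and may differ in the final API.

```python
from cassandra.cluster import Cluster
from cassandra.policies import (
    ExponentialReconnectionPolicy,
    LimitedConcurrencyShardConnectionBackoffPolicy,
)

# Keyword argument and constructor names are assumptions based on the PR discussion.
cluster = Cluster(
    contact_points=["127.0.0.1"],
    shard_connection_backoff_policy=LimitedConcurrencyShardConnectionBackoffPolicy(
        backoff_policy=ExponentialReconnectionPolicy(base_delay=0.1, max_delay=10),
        max_concurrent=1,
    ),
)
session = cluster.connect()
```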
Solutions tested and rejected
Naive delay
Description
The policy would introduce a delay instead of executing the connection creation request right away.
The policy would remember the last time a connection creation was scheduled for, and when it tries to schedule the next request it would make sure that the time between the old and the new request's execution is equal to or greater than the `delay` it is configured with.
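A minimal sketch of that rejected approach (class and variable names are illustrative only, not the actual experimental code):

```python
import time
import threading
from typing import Callable

# Sketch of the rejected "naive delay" scheduler: space out the *scheduling* of
# connection attempts by `delay`, without tracking whether they ever finish.
class NaiveDelayScheduler:
    def __init__(self, delay: float):
        self.delay = delay
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def schedule(self, method: Callable[[], None]) -> None:
        with self._lock:
            # Reserve the next execution slot at least `delay` after the previous one.
            now = time.monotonic()
            self._next_slot = max(self._next_slot, now) + self.delay
            wait = self._next_slot - now
        threading.Timer(wait, method).start()
```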
Results
It worked fine when the cluster operated normally.
However, during testing with artificial delays, it became clear that this approach breaks down when the time to establish a connection exceeds the configured delay.
In such cases, connections begin to pile up: the greater the connection initialization time relative to the delay, the faster they accumulate.
This becomes especially problematic during connection storms.
As the cluster becomes overloaded and connection initialization slows down, the delay-based throttling loses its effectiveness. In other words, the more the cluster suffers, the less effective the policy becomes.
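To illustrate with made-up numbers: with a 0.1 s configured delay but 1 s to actually establish a connection under load, roughly ten connections are in flight at any moment per host, and the ratio only grows as the cluster slows down further.

```python
# Toy arithmetic only, not driver code: estimate of the naive-delay pile-up.
delay = 0.1          # configured delay between scheduled attempts, seconds
connect_time = 1.0   # actual time to establish a connection under load, seconds

# Attempts start every `delay` seconds regardless of completion, so roughly
# connect_time / delay connections are pending at once.
pending = connect_time / delay
print(f"~{pending:.0f} connections pending simultaneously")
```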
Solution
The solution was to give the policy direct control over the connection initialization process.
This allows the policy to track how many connections are currently pending and apply delays after connections are created, rather than before.
That change ensures the policy remains effective even under heavy load.
This behavior is exactly what has been implemented in this PR.
Pre-review checklist
- I have adjusted the documentation in ./docs/source/.
- I added appropriate Fixes: annotations to the PR description.