-
Notifications
You must be signed in to change notification settings - Fork 1k
store: Do not use a fdw connection to check active_copies #5938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Copying checks the active_copies table through the primary_public.active_copies foreign table to determine whether the copy has been cancelled and it should stop. With a large number of copies running, that causes a large number of postgres_fdw connections into the primary, which can overwhelm the primary. Instead, we now pass the connection pool for the primary into the copy code so that it can do this check without involving postgres_fdw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
} | ||
|
||
fn unlock(&mut self, dst: &Site) -> Result<(), StoreError> { | ||
if !self.has_lock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here and in the two previous if
s if would be nice to get an error logged. Also in the else branch in switch()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only the ifs in lock
and unlock
need logging; the else
in switch
is not an issue; we basically just want to make sure we put the right connection back into other
, not just any. Changing those two ifs
store/postgres/src/copy.rs
Outdated
} | ||
|
||
/// Put `self` into `other` if `self` has the lock. | ||
fn switch(self, other: &mut Option<Self>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't extract
more suitable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was definitely struggling with the name. But I like extract
better.
Otherwise, the lock will linger and can block an attempt to restart a copy that failed for transient reasons
Copying checks the active_copies table through the primary_public.active_copies foreign table to determine whether the copy has been cancelled and it should stop.
With a large number of copies running, that causes a large number of postgres_fdw connections into the primary, which can overwhelm the primary.
Instead, we now pass the connection pool for the primary into the copy code so that it can do this check without involving postgres_fdw.