-
Notifications
You must be signed in to change notification settings - Fork 1.1k
store: Do not use a fdw connection to check active_copies #5938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Copying checks the active_copies table through the primary_public.active_copies foreign table to determine whether the copy has been cancelled and it should stop. With a large number of copies running, that causes a large number of postgres_fdw connections into the primary, which can overwhelm the primary. Instead, we now pass the connection pool for the primary into the copy code so that it can do this check without involving postgres_fdw.
zorancv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| } | ||
|
|
||
| fn unlock(&mut self, dst: &Site) -> Result<(), StoreError> { | ||
| if !self.has_lock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here and in the two previous ifs if would be nice to get an error logged. Also in the else branch in switch().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only the ifs in lock and unlock need logging; the else in switch is not an issue; we basically just want to make sure we put the right connection back into other, not just any. Changing those two ifs
store/postgres/src/copy.rs
Outdated
| } | ||
|
|
||
| /// Put `self` into `other` if `self` has the lock. | ||
| fn switch(self, other: &mut Option<Self>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't extract more suitable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was definitely struggling with the name. But I like extract better.
Otherwise, the lock will linger and can block an attempt to restart a copy that failed for transient reasons
Copying checks the active_copies table through the primary_public.active_copies foreign table to determine whether the copy has been cancelled and it should stop.
With a large number of copies running, that causes a large number of postgres_fdw connections into the primary, which can overwhelm the primary.
Instead, we now pass the connection pool for the primary into the copy code so that it can do this check without involving postgres_fdw.