-
Notifications
You must be signed in to change notification settings - Fork 132
Add additional flavor in leadership when transaction fails to start #1101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d58c9e2 to
af7d6e4
Compare
bgentry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally ok with this but wondering if it's actually a broader issue. Good to proceed either way as you see fit.
CHANGELOG.md
Outdated
|
|
||
| ### Added | ||
|
|
||
| - Add a little more error flavor for when encountering a deadline exceeded error on leadership election suggesting that the user may want to try increasing their database size. [PR #1101](https://github.com/riverqueue/river/pull/1101). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Add a little more error flavor for when encountering a deadline exceeded error on leadership election suggesting that the user may want to try increasing their database size. [PR #1101](https://github.com/riverqueue/river/pull/1101). | |
| - Add a little more error flavor for when encountering a deadline exceeded error on leadership election suggesting that the user may want to try increasing their database pool size. [PR #1101](https://github.com/riverqueue/river/pull/1101). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
internal/leadership/elector.go
Outdated
| if err != nil { | ||
| var additionalDetail string | ||
| if errors.Is(err, context.DeadlineExceeded) { | ||
| additionalDetail = " (a common cause of this a database pool that's at its connection limit; you may need to increase maximum connections)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| additionalDetail = " (a common cause of this a database pool that's at its connection limit; you may need to increase maximum connections)" | |
| additionalDetail = " (a common cause of this is a database pool that's at its connection limit; you may need to increase maximum connections)" |
Also, this brings to mind that we really need a doc section on this because it'd be amazing to link to that from here. Could even adopt a shortened permalink redirect convention like riverqueue.com/e/elector-pool-size to keep working links even as we reorganize docs.
I can start on this page later today unless you want to take that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Yeah, +1 on a docs section. I think realistically I'm not going to start on that in the next week or so, so yeah, go for it if you have some time.
| execTx, err := exec.Begin(ctx) | ||
| if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a moment wondering why we had to ditch dbutil.WithTxV before realizing that it's the Begin which fails and it's before our function here gets called so of course we can't customize the error. But then I wondered: is this extra error detail something we could/should just apply to all Begin calls within River which time out? Are there cases where this would be misleading or inaccurate? Because there definitely other places where the constrained pool size could manifest in the same way. Or do you think that'd be too noisy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the main reason that the elector is the most important is that it's the most common place in River for any user to start a transaction:
They may be starting transactions or performing operations in their Work code more than anywhere else, but this isn't part of River.
Other than that, the elector will be running frequently, and they won't get transactions otherwise unless they're using InsertTx, JobGetTx, etc., or some other feature like scheduled jobs.
I think it might make sense to make this a broad error (although it wouldn't be just on transactions then, it'd be on every driver call), but IMO for now it makes sense to try and improve things incrementally with something like this. We've had a few reports empirically now that this is where people tend to run into the problem, so let's patch this up and see how things go from there.
af7d6e4 to
49c0c77
Compare
Related to #1077, a common problem that users run into is that there database pool is configured to be too small, and a common place they run into this is as a River client is trying to elect itself. Here, add a little bit of custom flavor to a deadline exceeded error that occurs during leadership election. This is certainly not an exhaustive way to reveal these errors (it only goes in one spot), but the idea is that we put it in a common error spot, and it should improve things incrementally.
49c0c77 to
93c3472
Compare
|
Thx Blake. Going to merge this one for now, but yeah, lets re-evaluate if it seems like it'd be useful to have elsewhere too. |
Related to #1077, a common problem that users run into is that there
database pool is configured to be too small, and a common place they run
into this is as a River client is trying to elect itself.
Here, add a little bit of custom flavor to a deadline exceeded error
that occurs during leadership election. This is certainly not an
exhaustive way to reveal these errors (it only goes in one spot), but
the idea is that we put it in a common error spot, and it should improve
things incrementally.