Missed pairing opportunities on a weekly basis #55
My thoughts on "how to investigate", having done no actual investigation yet:
A little investigation: in http://console.cloud.google.com/logs/query,
makes a good starting point for finding the |
A red herring path I went down: timezones. The crontab says to perform matches at 4:00. Docs say that if unspecified, the timezone is UTC. In the winter, 4:00 UTC is 23:00 EST the previous day (EST is UTC - 5:00). In the summer, 4:00 UTC is 00:00 EDT (EDT is UTC - 4:00). So in the winter the pairing bot runs "the day before", computing "tomorrow"'s matches; in the summer, it runs "the same day". But ListPairingTomorrow does the right thing: it uses GAE's time zone, UTC, and picks up the same day. That's correct regardless of what Eastern time is doing. |
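The offset arithmetic in the timezone comment can be checked with a short sketch. It uses fixed UTC offsets rather than a time zone database, and the two dates are arbitrary examples chosen for illustration, not dates taken from the logs.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so the sketch does not depend on an installed tz database.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def cron_local_time(year, month, day, tz):
    """Return the local wall-clock time of a 4:00 UTC cron run."""
    run_utc = datetime(year, month, day, 4, 0, tzinfo=timezone.utc)
    return run_utc.astimezone(tz)


# Hypothetical example dates: one in winter, one in summer.
winter = cron_local_time(2021, 1, 15, EST)
summer = cron_local_time(2021, 7, 15, EDT)

print(winter.isoformat())  # 2021-01-14T23:00:00-05:00 (the evening before)
print(summer.isoformat())  # 2021-07-15T00:00:00-04:00 (midnight, same day)
```

Either way, the UTC date of the run is the same, which is why a function that works purely in UTC is unaffected by the Eastern-time shift.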
OK, I see something odd in the logs, at least. A lot of the pairings generated wind up with a
"Invalid email" is one of the messages, for several e-mail addresses. There's also one report of It looks like this error isn't showing up in the match function; the prefix "Error when trying to send..." isn't showing up in the logs.
At least for today, I think this is a sufficient explanation for why Ian didn't get a pairing. The bot sent a message to multiple recipients, but one of those recipients was invalid, so the message as a whole didn't send. (I haven't checked to see if this was also the problem on other days.) I suspect this could be responsible for a few different dropped pairings. Some changes, maybe in priority order:
|
Per [issue 55], Zulip will occasionally refuse a message. We weren't passing this further up the stack, or even logging the status code. Log the status code along with the body, and return an error up the stack, if the Zulip request gets a user error (4xx) or server error (5xx). [issue 55]: #55
Zulip documentation confirms that failed requests return 4xx/5xx status codes, as well as error information in the body. Neither the "Invalid e-mail" nor the "...is no longer using Zulip" messages have the relevant e-mail as structured data. The API docs have a "tip" saying to start a conversation on Zulip's own Zulip instance as to whether there are specific key/value pairs that would be relevant.
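As an illustration of the "log the status code and return an error up the stack" change, here is a minimal sketch. It is not the bot's actual code: the function and exception names are hypothetical, and the JSON body in the usage lines is a made-up example shaped like my understanding of Zulip's error responses.

```python
import json
import logging

logger = logging.getLogger("pairing-bot-sketch")


class ZulipRequestError(Exception):
    """Raised when a Zulip request comes back with a 4xx or 5xx status (hypothetical type)."""

    def __init__(self, status_code: int, body: str):
        super().__init__(f"Zulip request failed with status {status_code}: {body}")
        self.status_code = status_code
        self.body = body


def check_zulip_response(status_code: int, body: str) -> dict:
    """Log the status code with the body, then raise on user (4xx) or server (5xx) errors."""
    logger.info("zulip response: status=%d body=%s", status_code, body)
    if 400 <= status_code < 600:
        # Propagate the failure instead of silently dropping the message.
        raise ZulipRequestError(status_code, body)
    return json.loads(body)


# Usage with a made-up error body; a caller such as the match function
# can now see that the send failed.
try:
    check_zulip_response(400, '{"result": "error", "msg": "Invalid email"}')
except ZulipRequestError as err:
    print(err.status_code)  # 400
```

Raising rather than only logging means the caller decides what to do with a refused message, for example retrying without the invalid recipient.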
|
@stillgreenmoss spotted the underlying issue:
So a "real fix" here:
|
Current status: I think testbot is using Zulip IDs instead of e-mail for all identity purposes. Including talking to the RC API:
The one remaining use of e-mail is for review identity. I think that's OK to leave for now; I'll file a follow-up "enhancement". |
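For readers unfamiliar with addressing messages by ID, a rough sketch of what that might look like follows. It assumes Zulip's send-message endpoint accepts a JSON-encoded list of integer user IDs in the `to` field, which is my reading of its API documentation. The function name, the IDs, and the message text are all hypothetical.

```python
import json
from typing import List


def build_direct_message(user_ids: List[int], content: str) -> dict:
    """Build form fields for a direct message addressed by Zulip user ID (sketch only)."""
    if not user_ids:
        raise ValueError("at least one recipient is required")
    return {
        "type": "private",  # assumption: newer Zulip servers also accept "direct"
        "to": json.dumps(sorted(set(user_ids))),  # IDs rather than e-mail addresses
        "content": content,
    }


# Hypothetical IDs and message text.
payload = build_direct_message([1002, 1001], "You two are paired today!")
print(payload["to"])  # [1001, 1002]
```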
OK! #62 is in and deployed to prod Pairing Bot. I tagged the original commits in issue55 if anyone wants to see the history in more detail. I'm going to leave this open until we've seen all the crons run and take action:
which we'll see trickle in over the next couple weeks. |
Potentially related: #70 In cases where someone had chosen to skip their next pairing (by sending a So far, I've seen daily matching, checkin, and end-of-batch succeed (after some fixes to each), so I'll check those boxes now. |
Here to confirm the last cron job run! The welcome message was sent out a few minutes ago. Logs (maintainers-only link), Zulip message. I haven't seen any more reports of pairing issues in the last week, so I think the Zulip ID and unskip fixes were all we needed. That said, I wasn't watching the logs particularly closely, so (anyone) please feel free to open a new issue if this happens again! |
@iafisher reports:
A status message from Ian shows a T/W/R/F pairing schedule.
@stillgreenmoss notes that we may need to instrument for more information (i.e. log more) to effectively debug this.