We apparently had an issue with the name service in the Kubernetes cluster for Telescope. The name service returned stale IP addresses for settings.prod.mozaws.net inside the Telescope pod, which somehow caused the whole service to stall and stop responding to requests. After a lot of debugging, we resolved it by getting the name service fixed, but Telescope shouldn't lock up just because a request to a URL times out. It looks like it was making blocking calls to establish connections, which eventually stalled the event loop because all threads were blocked.
I consistently get timeout exceptions, which are retried via backoff.
The next step is to identify whether the event loop is actually blocked while the connection is being established, or whether the issue comes from the single thread executor being overloaded when many calls are timing out.
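One way to tell these two cases apart is to measure event loop lag while the suspect call runs. The sketch below is only an illustration using the standard library, not Telescope code: `measure_max_lag`, `blocking_call` and `cooperative_call` are hypothetical names, and `time.sleep` stands in for a blocking connection attempt or DNS lookup.

```python
import asyncio
import time


async def measure_max_lag(work, interval=0.05):
    """Run `work()` while a heartbeat task records the worst event loop delay."""
    max_lag = 0.0
    done = False

    async def heartbeat():
        nonlocal max_lag
        while not done:
            start = time.monotonic()
            await asyncio.sleep(interval)
            # Any time beyond `interval` means the loop could not wake us up.
            lag = time.monotonic() - start - interval
            max_lag = max(max_lag, lag)

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # yield once so the heartbeat starts its first sleep
    try:
        await work()
    finally:
        done = True
        await beat
    return max_lag


async def blocking_call():
    # Hypothetical stand-in for a blocking connect or name lookup.
    time.sleep(0.5)


async def cooperative_call():
    # Hypothetical stand-in for a call that yields to the loop while waiting.
    await asyncio.sleep(0.5)


if __name__ == "__main__":
    blocked = asyncio.run(measure_max_lag(blocking_call))
    free = asyncio.run(measure_max_lag(cooperative_call))
    print(f"blocking: {blocked:.3f}s, cooperative: {free:.3f}s")
```

If the lag stays small while requests still stall, the event loop itself is probably not blocked and the executor hypothesis becomes more likely. Running with asyncio's debug mode (`asyncio.run(..., debug=True)`) is a complementary check, since it logs callbacks that run longer than `loop.slow_callback_duration`.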
From Slack