
time out long-running requests more aggressively #10833

Open · wants to merge 1 commit into master

Conversation

chris48s (Member) commented:

We set a hard limit on how long we'll attempt to serve a request before we give up. At the moment this is set to 20 seconds in production. I think we should make this timeout shorter. In this PR, I propose: if we've held a connection open for 8 seconds trying to serve a request and we don't have a badge yet, it's time to serve a 408 and move on.
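For illustration, here is a minimal Express-style middleware sketch of that idea: a hard per-request deadline that answers 408 if no response has started in time. The 8-second constant comes from the proposal; the middleware, names, and wiring are hypothetical and not Shields' actual implementation.

```typescript
import express, { Request, Response, NextFunction } from "express";

// Hypothetical constant mirroring the proposed limit.
const REQUEST_TIMEOUT_MS = 8_000;

// Arm a hard deadline for every request: if no response has started
// by the time it fires, reply with 408 Request Timeout.
function hardTimeout(req: Request, res: Response, next: NextFunction): void {
  const timer = setTimeout(() => {
    if (!res.headersSent) {
      res.status(408).send("Request Timeout");
    }
  }, REQUEST_TIMEOUT_MS);

  // Clear the timer once the response has gone out (or the client
  // disconnected) so it can't fire after the fact.
  res.on("finish", () => clearTimeout(timer));
  res.on("close", () => clearTimeout(timer));
  next();
}

const app = express();
app.use(hardTimeout);
```

The `headersSent` check matters: once a badge has started streaming, it's too late to swap in a 408, so the timer simply does nothing.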

@chris48s added the "operations" label (Hosting, monitoring, and reliability for the production badge servers) on Jan 23, 2025.

jNullj (Member) commented on Jan 24, 2025:

I think 20 is too much, but is there a particular reason to change this from 20 to 8?

chris48s (Member, Author) commented:

The most common place where shields badges are viewed is on GitHub. All images on GitHub are served via an instance of camo (GitHub's image proxy). Camo will only wait 4 seconds for a response from an upstream (like shields.io) before returning an error, so any badge that takes more than 4 seconds to return a response won't display for most users.

We should set our hard limit a bit longer than that: partly because Shields badges are sometimes viewed in other contexts, and partly because there's sometimes value in letting a slightly long-running request run to completion so future requests can be served from Cloudflare.

In normal circumstances, we aim for all our badges to render in under 4 seconds and the vast majority do.

One of the failure modes I'm trying to prevent here is where a service that represents a large chunk of our traffic (e.g. NPM, PyPI) has a performance problem and we end up with loads of open connections tied up waiting on requests to their API, which causes a full service outage for us (this has actually happened before).
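As a sketch of how bounding the wait on upstream calls limits that pile-up: cancel the upstream request once the deadline passes so the connection is freed instead of sitting open. This is hypothetical code (Node 18+ global `fetch`), not how Shields actually performs upstream requests, and the function name and URL parameter are illustrative only.

```typescript
// Bound how long we wait on a slow service API (e.g. npm's or PyPI's)
// so a stuck upstream can't tie up open connections on our side.
async function fetchUpstream(url: string, timeoutMs = 8_000): Promise<Response> {
  const controller = new AbortController();
  // Abort the in-flight request when the budget is exhausted.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```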

8 is a bit of an arbitrary choice. I'd be happy to set this to any number greater than 5 and less than or equal to 10 as a next step and see how it goes.
