fix(serverless): use executionTimeoutMs as runsync client timeout by deanq · Pull Request #272 · runpod/flash

deanq · 2026-03-15T04:16:30Z

Summary

ServerlessResource.runsync() hardcoded a 60s HTTP client timeout regardless of executionTimeoutMs, causing GPU inference jobs exceeding 1 minute to always fail with a client-side timeout
Now uses executionTimeoutMs / 1000 as the client timeout when set (non-zero, positive), falling back to 60s default

Changes

File	Change
`core/resources/serverless.py`	Derive client timeout from `self.executionTimeoutMs` with explicit `is not None and > 0` guard
`tests/unit/resources/test_serverless.py`	3 tests: custom timeout, zero fallback, and None fallback

Test plan

test_runsync_uses_execution_timeout — verifies executionTimeoutMs=300000 results in timeout=300
test_runsync_default_timeout_when_zero — verifies executionTimeoutMs=0 falls back to timeout=60
test_runsync_default_timeout_when_none — verifies executionTimeoutMs=None falls back to timeout=60
Full make quality-check passes (85.40% coverage)

Closes AE-2375

Copilot

Pull request overview

This PR fixes ServerlessResource.runsync() always using a hardcoded 60s client timeout by deriving the HTTP client timeout from executionTimeoutMs (when configured), preventing long-running GPU inference jobs from failing client-side.

Changes:

Add a module-level default runsync timeout constant (60s).
Compute the runsync HTTP client timeout from executionTimeoutMs / 1000 when non-zero; otherwise fall back to the default.
Add unit tests covering configured timeout and default fallbacks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`src/runpod_flash/core/resources/serverless.py`	Derives runsync client timeout from `executionTimeoutMs` with a 60s fallback constant.
`tests/unit/resources/test_serverless.py`	Adds async unit tests asserting the timeout passed to `rp_client.post()` for set/zero/None `executionTimeoutMs`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

src/runpod_flash/core/resources/serverless.py

tests/unit/resources/test_serverless.py

runpod-Henrik

1. The fix — correct

Clean implementation. The guard is not None and > 0 correctly rejects None, 0, and negative values, all falling back to the 60s default. No issues with the core logic.

2. Question: no buffer between server limit and client timeout

executionTimeoutMs=300_000 → client timeout of exactly 300.0s. The RunPod /runsync endpoint will kill the job at 300_000ms server-side, then return a response. If there's any network overhead between the server killing the job and the response arriving at the client, the client HTTP timeout fires first — same failure mode as before, just pushed out by however long the job runs.

A small buffer is conventional here:

timeout_s: float = (
    self.executionTimeoutMs / 1000 + 10   # +10s for network round-trip
    if self.executionTimeoutMs is not None and self.executionTimeoutMs > 0
    else DEFAULT_RUNSYNC_TIMEOUT_S
)

Not blocking — the fix is still a massive improvement over the hardcoded 60s — but worth a decision either way. If intentionally no buffer, a comment explaining why would close the question.

3. Test gap: negative `executionTimeoutMs` not covered

The guard rejects executionTimeoutMs < 0 (falls back to 60s), and commit edc425c explicitly called this out, but there's no test for it. A test would lock in that behavior:

async def test_runsync_default_timeout_when_negative(self):
    resource = ServerlessResource(name="test-neg", executionTimeoutMs=-1)
    resource.id = "ep-neg"
    # ... assert timeout=60

4. Scope note: `run()` polling path unaffected

This fix is specific to runsync. The async run() → polling path uses a different timeout model (RunPod SDK's await job.result() with its own polling interval). Not a regression here, just worth noting if AE-2375 was originally reported against both paths.

5. Known gap (pre-existing, not introduced here)

From the known-bugs register: execution_timeout_ms is not enforced server-side — workers ignore the configured timeout. This PR fixes the client-side mismatch only. If a job runs past executionTimeoutMs, the server doesn't kill it, so the client will now correctly wait the full configured duration. The server-side enforcement gap remains open separately.

Verdict

PASS. The fix resolves the stated bug correctly. One ask: decide on the buffer question (#2) and either add +10 or add a comment explaining the no-buffer choice.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

runpod-Henrik

Follow-up on prior review

The explicit is not None and > 0 guard and DEFAULT_RUNSYNC_TIMEOUT_S = 60 constant are clean. The three test cases cover the main branches correctly.

The buffer question from the prior review is still open — executionTimeoutMs / 1000 gives exactly the server-side limit with no room for network round-trip. Still non-blocking, but worth noting.

Two minor gaps the tests don't cover:

Negative value (executionTimeoutMs=-100) — the > 0 guard correctly falls back to 60s but there's no test for it
Very small value (executionTimeoutMs=500) → timeout=0.5s — passes the guard and would cause an immediate client timeout on any real request; no validation prevents it

Verdict: PASS WITH NITS

Core fix is correct. The gaps above are low-risk in practice.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

promptless · 2026-03-23T17:38:15Z

📝 Documentation updates detected!

New suggestion: Document runsync client timeout behavior

Tip: Leave inline comments with @Promptless on suggestion diffs in the Promptless dashboard for targeted refinements 💬

Copilot

Pull request overview

This PR fixes ServerlessResource.runsync() to respect executionTimeoutMs when determining the underlying HTTP client timeout, preventing long-running inference jobs from failing due to a hardcoded 60s client-side timeout.

Changes:

Compute runsync HTTP timeout from executionTimeoutMs / 1000 when executionTimeoutMs is set and positive; otherwise fall back to a 60s default.
Add/adjust unit tests to validate the timeout behavior for custom, zero, and None execution timeouts.
Remove a large block of unrelated unit tests from tests/unit/resources/test_serverless.py (not mentioned in the PR description).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`src/runpod_flash/core/resources/serverless.py`	Derives runsync client timeout from `executionTimeoutMs` with a 60s default constant.
`tests/unit/resources/test_serverless.py`	Adds runsync timeout tests, but also deletes substantial existing coverage for template env injection and `update()` behaviors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T09:17:40Z

tests/unit/resources/test_serverless.py

+class TestServerlessRunsyncTimeout:
+    """Test that runsync uses executionTimeoutMs as client timeout."""



This diff removes a large set of existing unit tests (e.g. for _inject_template_env, _build_template_update_payload, and update() env preservation/injection behavior) and I can't find equivalent coverage elsewhere in tests/. Since the corresponding production code still exists in serverless.py, this looks like an unintended deletion that significantly reduces regression coverage. Please restore these tests (or move them to a new test module) and keep this PR focused on the runsync-timeout change, or explicitly document/justify the removal if it’s intentional.

runsync() hardcoded a 60s HTTP client timeout regardless of executionTimeoutMs, causing GPU inference jobs exceeding 1 minute to fail with a client-side timeout even when the server allowed more time. Now uses executionTimeoutMs / 1000 when set, falling back to 60s default. Closes AE-2375

- Move DEFAULT_RUNSYNC_TIMEOUT_S to module-level constant - Add test for executionTimeoutMs=None fallback

Replace truthiness check with `is not None and > 0` to reject negative values. Annotate timeout_s as float for clarity.

The update test needs get_template mocked to return a real dict, otherwise the AsyncMock auto-attribute returns a coroutine from .get("env") which is not iterable.

…ayload These test classes were accidentally deleted during an earlier rebase. They cover template env injection, API key injection, and update() env handling — unrelated to the runsync timeout fix. Also refreshes uv.lock to resolve stale runpod-python git commit.

deanq requested a review from Copilot March 15, 2026 04:25

Copilot started reviewing on behalf of deanq March 15, 2026 04:25 View session

Copilot AI reviewed Mar 15, 2026

View reviewed changes

src/runpod_flash/core/resources/serverless.py Outdated Show resolved Hide resolved

tests/unit/resources/test_serverless.py Show resolved Hide resolved

deanq requested a review from runpod-Henrik March 15, 2026 04:32

runpod-Henrik reviewed Mar 16, 2026

View reviewed changes

deanq force-pushed the fix/AE-2375-runsync-client-timeout branch from edc425c to a3d5e7d Compare March 17, 2026 20:00

runpod-Henrik reviewed Mar 18, 2026

View reviewed changes

deanq force-pushed the fix/AE-2375-runsync-client-timeout branch from a3d5e7d to 6a22440 Compare March 19, 2026 18:04

promptless bot mentioned this pull request Mar 23, 2026

docs: Document runsync client timeout behavior runpod/docs#585

Open

deanq force-pushed the fix/AE-2375-runsync-client-timeout branch 2 times, most recently from 38be4fa to ccfde45 Compare March 25, 2026 23:06

deanq requested review from Copilot and runpod-Henrik March 26, 2026 09:13

Copilot started reviewing on behalf of deanq March 26, 2026 09:14 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

jhcipar approved these changes Mar 31, 2026

View reviewed changes

deanq added 5 commits April 1, 2026 09:27

fix(serverless): address review feedback on runsync timeout

8d6c1d8

- Move DEFAULT_RUNSYNC_TIMEOUT_S to module-level constant - Add test for executionTimeoutMs=None fallback

fix(serverless): use explicit guard for executionTimeoutMs

c71ebf8

Replace truthiness check with `is not None and > 0` to reject negative values. Annotate timeout_s as float for clarity.

fix(test): restore get_template mock removed in prior commit

a021db0

The update test needs get_template mocked to return a real dict, otherwise the AsyncMock auto-attribute returns a coroutine from .get("env") which is not iterable.

deanq force-pushed the fix/AE-2375-runsync-client-timeout branch from b3463d2 to 058ca36 Compare April 1, 2026 18:49

deanq merged commit 49451d4 into main Apr 1, 2026
4 checks passed

deanq deleted the fix/AE-2375-runsync-client-timeout branch April 1, 2026 18:56

runpod-release-please-bot bot mentioned this pull request Apr 1, 2026

chore: release 1.11.3 #299

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(serverless): use executionTimeoutMs as runsync client timeout#272

fix(serverless): use executionTimeoutMs as runsync client timeout#272
deanq merged 5 commits intomainfrom
fix/AE-2375-runsync-client-timeout

deanq commented Mar 15, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Uh oh!

runpod-Henrik left a comment

Uh oh!

promptless bot commented Mar 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		class TestServerlessRunsyncTimeout:
		"""Test that runsync uses executionTimeoutMs as client timeout."""

Conversation

deanq commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. The fix — correct

2. Question: no buffer between server limit and client timeout

3. Test gap: negative executionTimeoutMs not covered

4. Scope note: run() polling path unaffected

5. Known gap (pre-existing, not introduced here)

Verdict

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

Follow-up on prior review

Uh oh!

promptless bot commented Mar 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

deanq commented Mar 15, 2026 •

edited

Loading

3. Test gap: negative `executionTimeoutMs` not covered

4. Scope note: `run()` polling path unaffected