@mkoura mkoura commented Nov 25, 2025

Previously, the testrun could run indefinitely. These changes introduce various timeouts to prevent hanging tests.

@mkoura mkoura requested a review from saratomaz as a code owner November 25, 2025 17:30
@mkoura mkoura force-pushed the timeout_get_cluster_one_hour branch 3 times, most recently from 9e0df20 to 30802e8 Compare November 26, 2025 16:33
@mkoura mkoura requested a review from Copilot November 26, 2025 18:13
Copilot finished reviewing on behalf of mkoura November 26, 2025 18:18

Introduce stricter checks for dead cluster instances as the test wait
deadline approaches. Add configurable grace period, strict check window,
and dead fraction threshold. Replace all-dead check with a fractional
dead check to fail earlier when too many clusters are unavailable.
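
The fractional dead check described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the names `DeadCheckConfig` and `too_many_dead`, the parameter names, the default values, and the exact way the check becomes stricter near the deadline (a lower threshold inside the strict window) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeadCheckConfig:
    """Hypothetical knobs; names and defaults are assumptions, not the PR's."""

    grace_period_sec: float = 300.0     # no dead check this soon after waiting starts
    strict_window_sec: float = 600.0    # final stretch before the wait deadline
    dead_fraction: float = 0.5          # threshold outside the strict window
    strict_dead_fraction: float = 0.25  # lower (stricter) threshold near the deadline


def too_many_dead(dead, total, waited_sec, deadline_sec, cfg=DeadCheckConfig()):
    """Return True when waiting for a cluster instance should be abandoned.

    dead / total is the fraction of cluster instances currently marked dead.
    waited_sec is how long the test has waited; deadline_sec is the wait limit.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Inside the grace period, never fail because of dead instances.
    if waited_sec < cfg.grace_period_sec:
        return False
    remaining = deadline_sec - waited_sec
    # Assumed behaviour: use the stricter threshold once the deadline is close.
    if remaining <= cfg.strict_window_sec:
        threshold = cfg.strict_dead_fraction
    else:
        threshold = cfg.dead_fraction
    return dead / total >= threshold
```

Compared with an all-dead check, which only fires when `dead == total`, a fractional threshold lets the wait fail as soon as a configured share of the instances is unavailable.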

Add pytest-timeout v2.4.0 to the dependencies to enable automatic abortion
of hanging tests. Update poetry.lock and pyproject.toml accordingly.
This improves test reliability by preventing indefinite test runs.

Specify --timeout and --session-timeout options for target_tests,
target_testpr, and target_testnets in .github/run_tests.sh. This gives
each test and the pytest session as a whole a time limit, improving
reliability and preventing hangs in CI workflows.
@mkoura mkoura force-pushed the timeout_get_cluster_one_hour branch from da83034 to 4a86013 Compare November 26, 2025 18:44
@mkoura mkoura changed the title fix(cluster): add 1-hour timeout for cluster acquisition fix(testrun): add timeouts for testrun Nov 26, 2025
Introduce SESSION_TIMEOUT environment variable to set an overall timeout
for the test session in .github/run_tests.sh. The test runner now uses
the timeout command to enforce this limit. Default values are set for
different targets. This helps prevent excessively long test runs and
improves CI reliability.
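
As a rough Python analogue of the behaviour described in this commit message (the PR itself does this in shell with the `timeout` command), the idea can be sketched as below. Everything here is an assumption for illustration: the helper names `session_timeout` and `run_with_timeout`, the target keys, the default values, and the choice to treat `SESSION_TIMEOUT` as a number of seconds. The only borrowed convention is exit status 124, which GNU `timeout` returns when the command times out.

```python
import os
import subprocess

# Hypothetical per-target defaults in seconds; the PR's real values are not shown here.
DEFAULT_SESSION_TIMEOUT = {"tests": 3 * 3600, "testpr": 3600, "testnets": 6 * 3600}


def session_timeout(target, env=None):
    """Return SESSION_TIMEOUT from the environment, else the target's default."""
    env = os.environ if env is None else env
    return int(env.get("SESSION_TIMEOUT", DEFAULT_SESSION_TIMEOUT[target]))


def run_with_timeout(cmd, timeout_sec):
    """Run cmd; return its exit code, or 124 if it ran longer than timeout_sec."""
    try:
        return subprocess.run(cmd, timeout=timeout_sec, check=False).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return 124
```

An outer limit like this acts as a backstop: it still ends the run if the test runner itself hangs and its own per-test or per-session timeouts never fire.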

Add a background system resource monitor to regression.sh that logs CPU, memory,
and disk usage every 10 minutes to monitor.log. Ensure the monitor is stopped on
script exit. Include monitor.log as a workflow artifact for analysis.
@mkoura mkoura force-pushed the timeout_get_cluster_one_hour branch from 4a86013 to 7b94d09 Compare November 27, 2025 17:31
@mkoura mkoura requested a review from Copilot November 27, 2025 18:12
Copilot finished reviewing on behalf of mkoura November 27, 2025 18:14
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated no new comments.


@mkoura mkoura merged commit 5479fc8 into master Nov 27, 2025
10 checks passed
@mkoura mkoura deleted the timeout_get_cluster_one_hour branch November 27, 2025 18:22