
[Test] Quarantine AD stacks on failure for debugging purposes. #6742


Draft
wants to merge 3 commits into
base: release-3.13
from

Conversation

gmarciani
Contributor

Description of changes

Quarantine AD stacks on failure for debugging purposes.
At most 5 stacks are quarantined, to limit costs.
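
A minimal sketch of how the cap could be enforced, not the code in this PR. It assumes each existing stack is represented by a dict of its tags; the tag key value, `MAX_QUARANTINED_STACKS`, `count_quarantined`, and `should_quarantine` are hypothetical names introduced for illustration.

```python
# Hypothetical values: the real constants live in the test suite's constants module.
QUARANTINE_TAG_KEY = "quarantine"  # assumed tag key, not taken from the PR
MAX_QUARANTINED_STACKS = 5         # limit stated in the description


def count_quarantined(stacks):
    """Count the stacks whose tags include the quarantine tag key."""
    return sum(1 for tags in stacks if QUARANTINE_TAG_KEY in tags)


def should_quarantine(test_failed, stacks, limit=MAX_QUARANTINED_STACKS):
    """Retain a failed stack only while fewer than `limit` stacks are quarantined."""
    return test_failed and count_quarantined(stacks) < limit


if __name__ == "__main__":
    existing = [{QUARANTINE_TAG_KEY: "true"} for _ in range(4)] + [{}]
    print(should_quarantine(True, existing))  # True: only 4 stacks are quarantined
```

Once the limit is reached, a failed stack would be deleted as usual instead of being retained.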

Tests

test_ad_integration succeeds
The AD stack is retained on failure (verified by injecting artificial failures into the AD stack to trigger the retention)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani added the skip-changelog-update (disables the check that enforces changelog updates in PRs), 3.x, and Test labels on Mar 17, 2025
@gmarciani force-pushed the wip/mgiacomo/3130/wip/mgiacomo/3130/fix-ad-0317-1 branch 3 times, most recently from 0dc1070 to f41efd2, on March 18, 2025 03:25
@gmarciani force-pushed the wip/mgiacomo/3130/wip/mgiacomo/3130/fix-ad-0317-1 branch 2 times, most recently from ab22bf3 to 3051466, on March 18, 2025 19:11
@@ -28,6 +28,7 @@
from jinja2.sandbox import SandboxedEnvironment
from retrying import retry
from time_utils import minutes, seconds
from constants import QUARANTINE_TAG_KEY, DO_NOT_DELETE_TAG_KEY, QUARANTINE_TAGS

Check notice: Code scanning / CodeQL

Unused import (Note, test)

Import of 'DO_NOT_DELETE_TAG_KEY' is not used.
@gmarciani force-pushed the wip/mgiacomo/3130/wip/mgiacomo/3130/fix-ad-0317-1 branch 3 times, most recently from 63a7e22 to a088a2e, on March 19, 2025 17:55
In particular:
1. Make the AD admin node signal failure as well as success, so that the wait condition handle can fail faster.
2. Reduce the number of adcli retries from 5 to 3: that many retries are not necessary when something is wrong, especially since adcli has a 2-minute retry delay.
3. Reduce the wait condition handle timeout from 900s to 600s, since 10 minutes is enough to cover the AD admin node bootstrap and 3 adcli retries.
4. Execute the post-processing Lambda only if the AD admin node was able to set up the directory.
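
The timing argument and the failure signal in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: it assumes every adcli retry costs one full 2-minute delay and ignores the run time of each attempt, and the helper names `bootstrap_headroom_s` and `wait_condition_signal` are hypothetical. The JSON fields (`Status`, `Reason`, `UniqueId`, `Data`) are the ones a CloudFormation wait condition handle accepts.

```python
import json

ADCLI_RETRY_DELAY_S = 120  # the 2-minute adcli retry delay from the list above


def bootstrap_headroom_s(timeout_s, retries, delay_s=ADCLI_RETRY_DELAY_S):
    """Seconds left for node bootstrap after reserving the retry delays (assumed model)."""
    return timeout_s - retries * delay_s


def wait_condition_signal(directory_ready, unique_id, reason):
    """Build the JSON body sent to a wait condition handle, for success or failure."""
    return json.dumps(
        {
            "Status": "SUCCESS" if directory_ready else "FAILURE",
            "Reason": reason,
            "UniqueId": unique_id,
            "Data": "",
        }
    )


if __name__ == "__main__":
    print(bootstrap_headroom_s(900, 5))  # 300: old timeout with 5 retries
    print(bootstrap_headroom_s(600, 3))  # 240: new timeout with 3 retries
    print(wait_condition_signal(False, "ad-admin-node", "adcli retries exhausted"))
```

Under this model the new settings leave 240s for the node bootstrap, and an explicit `FAILURE` signal lets the stack fail as soon as the retries are exhausted instead of waiting for the timeout.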
@gmarciani force-pushed the wip/mgiacomo/3130/wip/mgiacomo/3130/fix-ad-0317-1 branch from a088a2e to 42ec878, on March 19, 2025 18:16