-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Click refresh when minion is down #9865
Click refresh when minion is down #9865
Conversation
👋 Hello! Thanks for contributing to our project. You can see the progress at the end of this page and at https://github.com/uyuni-project/uyuni/pull/9865/checks If you are unsure the failing tests are related to your code, you can check the "reference jobs". These are jobs that run on a scheduled time with code from master. If they fail for the same reason as your build, it means the tests or the infrastructure are broken. If they do not fail, but yours do, it means it is related to your code. Reference tests: KNOWN ISSUES Sometimes the build can fail when pulling new jar files from download.opensuse.org . This is a known limitation. Given this happens rarely, when it does, all you need to do is rerun the test. Sorry for the inconvenience. For more tips on troubleshooting, see the troubleshooting guide. Happy hacking! |
8c5287e
to
5c98807
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this approach because this could hide issues.
That's why we try to don't use re-tries anywhere, so we enforce reliability and a deterministic behavor on the product side.
|
||
if has_content?('Minion is down or could not be contacted.', wait: 3) | ||
find(:xpath, "//input[@value='Reschedule']").click | ||
step %(I wait 30 seconds until the event is picked up and #{timeout} seconds until the event "#{event}" is completed) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if has_content?('Minion is down or could not be contacted.', wait: 3) | |
find(:xpath, "//input[@value='Reschedule']").click | |
step %(I wait 30 seconds until the event is picked up and #{timeout} seconds until the event "#{event}" is completed) | |
end | |
minion_down_str = 'Minion is down or could not be contacted.' | |
raise SystemCallError, minion_down_str if has_content?(minion_down_str, wait: 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "Minion is down" is most probably a product bug that should be fixed.
The solution is not to hide the problem in the test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not much fan of this approach, as it should not be expected that the minion is down and the action is not scheduled. Even if this change is improving CI stability, it might hide potential issues.
We should probably understand why in some cases the minions are down in our CI environment.
In the past, this helped us to detect some actual problems in the product.
For instance, in the past we detected that minion startup can be stuck for a bunch of seconds, making the minion unresponsive during that time, while calculating the FQDNS grains in environments with a wrong DNS setup. If an action is scheduled within this time, it fails with "'Minion is down or could not be contacted" message.
We should probably debug why in some cases we get this now. It could be that the minion startup is taking long that expected, or that CI is scheduling actions too fast after a minion is restarted by some previous action (timing issue).
I would try to go in this direction.
If we don't have a card, we might need want for this issue, and a bug report if it differs from these others:
|
What does this PR change?
Fixes an issue where the Salt service is sometimes down on the minion, causing failures when checking the event. In most cases, refreshing the event resolves the problem.
This PR introduces a default behavior to prevent flaky tests. If the "Minion down" message appears during the event check, the process will automatically click the refresh button and restart the event check. Since these actions are performed within repeat_until_timeout, there is no risk of an infinite loop (verified through testing).
GUI diff
No difference.
Documentation
No documentation needed: only internal and user invisible changes
DONE
Test coverage
ℹ️ If a major new functionality is added, it is strongly recommended that tests for the new functionality are added to the Cucumber test suite
No tests: already covered
DONE
Links
Port(s): # add downstream PR(s), if any
Changelogs
Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository
If you don't need a changelog check, please mark this checkbox:
If you uncheck the checkbox after the PR is created, you will need to re-run
changelog_test
(see below)Re-run a test
If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:
Before you merge
Check How to branch and merge properly!