-
Notifications
You must be signed in to change notification settings - Fork 440
WIP: Improve CAPZ Bootstrapping Extension's error message #5509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WIP: Improve CAPZ Bootstrapping Extension's error message #5509
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5509 +/- ##
=======================================
Coverage 52.57% 52.57%
=======================================
Files 272 272
Lines 29470 29470
=======================================
Hits 15495 15495
Misses 13167 13167
Partials 808 808 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
azure/defaults.go
Outdated
@@ -112,7 +112,7 @@ const ( | |||
|
|||
var ( | |||
// LinuxBootstrapExtensionCommand is the command the VM bootstrap extension will execute to verify Linux nodes bootstrap completes successfully. | |||
LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep) | |||
LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then echo 'Error joining node to cluster: kubeadm init failed. To debug, check the cloud-init, kubelet, or other bootstrap logs: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu.'; exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to say "kubeadm init or join", as this error could occur on either the first control plane node (init) or any other node that joins the cluster afterwards
9cede97
to
a6117b3
Compare
/retest |
@@ -112,7 +112,7 @@ const ( | |||
|
|||
var ( | |||
// LinuxBootstrapExtensionCommand is the command the VM bootstrap extension will execute to verify Linux nodes bootstrap completes successfully. | |||
LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep) | |||
LinuxBootstrapExtensionCommand = fmt.Sprintf("for i in $(seq 1 %d); do test -f %s && break; if [ $i -eq %d ]; then echo 'Error joining node to cluster: kubeadm init or join failed. To debug, check the cloud-init, kubelet, or other bootstrap logs: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu.'; exit 1; else sleep %d; fi; done", bootstrapExtensionRetries, bootstrapSentinelFile, bootstrapExtensionRetries, bootstrapExtensionSleep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so sure of the helpfulness of this message.
- Unless the verbosity of kubeadm init/join is increased by
--v=<any number greater than 5>
users do not have a way to debug the issue. - Where else do we expect to see this message ? Is it available on the AzureMachine.Status ?
Suggestions:
- Should we be renaming
#checking-cloud-init-logs-ubuntu
tochecking-cloud-init-logs-linux
since Linux is more generic ? - Can we paraphrase the error message from
Error joining node to cluster: kubeadm init or join failed......
toError: kubeadm init or join failed. Refer: https://capz.sigs.k8s.io/self-managed/troubleshooting.html?highlight=kubeadmcontrolplane#checking-cloud-init-logs-ubuntu for more guidance on debugging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestions! Gonna work on modifying it now...
Unless the verbosity of kubeadm init/join is increased by --v=<any number greater than 5> users do not have a way to debug the issue.
Is that true? If so, is there a way to change that from CAPZ? I've debugged bootstrap failures previously with these methods.
Where else do we expect to see this message ? Is it available on the AzureMachine.Status ?
I think so but I'm not 100% sure. It should show up in the CAPZ controller logs for sure
As per discussion with @jackfrancis, I'll work on potentially changing the script to search the cloud init logs automatically for any kind of error, and displaying that to the user. |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR adds a more descriptive error message for the CAPZ Bootstrapping Extension so that users can better understand what went wrong when the extension fails to install.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
TODOs:
Release note: