Context:
- AIS version: whatever is installed by default with MCE 2.7.2
- OS image: 4.17.2
- Release image: 4.17.13
Note: all the listed issues were present in 2.6.X / 4.16.X as well.
Pre-requisites
I create all the required CRs mentioned in the documentation and boot the full discovery image myself (the baremetal operator / Ironic path also doesn't work, but that's a whole other can of worms).
Node boots up, Agent CR is created in my cluster, so far so good. Then...
Problems

1. "The host name localhost.localdomain is forbidden"
Installation first fails on this Agent validation: the baseline Ignition config embedded by the agent image service doesn't seem to set a hostname, yet the agent enforces a mandatory validation for it.
Who is supposed to set this? Am I missing some configuration?
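For context, the rule itself appears simple. Here is a sketch of what the agent seems to be checking, purely as an illustration (this is not the real assisted-service code, and the real check may reject more names than these two):

```shell
# Sketch of the forbidden-hostname rule the agent appears to enforce.
# Illustrative only; treat the exact list of rejected names as an assumption.
is_forbidden_hostname() {
    case "$1" in
        localhost|localhost.localdomain) return 0 ;;  # forbidden
        *) return 1 ;;                                # acceptable
    esac
}

is_forbidden_hostname "localhost.localdomain" && echo "forbidden"
```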
2. Hostname validation rule is not periodically re-checked
Well, I guess I will set hostname myself for now to be able to proceed.
However, unlike NTP-synchronization problems, which do eventually resolve on their own, the hostname validation failure never goes away once triggered.
I have waited for an hour. Then I approved the Agent. Then I even tried to manually delete the related status fields from the CR, all to no avail.
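For anyone hitting the same thing, here is a quick way to confirm from the hub side that the validation really is stuck. The field names below are an assumption about how the Agent status is laid out; on the hub you would pipe `oc get agent <agent-uuid> -o json` through the same filter instead of this canned sample:

```shell
# Hedged sketch: grep the Agent status dump for the stuck hostname validation.
# "hostname-valid" and the JSON layout are assumptions; adjust to your CR.
agent_status='{"id":"hostname-valid","status":"failure","message":"The host name localhost.localdomain is forbidden"}'
if printf '%s\n' "$agent_status" | grep -q '"id":"hostname-valid".*"status":"failure"'; then
    echo "hostname validation still failing"
fi
```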
3. Bootkube service doesn't seem to inherit HTTP/S proxy settings from InfraEnv
Well, I guess I will "sudo systemctl restart agent" through SSH for now to be able to proceed.
Installation progresses to the "waiting for bootkube" state, but bootstrapping fails with a timeout. Upon closer inspection, it turns out the release images used exclusively by the bootkube service cannot be pulled because the service cannot reach the quay.io registry, which is odd considering both the agent and the release image service could previously access the internet.
Upon closely inspecting the bootkube service, it seems the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY settings from the InfraEnv CR are not inherited by this unit (nor are they set in Podman's proxy configuration). For reference, the agent service explicitly includes these settings in its unit file.
Who is responsible for making sure bootkube service is operational behind an HTTP proxy, and where should I put these proxy settings in addition to the InfraEnv CR?
4. Bootkube bootstrapping script has some interesting logic for SNO
Well, I guess I will copy the proxy settings from the agent service into the bootkube service file for now to be able to proceed.
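That workaround can be sketched as a systemd drop-in that gives bootkube.service the same proxy environment the agent service already has. The proxy URLs are placeholders; on the node the target directory would be /etc/systemd/system/bootkube.service.d (which needs root), so this sketch defaults to a local path to stay runnable anywhere:

```shell
# Hedged workaround sketch: generate a proxy drop-in for bootkube.service.
# Proxy values and the drop-in path are assumptions; adjust to your setup.
DROPIN_DIR="${DROPIN_DIR:-./bootkube.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/10-proxy.conf" <<'EOF'
[Service]
Environment=HTTP_PROXY=http://proxy.example.com:3128
Environment=HTTPS_PROXY=http://proxy.example.com:3128
Environment=NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local
EOF

echo "wrote $DROPIN_DIR/10-proxy.conf"
# On the node, follow up with: sudo systemctl daemon-reload
```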
Images are successfully pulled, bootstrapping manifests are generated, and I'm goaded into believing this is it.
But alas it was not meant to be: the service eventually times out waiting for etcd to come up. Specifically, this part seems to be failing:
```shell
# in case of single node, if we removed etcd, there is no point to wait for it on restart
if [ ! -f stop-etcd.done ]
then
    record_service_stage_start "wait-for-etcd"
    # Wait for the etcd cluster to come up.
    wait_for_etcd_cluster
    record_service_stage_success
fi
```
I think I understand the intention; however, this also gets executed during the initial run, right after template generation and right before the following block:
```shell
if [ "$BOOTSTRAP_INPLACE" = true ]
then
    REQUIRED_PODS=""
fi

echo "Starting cluster-bootstrap..."

run_cluster_bootstrap() {
    record_service_stage_start "cb-bootstrap"
    bootkube_podman_run \
```
So obviously the etcd health check will perpetually fail, simply because there is no etcd running.
If I provision the done file, cluster bootstrap also fails because REQUIRED_PODS is empty: nothing comes up, the 20-minute timeout waiting for the API server fires, and the service is restarted.
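For what it's worth, the kind of guard I would have expected around the etcd wait looks roughly like this. This is a sketch only, not the actual bootkube.sh, and `etcd-started.done` is a hypothetical marker file that the real script does not create:

```shell
# Hypothetical guard sketch: only wait for etcd if it was ever started
# (marker file is an assumption) and was not removed by the single-node
# bootstrap-in-place flow.
should_wait_for_etcd() {
    [ -f etcd-started.done ] && [ ! -f stop-etcd.done ]
}

cd "$(mktemp -d)"
should_wait_for_etcd || echo "first run: skip the etcd wait"
```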
This is where I ultimately gave up. So again, I'd appreciate any suggestions as to what I'm missing, because executing the procedure described in the documentation verbatim doesn't seem to produce a working environment on bare metal SNO.
For reference, the guides I'm following are https://github.com/openshift/assisted-service/blob/master/docs/hive-integration/kube-api-getting-started.md and https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12/html/clusters/cluster_mce_overview#create-intro, but I've encountered so many different problems I'm not sure SNO installation is even supposed to work. I must be doing something fundamentally wrong, so I'd appreciate any input here!