Description
Describe the bug
I'm trying to set up a three-node bare-metal cluster. I started with one node and plan to add two more once the first one is ready.
However, I noticed that rebooting the machine (either gracefully or via power cycle) results in Cozystack not coming back up. I need my homelab to recover from power interruptions.
Environment
- Cozystack version: 0.38.4
- Provider: on-prem
To Reproduce
Steps to reproduce the behavior:
- Deploy Cozystack on a single node as per documentation.
- Observe all tenant-root pods are in Ready state.
- Reboot the machine (either gracefully or via power cycle).
- Wait for the machine to boot up again.
Expected behaviour
Cozystack should come back online and the cluster should be available.
Actual behaviour
Talos comes back online, but the Cozystack dashboard is not available.
Logs
I noticed etcd pods are missing after reboot:
❯ kubectl get pod -n tenant-root
NAME                                          READY   STATUS      RESTARTS   AGE
alerta-c98d86f94-sb8zx                        0/1     Completed   0          57m
alerta-c98d86f94-vwgbs                        0/1     Completed   0          66m
alerta-db-1                                   0/1     Completed   0          57m
alerta-db-2                                   0/1     Completed   0          56m
cm-acme-http-solver-lzb9k                     0/1     Error       0          66m
cm-acme-http-solver-zdvkt                     0/1     Error       0          66m
grafana-db-1                                  0/1     Completed   0          57m
grafana-db-2                                  0/1     Completed   0          56m
grafana-deployment-768b84ffcd-48kvr           0/1     Completed   5          66m
grafana-deployment-768b84ffcd-6j2sk           0/1     Completed   3          57m
grafana-deployment-768b84ffcd-6l64l           0/1     Completed   4          66m
grafana-deployment-768b84ffcd-jgrvx           0/1     Completed   4          57m
root-ingress-controller-75c59d8c84-stsvx      0/2     Completed   2          69m
root-ingress-controller-75c59d8c84-vjzgp      0/2     Completed   2          69m
root-ingress-defaultbackend-cd98c755b-56pfl   0/1     Completed   0          69m
root-ingress-defaultbackend-cd98c755b-kfppz   0/1     Completed   0          57m
vlogs-generic-5f54c7f9d4-ngmlf                0/1     Completed   0          66m
vmalert-vmalert-shortterm-5c58dd9f5b-fmhnq    0/2     Completed   0          66m
vmalert-vmalert-shortterm-5c58dd9f5b-g54tg    0/2     Completed   0          57m
vminsert-longterm-6b4565b447-5cpgv            0/1     Completed   1          63m
vminsert-longterm-6b4565b447-8b4s8            0/1     Completed   0          57m
vminsert-longterm-6b4565b447-cxxrn            0/1     Completed   0          62m
vminsert-shortterm-5fc4d4b977-6p5kq           0/1     Completed   0          63m
vminsert-shortterm-5fc4d4b977-ptkgp           0/1     Completed   0          62m
vminsert-shortterm-5fc4d4b977-rxpsh           0/1     Completed   0          57m
vminsert-shortterm-5fc4d4b977-x282x           0/1     Completed   0          57m
vmstorage-longterm-1                          0/1     Error       1          66m
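To make the listing above easier to compare before and after a reboot, here is a minimal sketch that counts pods per STATUS from the text output of `kubectl get pod`. The function name `summarize_pods` and the shortened `sample` listing are hypothetical, introduced only for illustration; the script assumes the default five-column output (NAME, READY, STATUS, RESTARTS, AGE) and is not part of Cozystack or kubectl.

```python
from collections import Counter


def summarize_pods(listing: str) -> Counter:
    """Count pods per STATUS in the text output of `kubectl get pod`.

    Hypothetical helper: assumes the default column order
    NAME READY STATUS RESTARTS AGE, separated by whitespace.
    """
    counts = Counter()
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        counts[fields[2]] += 1  # third column is STATUS
    return counts


# Shortened excerpt of the listing from this report, for illustration only.
sample = """\
NAME READY STATUS RESTARTS AGE
alerta-db-1 0/1 Completed 0 57m
cm-acme-http-solver-lzb9k 0/1 Error 0 66m
vmstorage-longterm-1 0/1 Error 1 66m
"""

if __name__ == "__main__":
    for status, count in summarize_pods(sample).most_common():
        print(f"{status}: {count}")
```

In practice the real output could be piped in instead of `sample`, for example by saving `kubectl get pod -n tenant-root` to a file and passing its contents to `summarize_pods`.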
Additional context
I'm new to Cozystack and Kubernetes, so please forgive my lack of knowledge.
I also tried following the Talos Disaster Recovery steps, but that didn't help.
Some more questions:
- Is the issue caused by my (temporary) single node setup?
- Do I need to follow Cozystack Backup Restore steps after each reboot?
Checklist
- I have checked the documentation
- I have searched for similar issues
- I have included all required information
- I have provided clear steps to reproduce
- I have included relevant logs