Cozystack doesn't recover from system reboot. #1730

@marekrogosz

Describe the bug
I'm trying to set up a three-node bare-metal cluster. I started with one node and plan to add two more once the first one is ready.

However, I noticed that rebooting the machine (either gracefully or via power cycle) results in Cozystack not coming back up. I need my homelab to recover from power interruptions.

Environment

  • Cozystack version: 0.38.4
  • Provider: on-prem

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Cozystack on a single node as per the documentation.
  2. Observe that all tenant-root pods are in the Ready state.
  3. Reboot the machine (either gracefully or via power cycle).
  4. Wait for the machine to boot up again.

Expected behaviour
Cozystack should come back online and the cluster should be available.

Actual behaviour
Talos comes back online, but the Cozystack dashboard is not available.

Logs
I noticed etcd pods are missing after reboot:

❯ kubectl get pod -n tenant-root
NAME                                          READY   STATUS      RESTARTS   AGE
alerta-c98d86f94-sb8zx                        0/1     Completed   0          57m
alerta-c98d86f94-vwgbs                        0/1     Completed   0          66m
alerta-db-1                                   0/1     Completed   0          57m
alerta-db-2                                   0/1     Completed   0          56m
cm-acme-http-solver-lzb9k                     0/1     Error       0          66m
cm-acme-http-solver-zdvkt                     0/1     Error       0          66m
grafana-db-1                                  0/1     Completed   0          57m
grafana-db-2                                  0/1     Completed   0          56m
grafana-deployment-768b84ffcd-48kvr           0/1     Completed   5          66m
grafana-deployment-768b84ffcd-6j2sk           0/1     Completed   3          57m
grafana-deployment-768b84ffcd-6l64l           0/1     Completed   4          66m
grafana-deployment-768b84ffcd-jgrvx           0/1     Completed   4          57m
root-ingress-controller-75c59d8c84-stsvx      0/2     Completed   2          69m
root-ingress-controller-75c59d8c84-vjzgp      0/2     Completed   2          69m
root-ingress-defaultbackend-cd98c755b-56pfl   0/1     Completed   0          69m
root-ingress-defaultbackend-cd98c755b-kfppz   0/1     Completed   0          57m
vlogs-generic-5f54c7f9d4-ngmlf                0/1     Completed   0          66m
vmalert-vmalert-shortterm-5c58dd9f5b-fmhnq    0/2     Completed   0          66m
vmalert-vmalert-shortterm-5c58dd9f5b-g54tg    0/2     Completed   0          57m
vminsert-longterm-6b4565b447-5cpgv            0/1     Completed   1          63m
vminsert-longterm-6b4565b447-8b4s8            0/1     Completed   0          57m
vminsert-longterm-6b4565b447-cxxrn            0/1     Completed   0          62m
vminsert-shortterm-5fc4d4b977-6p5kq           0/1     Completed   0          63m
vminsert-shortterm-5fc4d4b977-ptkgp           0/1     Completed   0          62m
vminsert-shortterm-5fc4d4b977-rxpsh           0/1     Completed   0          57m
vminsert-shortterm-5fc4d4b977-x282x           0/1     Completed   0          57m
vmstorage-longterm-1                          0/1     Error       1          66m
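
To make a listing like the one above easier to read at a glance, here is a minimal sketch that tallies pods by STATUS and lists the ones that are not fully ready. The function names `summarize_pod_status` and `list_not_ready` are hypothetical helpers made up for illustration; the only command taken from this report is `kubectl get pod -n tenant-root`, and the sketch assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Cozystack, Talos or kubectl).
# Both read `kubectl get pod` output from stdin and ignore the header row.

# Print one "STATUS count" line per distinct STATUS value, sorted by status.
summarize_pod_status() {
  awk '$1 != "NAME" && NF >= 3 { count[$3]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Print the names of pods whose READY column (e.g. 0/1) shows fewer
# ready containers than the pod defines.
list_not_ready() {
  awk '$1 != "NAME" && NF >= 3 {
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1
       }'
}
```

Possible usage, piping the command output straight into the helpers: `kubectl get pod -n tenant-root | summarize_pod_status` and `kubectl get pod -n tenant-root | list_not_ready`.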

Additional context
I'm new to Cozystack and Kubernetes, so please forgive my lack of knowledge.

I also tried to follow Talos Disaster Recovery steps but that didn't help.

Some more questions:

  1. Is the issue caused by my (temporary) single node setup?
  2. Do I need to follow Cozystack Backup Restore steps after each reboot?

Checklist

  • I have checked the documentation
  • I have searched for similar issues
  • I have included all required information
  • I have provided clear steps to reproduce
  • I have included relevant logs

Labels: bug
