Cluster broken with upgrading 1.11.0 -> 1.14.0 #2852
Comments
baznikin changed the title from "Issues with upgrading 1.11.0 -> 1.14.0" to "Cluster broken with upgrading 1.11.0 -> 1.14.0" on Jan 23, 2025
Downgrading to 1.11.0 resolved my issues.
Update.
After killing the failed pod it booted up OK; logs:
After one more restart it hangs early:
So, eventually, after a series of pod restarts, the whole cluster will be dead. Reverted to 1.13.0.
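A minimal sketch of how one might watch the restart loop and confirm which Spilo image each pod is actually running after the rollback (the namespace, cluster name and the operator's `application=spilo` / `cluster-name` pod labels are assumptions about a standard Zalando setup):

```bash
# Hypothetical namespace and cluster name; adjust to your setup.
NS=default
CLUSTER=brandadmin-pg

# Watch readiness and restart counts of the cluster's pods.
kubectl -n "$NS" get pods -l application=spilo,cluster-name="$CLUSTER" -w

# Print which Spilo image each pod is actually running,
# to verify the rollback really took effect.
kubectl -n "$NS" get pods -l application=spilo,cluster-name="$CLUSTER" \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
```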
First of all, sorry for the long logs and the unstructured message. To write a clean issue you have to have at least some understanding of what is happening, but I have no idea yet. I read the release notes for 1.12, 1.13 and 1.14 and decided I could upgrade straight to 1.14.0. But...
After upgrading postgres-operator from 1.11.0 to 1.14.0 my clusters won't start up: 3 clusters successfully started with the updated Spilo image (`payments-pg`, `asana-automate-db` and `develop-postgresql`) and 2 did not (`brandadmin-pg` and `games-aggregator-pg`). Before I noticed that not all clusters had been updated, I initiated the 16 -> 17 major version upgrade on cluster `develop-postgresql` and it got stuck with the same symptoms (at first I thought that was the reason, but now I don't think so, see below):

and no more logs.
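In case it helps anyone reproducing this, here is a sketch of how one could inspect a pod stuck like this, assuming a Spilo-based container where Patroni's `patronictl` is available (pod and namespace names are placeholders):

```bash
# Hypothetical namespace and pod name; substitute your own.
NS=default
POD=develop-postgresql-0

# Logs of the current and the previous (crashed/killed) container instance.
kubectl -n "$NS" logs "$POD" -f
kubectl -n "$NS" logs "$POD" --previous

# Patroni's view of the cluster members (role, state, timeline, lag).
kubectl -n "$NS" exec -it "$POD" -- patronictl list
```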
Some clusters managed to start, but there is the same error:
After I deleted this pod it got stuck too!
Processes inside the failed clusters:
After one more deletion it managed to start.
I noticed one thing in the logs: sometimes the container starts with the WAL-E variables, sometimes not. The operator shows the cluster status as OK, but it is not:
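A sketch of how one might compare the environment of a pod that came up with the WAL-E variables against one that did not, and cross-check what the operator reports (the `payments-pg-0` / `brandadmin-pg-0` pod names and the WAL-E/WAL-G/AWS variable naming are assumptions):

```bash
NS=default   # hypothetical namespace

# Dump backup-related environment variables from a "good" and a "bad" pod
# to see whether the WAL-E/WAL-G configuration was injected at all.
kubectl -n "$NS" exec payments-pg-0   -- env | grep -Ei 'wal|aws|backup' | sort > good.env
kubectl -n "$NS" exec brandadmin-pg-0 -- env | grep -Ei 'wal|aws|backup' | sort > bad.env
diff good.env bad.env

# The status column the operator reports for each cluster.
kubectl -n "$NS" get postgresql
```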
While I was writing this issue about an hour passed; in despair I restarted this failed pod one more time and it STARTED (container `postgres` became `Ready`), but it is still not working:

All my clusters consisting of two nodes can't start the replica node:

Probably the problem is with the WAL variables...
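To confirm whether a replica is actually streaming (or stuck restoring WAL), one could ask the primary directly; a sketch assuming `psql` is available inside the `postgres` container, with placeholder pod names:

```bash
NS=default
PRIMARY=brandadmin-pg-0   # hypothetical; use the pod Patroni reports as leader

# Streaming replication status as seen by the primary; an empty result
# means no replica is connected at all.
kubectl -n "$NS" exec -it "$PRIMARY" -- psql -U postgres -c \
  "SELECT application_name, state, sync_state, replay_lag FROM pg_stat_replication;"

# The replica's own logs usually show why it cannot start
# (e.g. failing to fetch WAL from the archive).
kubectl -n "$NS" logs brandadmin-pg-1
```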
It's a complete mess!
The operator is installed with Helm and Terraform, and configured with a ConfigMap:
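(The actual ConfigMap contents are omitted here.) As a hedged sketch, assuming the Helm chart's default name `postgres-operator` for both the ConfigMap and the Deployment, this is how one could dump the image and WAL/backup-related settings the operator is actually running with:

```bash
# Names and namespace are Helm-chart defaults and may differ in your release.
NS=default

# Image and WAL/backup related keys from the operator configuration.
kubectl -n "$NS" get configmap postgres-operator -o yaml \
  | grep -Ei 'docker_image|wal_|aws_|backup'

# What the operator logs when it syncs the clusters.
kubectl -n "$NS" logs deployment/postgres-operator | grep -Ei 'image|wal'
```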