Description
- Which image of the operator are you using?
ghcr.io/zalando/postgres-operator:v1.14.0
- Where do you run it - cloud or metal? Kubernetes or OpenShift?
Bare Metal K8s (local Kind cluster)
- Are you running Postgres Operator in production?
no
- Type of issue?
Bug report
Description
After creating a minimal PostgreSQL cluster with the Zalando Postgres Operator and then simulating a network partition, I observed that two pods (acid-minimal-cluster-0 and acid-minimal-cluster-1) were both labeled as master. This is unexpected: a PostgreSQL cluster should have exactly one master pod at any given time.
Steps to Reproduce
- Create a Kind cluster:
kind create cluster --config cluster.yaml
- Load the Postgres Operator and Spilo images into the Kind cluster:
kind load docker-image ghcr.io/zalando/postgres-operator:v1.14.0
kind load docker-image ghcr.io/zalando/spilo-17:4.0-p2
- Deploy the Postgres Operator and create a minimal PostgreSQL cluster:
kubectl create -f manifests/configmap.yaml
kubectl create -f manifests/operator-service-account-rbac.yaml
kubectl create -f manifests/postgres-operator.yaml
kubectl create -f manifests/api-service.yaml
kubectl create -f manifests/minimal-postgres-manifest.yaml
- Verify the initial state of the cluster:
kubectl get pods -l application=spilo -L spilo-role
Output:
NAME READY STATUS RESTARTS AGE SPILO-ROLE
acid-minimal-cluster-0 1/1 Running 0 2m37s master
acid-minimal-cluster-1 1/1 Running 0 2m32s replica
- Simulate a network partition by disconnecting the network of the node running the master pod:
docker network disconnect kind <node-id>
- Check the state of the cluster again:
kubectl get pods -l application=spilo -L spilo-role
Output:
NAME READY STATUS RESTARTS AGE SPILO-ROLE
acid-minimal-cluster-0 1/1 Running 0 7m43s master
acid-minimal-cluster-1 1/1 Running 0 7m38s master
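For anyone repeating the partition step, the <node-id> lookup can be scripted. This is a minimal sketch, not part of the original reproduction: the function name partition_master_node is hypothetical, and it assumes the default Kind setup, where each Kubernetes node name is also the name of the Docker container backing it and the Docker network is called kind.

```shell
# Hypothetical helper: disconnect the Kind node that hosts the current
# master pod from the "kind" Docker network.
# Assumes the Kubernetes node name equals the Docker container name (Kind default).
partition_master_node() {
  # Look up the node of the first pod labeled spilo-role=master.
  node=$(kubectl get pods -l application=spilo,spilo-role=master \
    -o jsonpath='{.items[0].spec.nodeName}') || return 1
  if [ -z "$node" ]; then
    echo "no master pod found" >&2
    return 1
  fi
  docker network disconnect kind "$node"
}
```

Calling partition_master_node in place of the manual docker network disconnect step should have the same effect; docker network connect kind <node-id> reattaches the node afterwards.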
Expected Behavior
Only one pod should be labeled as master. After the network partition, a single replica should be promoted to master, and the other pod should either remain a replica or be demoted.
Actual Behavior
Both pods (acid-minimal-cluster-0 and acid-minimal-cluster-1) are labeled as master, which can cause connection problems for clients that reach Postgres through the acid-minimal-cluster service.
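The condition can also be checked mechanically rather than by reading the table. The sketch below is only an illustration: count_masters is a hypothetical helper that reads the output of the kubectl get pods command used above on stdin and counts the rows whose last column (SPILO-ROLE) is master.

```shell
# Hypothetical helper: count pods whose SPILO-ROLE column reads "master".
# Expects `kubectl get pods -l application=spilo -L spilo-role` output on stdin.
count_masters() {
  # Skip the header row, then compare the last field of each row.
  awk 'NR > 1 && $NF == "master" { n++ } END { print n + 0 }'
}
```

Usage: kubectl get pods -l application=spilo -L spilo-role | count_masters. Any result greater than 1 indicates the duplicate-master state described in this report.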