Network Partition Causes Two Master Pods in Postgres Cluster

- **Which image of the operator are you using?**
```
ghcr.io/zalando/postgres-operator:v1.14.0
```
- **Where do you run it - cloud or metal? Kubernetes or OpenShift?**
```
Bare Metal K8s (local Kind cluster)
```
- **Are you running Postgres Operator in production?**
```
no
```
- **Type of issue?** 
```
Bug report
```

* Description
After creating a minimal PostgreSQL cluster with the Zalando Postgres Operator and then simulating a network partition, I observed that two pods (acid-minimal-cluster-0 and acid-minimal-cluster-1) were both labeled as master (`spilo-role=master`). This is unexpected, because a PostgreSQL cluster should have only one master pod at any given time.

* Steps to Reproduce
1. Create a Kind cluster:
```
kind create cluster --config cluster.yaml
```

2. Load the Postgres Operator and Spilo images into the Kind cluster:

```
kind load docker-image ghcr.io/zalando/postgres-operator:v1.14.0
kind load docker-image ghcr.io/zalando/spilo-17:4.0-p2
```

3. Deploy the Postgres Operator and create a minimal PostgreSQL cluster:
```
kubectl create -f manifests/configmap.yaml
kubectl create -f manifests/operator-service-account-rbac.yaml
kubectl create -f manifests/postgres-operator.yaml
kubectl create -f manifests/api-service.yaml
kubectl create -f manifests/minimal-postgres-manifest.yaml
```

4. Verify the initial state of the cluster:
```
kubectl get pods -l application=spilo -L spilo-role
```
Output:
```
NAME                     READY   STATUS    RESTARTS   AGE     SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          2m37s   master
acid-minimal-cluster-1   1/1     Running   0          2m32s   replica
```

5. Simulate a network partition by disconnecting the Kind node that runs the master pod (acid-minimal-cluster-0) from the `kind` Docker network:
```
docker network disconnect kind <node-id>
```

6. Check the state of the cluster again:

```
kubectl get pods -l application=spilo -L spilo-role
```
Output:
```
NAME                     READY   STATUS    RESTARTS   AGE     SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          7m43s   master
acid-minimal-cluster-1   1/1     Running   0          7m38s   master
```
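Step 5 leaves `<node-id>` to be filled in by hand. Below is a minimal Python sketch of one way to find it. It assumes that `kubectl` points at the Kind cluster and that Kind names each node container after its Kubernetes node, which is the default Kind behavior but should still be treated as an assumption. The names `master_node_name` and `main` are hypothetical. The script only prints the `docker network disconnect` command and does not run it.

```python
import json
import subprocess
from typing import Optional


def master_node_name(pods_json: dict) -> Optional[str]:
    """Return the node hosting the first pod labeled spilo-role=master, or None."""
    for pod in pods_json.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        if labels.get("spilo-role") == "master":
            return pod.get("spec", {}).get("nodeName")
    return None


def main() -> None:
    # Same label selector as in steps 4 and 6, but with JSON output.
    out = subprocess.run(
        ["kubectl", "get", "pods", "-l", "application=spilo", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    node = master_node_name(json.loads(out))
    if node is None:
        raise SystemExit("no pod is labeled spilo-role=master")
    # Assumption: the Kind node container has the same name as the
    # Kubernetes node, so it can be passed to `docker network disconnect`.
    print(f"docker network disconnect kind {node}")
```

Call `main()` before step 5, while only one pod is labeled master. If several pods carry the label, the helper returns the node of the first one it finds.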

* Expected Behavior
Only one pod should be labeled as master. The operator should handle the network partition by promoting a single replica to master and ensuring that the other pod either remains a replica or is demoted.
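The expected behavior can be written as a check on the same `kubectl get pods -l application=spilo -L spilo-role` output used in steps 4 and 6. Below is a minimal Python sketch. It assumes the table is passed in as text with kubectl's column alignment intact. The names `pods_by_role` and `check_single_master` are hypothetical. The check looks only at the pod labels, not at which instance Postgres itself considers primary.

```python
from __future__ import annotations


def pods_by_role(table: str) -> dict[str, list[str]]:
    """Group pod names by the SPILO-ROLE column of `kubectl get pods -L spilo-role`."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    # kubectl aligns columns, so the label value starts at the header's offset.
    offset = lines[0].index("SPILO-ROLE")
    roles: dict[str, list[str]] = {}
    for row in lines[1:]:
        # A pod without the label has an empty cell in this column.
        role = row[offset:].strip() or "<none>"
        roles.setdefault(role, []).append(row.split()[0])
    return roles


def check_single_master(table: str) -> list[str]:
    """Raise if the table does not contain exactly one pod labeled master."""
    masters = pods_by_role(table).get("master", [])
    if len(masters) != 1:
        raise RuntimeError(f"expected exactly one master, found {masters}")
    return masters
```

With the output from step 4, the check passes. With the output from step 6, it raises because both pods are labeled master.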

* Actual Behavior
Both pods (acid-minimal-cluster-0 and acid-minimal-cluster-1) are labeled as master. This can cause connection problems for clients that connect to Postgres through the `acid-minimal-cluster` service.

Network Partition Causes Two Master Pods in Postgres Cluster #2854