
Network Partition Causes Two Master Pods in Postgres Cluster #2854

Open
SuJinpei opened this issue Jan 26, 2025 · 0 comments


  • Which image of the operator are you using?
ghcr.io/zalando/postgres-operator:v1.14.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift?
Bare Metal K8s (local Kind cluster)
  • Are you running Postgres Operator in production?
no
  • Type of issue?
Bug report
  • Description
    I created a minimal PostgreSQL cluster with the Zalando Postgres Operator and then simulated a network partition. Afterwards, two pods (acid-minimal-cluster-0 and acid-minimal-cluster-1) were both labeled master. This is unexpected, because a PostgreSQL cluster should have only one master pod at any given time.

  • Steps to Reproduce

  1. Create a Kind cluster:
kind create cluster --config cluster.yaml
  2. Load the Postgres Operator and Spilo images into the Kind cluster:
kind load docker-image ghcr.io/zalando/postgres-operator:v1.14.0
kind load docker-image ghcr.io/zalando/spilo-17:4.0-p2
  3. Deploy the Postgres Operator and create a minimal PostgreSQL cluster:
kubectl create -f manifests/configmap.yaml
kubectl create -f manifests/operator-service-account-rbac.yaml
kubectl create -f manifests/postgres-operator.yaml
kubectl create -f manifests/api-service.yaml
kubectl create -f manifests/minimal-postgres-manifest.yaml
  4. Verify the initial state of the cluster:
kubectl get pods -l application=spilo -L spilo-role

Output:

NAME                     READY   STATUS    RESTARTS   AGE     SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          2m37s   master
acid-minimal-cluster-1   1/1     Running   0          2m32s   replica
  5. Simulate a network partition by disconnecting the node that runs the master pod from the kind Docker network:
docker network disconnect kind <node-id>
  6. Check the state of the cluster again:
kubectl get pods -l application=spilo -L spilo-role

Output:

NAME                     READY   STATUS    RESTARTS   AGE     SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          7m43s   master
acid-minimal-cluster-1   1/1     Running   0          7m38s   master
  • Expected Behavior
    Only one pod should be labeled master. The operator should handle the network partition by promoting a single replica to master and ensuring that the other pod either remains a replica or is demoted.

  • Actual Behavior
    Both pods (acid-minimal-cluster-0 and acid-minimal-cluster-1) are labeled master, which can cause connection problems for clients that connect to Postgres through the acid-minimal-cluster service.
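The reproduction steps leave two details implicit: the contents of cluster.yaml and how `<node-id>` is chosen. The shell sketch below fills them in under stated assumptions. The Kind config (one control-plane node and two workers, so the two Spilo pods can land on different nodes) is hypothetical and is not the reporter's actual file, and the helper names `write_kind_config`, `master_node`, and `partition_master` are illustrative only. It relies on Kind naming each node's Docker container after the Kubernetes node, and on the Docker network being called `kind`, as in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the reproduction; not part of the operator.

# Write an assumed multi-node Kind config (the reporter's cluster.yaml is not shown).
write_kind_config() {
  cat > "${1:-cluster.yaml}" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF
}

# Print the node name of the first pod currently labeled spilo-role=master.
master_node() {
  kubectl get pods -l application=spilo,spilo-role=master \
    -o jsonpath='{.items[0].spec.nodeName}'
}

# Disconnect that node's container from the "kind" Docker network (step 5).
# Assumes the Kind container name equals the Kubernetes node name.
partition_master() {
  local node
  node="$(master_node)"
  echo "disconnecting ${node} from network kind"
  docker network disconnect kind "${node}"
}
```

A possible sequence is `write_kind_config && kind create cluster --config cluster.yaml`, then steps 2 to 4, then `partition_master`. Running `docker network connect kind <node-name>` afterwards reattaches the node.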

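The Actual Behavior above can be detected without reading the table by eye, by counting the rows whose SPILO-ROLE column reads master. This is a hedged sketch: `count_masters` and `check_single_master` are hypothetical helper names, and the check inspects only the `spilo-role` pod label as printed by `kubectl get pods -L spilo-role`, not Patroni's own view of which member is the leader.

```shell
# Count rows of `kubectl get pods -l application=spilo -L spilo-role` output
# whose last column (SPILO-ROLE) is "master"; the header row is skipped.
count_masters() {
  awk 'NR > 1 && $NF == "master" { n++ } END { print n + 0 }'
}

# Return non-zero if more than one Spilo pod carries the master label.
check_single_master() {
  local masters
  masters="$(kubectl get pods -l application=spilo -L spilo-role | count_masters)"
  if [ "${masters}" -gt 1 ]; then
    echo "split brain: ${masters} pods labeled master" >&2
    return 1
  fi
  echo "ok: ${masters} pod(s) labeled master"
}
```

Against the output in step 4 this reports one master. Against the output in step 6 it reports two and returns a non-zero status.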