Introduce health sidecar to better integrate with PDB #948

Open
wants to merge 1 commit into
base: main
from


Collaborator

@swoehrl-mw swoehrl-mw commented Jan 27, 2025

Description

Even with Pod Disruption Budgets (PDBs) configured for OpenSearch node pools, the OpenSearch cluster can still suffer downtime during Kubernetes node replacements. A PDB only checks pod readiness, and pod readiness does not reflect the health of the OpenSearch cluster (green/yellow/red). This is by design: readiness is a per-pod state, whereas cluster health describes the entire cluster. As a result, the PDB can allow a Kubernetes node to be replaced while OpenSearch is still replicating and syncing shards, which leads to indices becoming unavailable.
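To make the gap concrete, here is a minimal sketch of the budget arithmetic for a PDB that uses maxUnavailable. It is a simplified model, not the Kubernetes disruption controller itself: percentage values and eviction policies are ignored, and the function name `disruptionsAllowed` is made up for this illustration.

```go
package main

import "fmt"

// disruptionsAllowed is a hypothetical helper that models the PDB budget:
// desiredHealthy = expectedPods - maxUnavailable, and the number of
// evictions permitted is readyPods - desiredHealthy (never negative).
func disruptionsAllowed(expectedPods, readyPods, maxUnavailable int) int {
	desiredHealthy := expectedPods - maxUnavailable
	if desiredHealthy < 0 {
		desiredHealthy = 0
	}
	allowed := readyPods - desiredHealthy
	if allowed < 0 {
		return 0
	}
	return allowed
}

func main() {
	// All 3 pods report ready even though the cluster may still be
	// recovering shards: the budget still permits one eviction.
	fmt.Println(disruptionsAllowed(3, 3, 1))

	// If one pod is marked not ready, the budget is used up and
	// no further eviction is permitted.
	fmt.Println(disruptionsAllowed(3, 2, 1))
}
```

The input to this calculation is only the count of ready pods, so cluster health never enters the decision unless some pod's readiness is tied to it.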

This PR introduces the operator sidecar. When enabled, it elects a leader among all pods of an OpenSearch cluster, and that leader reflects the actual OpenSearch cluster state (green/yellow/red) in its readiness probe. That way, if OpenSearch is not green, one pod is marked as not ready. If the PDB is configured with maxUnavailable=1, this blocks further Kubernetes node replacements until OpenSearch is green again.

This feature is experimental and is therefore off by default.

During implementation some other changes were needed:

  • During initial bootstrap, all pods now start at the same time if the sidecar is enabled, but rolling restarts still happen pod by pod.
  • Re-enabling shard allocation is now done sooner to prevent readiness from getting stuck.

Issues Resolved

N/A

Check List

  • Commits are signed per the DCO using --signoff
  • Unit tests added for the new/changed functionality and all unit tests are successful
  • Customer-visible features documented
  • No linter warnings (make lint)

If CRDs are changed:

  • CRD YAMLs updated (make manifests) and also copied into the helm chart
  • Changes to CRDs documented

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@swoehrl-mw swoehrl-mw force-pushed the feature/health-sidecar branch 2 times, most recently from 71d6c0f to 28b4c36 Compare January 28, 2025 08:37
@swoehrl-mw swoehrl-mw marked this pull request as ready for review January 28, 2025 09:40