Introduce health sidecar to better integrate with PDB #948
Description
Even with Pod Disruption Budgets (PDB) configured for OpenSearch node pools, the OpenSearch cluster can still suffer downtime during Kubernetes node replacements. A PDB only checks pod readiness, and pod readiness does not reflect the health of the OpenSearch cluster (green/yellow/red). This is by design: readiness is a per-pod state, while cluster health describes the cluster as a whole. As a result, the PDB can allow a Kubernetes node to be replaced while OpenSearch is still replicating and syncing shards, which leads to indices becoming unavailable.
This PR introduces the operator sidecar. When enabled, it elects a leader among all pods of an OpenSearch cluster, and that leader reflects the actual OpenSearch cluster health (green/yellow/red) in its readiness probe. That way, if OpenSearch is not green, one pod is marked as not ready. If the PDB is configured with maxUnavailable=1, this blocks further Kubernetes node replacements until OpenSearch is green again.
This feature is experimental and is therefore off by default.
During implementation some other changes were needed:
Issues Resolved
N/A
Check List
make lint
If CRDs are changed:
make manifests
and also copied into the helm chart
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.