docs: document Pod disruption budget configuration by arekborucki · Pull Request #213 · ClickHouse/clickhouse-operator

arekborucki · 2026-06-02T18:39:33Z

Adds a "Pod disruption budgets" section to the configuration guide between Pod configuration and Container configuration. The section covers spec.podDisruptionBudget on both ClickHouseCluster and KeeperCluster CRDs:

- Operator defaults (per-shard for ClickHouseCluster: maxUnavailable=1 for single-replica shards, minAvailable=1 for multi-replica shards; KeeperCluster: maxUnavailable=replicas/2 to preserve RAFT quorum)
- minAvailable vs maxUnavailable overrides, with a note that the webhook rejects setting both
- The Enabled/Disabled/Ignored policy and when each fits
- Cluster-wide ENABLE_PDB env var on the operator Deployment for environments that ship their own disruption policies

No code or chart changes. Documentation only.

Why

The operator automatically creates a PodDisruptionBudget for every ClickHouseCluster shard and KeeperCluster, applying defaults that protect quorum during voluntary disruptions. Despite being a core operational feature, PDB behavior is currently undocumented.
Users remain unaware of the automatically managed PDBs until they encounter them in production. To learn about the Enabled, Disabled, and Ignored policies or the cluster-wide ENABLE_PDB toggle, they have to read the implementation in api/v1alpha1/common.go.

What

Adds a new ## Pod disruption budgets section to docs/guides/configuration.mdx, placed between Pod configuration and Container configuration.

The new section covers:

- Defaults: explains the operator-managed PodDisruptionBudget defaults:
  - ClickHouseCluster: one PDB per shard, using maxUnavailable: 1 for single-replica shards and minAvailable: 1 for multi-replica shards.
  - KeeperCluster: maxUnavailable: replicas/2, preserving RAFT quorum in a 2F+1 deployment.
- Overriding the defaults: documents minAvailable and maxUnavailable overrides, including a warning that specifying both fields is rejected by the validating webhook.
- Policies: explains the Enabled, Disabled, and Ignored policies, with YAML examples and guidance on when each option should be used.
- Cluster-wide opt-out: documents the ENABLE_PDB environment variable on the operator Deployment, which disables automatic PDB management for environments that provide their own disruption policies.
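
The override rule above (minAvailable and maxUnavailable cannot both be set) can be sketched as a small check. This is an illustration only, not the operator's webhook code: `PodDisruptionBudgetSpec` and `validate_pdb_spec` are hypothetical names, and rejecting unknown policy values is an assumption rather than something this PR states.

```python
from dataclasses import dataclass
from typing import List, Optional

# Policy names as listed in the PR description.
KNOWN_POLICIES = ("Enabled", "Disabled", "Ignored")


@dataclass
class PodDisruptionBudgetSpec:
    """Hypothetical stand-in for spec.podDisruptionBudget (not the operator's Go type)."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None
    policy: Optional[str] = None


def validate_pdb_spec(spec: PodDisruptionBudgetSpec) -> List[str]:
    """Return a list of validation errors; an empty list means the spec is accepted."""
    errors = []
    # Documented rule: specifying both fields is rejected.
    if spec.min_available is not None and spec.max_unavailable is not None:
        errors.append("minAvailable and maxUnavailable are mutually exclusive")
    # Assumption: a policy outside the three documented values is rejected.
    if spec.policy is not None and spec.policy not in KNOWN_POLICIES:
        errors.append("unknown policy: " + spec.policy)
    return errors
```

Leaving both fields unset yields no errors here, which corresponds to falling back to the operator defaults.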

No code, API, or chart changes. Documentation only.

Vale flagged the PDB guide for 'autoscaler' as a spelling error. Add it together with related Kubernetes ecosystem terms that the new section uses (GitOps, Gatekeeper, Kyverno, NotReady) so the next round of the docs guide does not trip on them either.

GrigoryPervakov · 2026-06-03T13:00:46Z

+|---|---|---|
+| `ClickHouseCluster` | `replicas: 1` (single-replica shard) | `maxUnavailable: 1` — disruption is allowed because there is nothing to preserve anyway |
+| `ClickHouseCluster` | `replicas: 2+` (multi-replica shard) | `minAvailable: 1` — at least one replica per shard must stay up |
+| `KeeperCluster` | any | `maxUnavailable: replicas/2` — preserves the RAFT quorum for a `2F+1` cluster (3 replicas tolerate 1 down, 5 replicas tolerate 2 down) |


This was updated recently: a single-node Keeper now also allows disruption.
18f10ea

The defaults table now distinguishes replicas: 1 (maxUnavailable: 1, from #208) from replicas: 3+ (maxUnavailable: replicas/2).
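
The defaults discussed in this thread can be written down as a small lookup, together with the arithmetic behind the quorum claim. This is a sketch under stated assumptions, not the operator's implementation: `default_budget` and `keeps_quorum` are hypothetical names, `replicas/2` is read as integer division (consistent with "3 replicas tolerate 1 down, 5 replicas tolerate 2 down"), and even Keeper replica counts are handled by the same formula even though the thread only discusses 1 and 3+.

```python
from typing import Tuple


def default_budget(kind: str, replicas: int) -> Tuple[str, int]:
    """Return (field, value) of the default PDB for one shard or one Keeper cluster."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if kind == "ClickHouseCluster":
        # replicas is the per-shard replica count; there is one PDB per shard.
        if replicas == 1:
            return ("maxUnavailable", 1)
        return ("minAvailable", 1)
    if kind == "KeeperCluster":
        if replicas == 1:
            # Single-node Keeper: disruption is allowed (behavior attributed to #208).
            return ("maxUnavailable", 1)
        # Assumption: replicas/2 means integer division.
        return ("maxUnavailable", replicas // 2)
    raise ValueError("unknown kind: " + kind)


def keeps_quorum(replicas: int, down: int) -> bool:
    """A majority quorum needs floor(replicas / 2) + 1 members to remain up."""
    return replicas - down >= replicas // 2 + 1
```

For an odd cluster of size 2F+1 the default evaluates to F, which leaves F+1 members up, exactly a majority. The single-replica case deliberately gives up that guarantee so that the node can be drained.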

GrigoryPervakov · 2026-06-03T13:03:18Z

+
+### Cluster-wide opt-out {#pdb-cluster-wide-disable}
+
+PDB management can also be disabled cluster-wide via the operator's `ENABLE_PDB` environment variable. With `ENABLE_PDB=false`, the operator skips the PDB reconcile step for **every** ClickHouseCluster and KeeperCluster regardless of their `spec.podDisruptionBudget.policy`.


It also won't watch PDB resources, so the operator works correctly even if its service account (SA) has no permissions on PDBs.

Good catch. Updated the "Cluster-wide opt-out" paragraph to make this explicit: with ENABLE_PDB=false the operator does not watch PodDisruptionBudget resources at all, and the SA does not need RBAC permissions on poddisruptionbudgets.policy/v1.
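
The precedence described in this exchange (the cluster-wide switch wins over any per-cluster policy) can be sketched as follows. This is illustrative only and makes several assumptions that the thread does not confirm: PDB management is treated as enabled when `ENABLE_PDB` is unset, only the literal value `false` (case-insensitive) disables it, and only the `Enabled` policy triggers reconciliation. The difference between `Disabled` and `Ignored` is not modelled, and both function names are hypothetical.

```python
import os
from typing import Mapping


def pdb_globally_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the cluster-wide switch from the operator's environment."""
    # Assumption: unset means enabled; only "false" turns PDB management off.
    return environ.get("ENABLE_PDB", "true").strip().lower() != "false"


def should_reconcile_pdb(environ: Mapping[str, str], policy: str) -> bool:
    """Decide whether the PDB reconcile step runs for one cluster."""
    if not pdb_globally_enabled(environ):
        # ENABLE_PDB=false: skipped for every cluster, whatever its policy says.
        return False
    # Assumption: only the Enabled policy leads to a managed PDB.
    return policy == "Enabled"
```

Passing the environment as a mapping keeps the sketch easy to exercise without touching the real process environment.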

arekborucki and others added 4 commits June 2, 2026 20:36

Merge branch 'main' into docs/pod-disruption-budget

a8f2b24

docs: spell out 'development' to satisfy Vale

2024d80

GrigoryPervakov reviewed Jun 3, 2026

arekborucki added 2 commits June 3, 2026 15:10

docs(pdb): document smart-default maxUnavailable=1 for single-replica…

7cee153

… KeeperCluster

docs(pdb): note RBAC requirement is dropped when ENABLE_PDB=false

e85d0c4

arekborucki requested a review from GrigoryPervakov June 3, 2026 13:20

rephrase description, remove unneded commit reference

172b880

GrigoryPervakov approved these changes Jun 3, 2026

GrigoryPervakov enabled auto-merge (squash) June 3, 2026 21:12

Merge branch 'main' into docs/pod-disruption-budget

5a245c3

arekborucki wants to merge 8 commits into ClickHouse:main from arekborucki:docs/pod-disruption-budget


2 participants

