
docs: document Pod disruption budget configuration #213

Open

arekborucki wants to merge 8 commits into ClickHouse:main from arekborucki:docs/pod-disruption-budget


@arekborucki (Contributor) commented Jun 2, 2026

Adds a "Pod disruption budgets" section to the configuration guide between Pod configuration and Container configuration. The section covers spec.podDisruptionBudget on both ClickHouseCluster and KeeperCluster CRDs:

  • Operator defaults (per-shard for ClickHouseCluster: maxUnavailable=1 for single-replica shards, minAvailable=1 for multi-replica shards; KeeperCluster: maxUnavailable=1 for a single replica, otherwise maxUnavailable=replicas/2 to preserve RAFT quorum)
  • minAvailable vs. maxUnavailable overrides, with a note that the validating webhook rejects setting both
  • The Enabled/Disabled/Ignored policy values and when each one fits
  • Cluster-wide ENABLE_PDB env var on the operator Deployment for environments that ship their own disruption policies
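A minimal Python sketch of how the defaults above are chosen, for illustration only. The operator itself is written in Go (api/v1alpha1/common.go); `PDBSpec`, `clickhouse_shard_default`, and `keeper_default` are hypothetical names, not its API. The single-replica Keeper branch follows the #208 change raised in review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PDBSpec:
    """Hypothetical stand-in for the two count fields of a PodDisruptionBudget spec."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def clickhouse_shard_default(replicas: int) -> PDBSpec:
    """Default PDB for one ClickHouseCluster shard (one PDB per shard)."""
    if replicas <= 1:
        # Single-replica shard: nothing to preserve, so the lone pod may be evicted.
        return PDBSpec(max_unavailable=1)
    # Multi-replica shard: keep at least one replica of the shard running.
    return PDBSpec(min_available=1)


def keeper_default(replicas: int) -> PDBSpec:
    """Default PDB for a KeeperCluster."""
    if replicas <= 1:
        # Single-node Keeper: disruption is allowed (the #208 behavior).
        return PDBSpec(max_unavailable=1)
    # A 2F+1 cluster tolerates F members down; integer division yields F.
    # Assumes an odd replica count, which is what the 2F+1 description covers.
    return PDBSpec(max_unavailable=replicas // 2)
```

For 3, 5, and 7 Keeper replicas this leaves 2, 3, and 4 pods running, exactly the Raft majority (`n // 2 + 1`).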

No code or chart changes; documentation only.

Why

The operator automatically creates a PodDisruptionBudget for every ClickHouseCluster shard and KeeperCluster, applying defaults that protect quorum during voluntary disruptions. Despite being a core operational feature, PDB behavior is currently undocumented.
Users are left unaware of the automatically managed PDBs until they run into them in production, and have to read the implementation in api/v1alpha1/common.go to learn about the available Enabled, Disabled, and Ignored policies and the cluster-wide ENABLE_PDB toggle.

What

Adds a new ## Pod disruption budgets section to docs/guides/configuration.mdx, placed between Pod configuration and Container configuration.

The new section covers:

  • Defaults: explains the operator-managed PodDisruptionBudget defaults:

    • ClickHouseCluster: one PDB per shard, using maxUnavailable: 1 for single-replica shards and minAvailable: 1 for multi-replica shards.
    • KeeperCluster: maxUnavailable: 1 for a single-replica cluster (per #208); otherwise maxUnavailable: replicas/2, preserving RAFT quorum in a 2F+1 deployment.
  • Overriding the defaults: documents minAvailable and maxUnavailable overrides, including a warning that specifying both fields is rejected by the validating webhook.

  • Policies: explains the Enabled, Disabled, and Ignored policies, with YAML examples and guidance on when each option should be used.

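  • Cluster-wide opt-out: documents the ENABLE_PDB environment variable on the operator Deployment, which disables automatic PDB management for environments that provide their own disruption policies. With ENABLE_PDB=false the operator also stops watching PodDisruptionBudget resources, so its service account needs no RBAC permissions on them.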
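The override and opt-out rules can be modeled in a few lines. This is a hedged Python sketch, not the operator's Go code: `PDBSpec`, `validate_override`, `pdb_management_enabled`, and `effective_pdb` are hypothetical, and the exact parsing of `ENABLE_PDB` (unset means enabled, a case-insensitive `false` disables it) is an assumption. The per-cluster Enabled/Disabled/Ignored policy is left out, since only the guide itself defines its semantics.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PDBSpec:
    """Hypothetical stand-in for spec.podDisruptionBudget's two count fields."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def validate_override(override: PDBSpec) -> None:
    # Mirrors the webhook rule: minAvailable and maxUnavailable are mutually exclusive.
    if override.min_available is not None and override.max_unavailable is not None:
        raise ValueError("set either minAvailable or maxUnavailable, not both")


def pdb_management_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: unset means enabled; only a case-insensitive "false" turns it off.
    return env.get("ENABLE_PDB", "true").strip().lower() != "false"


def effective_pdb(
    default: PDBSpec,
    override: Optional[PDBSpec],
    env: Mapping[str, str] = os.environ,
) -> Optional[PDBSpec]:
    """Return the PDB spec to reconcile, or None when PDB management is off."""
    has_override = override is not None and override != PDBSpec()
    if has_override:
        # Checked first, the way an admission webhook rejects the object before any reconcile.
        validate_override(override)
    if not pdb_management_enabled(env):
        # ENABLE_PDB=false: skip PDBs for every cluster, whatever its policy says.
        return None
    # An explicit minAvailable or maxUnavailable replaces the operator default.
    return override if has_override else default
```

Passing `env` explicitly (instead of reading `os.environ`) keeps the sketch easy to exercise without touching the process environment.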

No code, API, or chart changes. Documentation only.

arekborucki and others added 4 commits June 2, 2026 20:36
Adds a "Pod disruption budgets" section to the configuration guide
between Pod configuration and Container configuration. The section
covers spec.podDisruptionBudget on both ClickHouseCluster and
KeeperCluster CRDs:

- Operator defaults (per-shard for ClickHouseCluster:
  maxUnavailable=1 for single-replica shards, minAvailable=1 for
  multi-replica shards; KeeperCluster: maxUnavailable=replicas/2 to
  preserve RAFT quorum)
- minAvailable vs maxUnavailable overrides with a webhook note that
  setting both is rejected
- The Enabled/Disabled/Ignored policy and when each fits
- Cluster-wide ENABLE_PDB env var on the operator Deployment for
  environments that ship their own disruption policies

No code or chart changes; documentation only.

Vale flagged "autoscaler" in the PDB guide as a spelling error. Add it to the Vale vocabulary, together with related Kubernetes ecosystem terms the new section uses (GitOps, Gatekeeper, Kyverno, NotReady), so the next docs change does not trip on them either.
Comment thread docs/guides/configuration.mdx Outdated
|---|---|---|
| `ClickHouseCluster` | `replicas: 1` (single-replica shard) | `maxUnavailable: 1` — disruption is allowed because there is nothing to preserve anyway |
| `ClickHouseCluster` | `replicas: 2+` (multi-replica shard) | `minAvailable: 1` — at least one replica per shard must stay up |
| `KeeperCluster` | any | `maxUnavailable: replicas/2` — preserves the RAFT quorum for a `2F+1` cluster (3 replicas tolerate 1 down, 5 replicas tolerate 2 down) |
Member

This was recently updated: a single-node Keeper now also allows disruption (18f10ea).

Contributor Author

@arekborucki Jun 3, 2026

The defaults table now distinguishes replicas: 1 (maxUnavailable: 1, from #208) from replicas: 3+ (maxUnavailable: replicas/2).

Comment thread docs/guides/configuration.mdx Outdated

### Cluster-wide opt-out {#pdb-cluster-wide-disable}

PDB management can also be disabled cluster-wide via the operator's `ENABLE_PDB` environment variable. With `ENABLE_PDB=false`, the operator skips the PDB reconcile step for **every** ClickHouseCluster and KeeperCluster regardless of their `spec.podDisruptionBudget.policy`.
Member

It also won't watch PDB resources, so the operator works correctly even if its service account (SA) has no permissions on PDBs.

Contributor Author

Good catch. Updated the "Cluster-wide opt-out" paragraph to make this explicit: with ENABLE_PDB=false the operator does not watch PodDisruptionBudget resources at all, and the SA does not need RBAC permissions on poddisruptionbudgets.policy/v1.

@GrigoryPervakov enabled auto-merge (squash) June 3, 2026 21:12