
upgrade: document broker restarts during operator upgrades #1758

Open

david-yu wants to merge 2 commits into main from dyu/operator-upgrade-broker-restart


david-yu (Contributor) commented Jun 18, 2026

What

Adds a "Broker restarts during operator upgrades" section to the "Upgrade the Redpanda Operator" page, plus a NOTE in the existing "Operator-only upgrades" section.

Why

Operator-only upgrades restart every broker Pod even when the Redpanda version doesn't change. The operator re-renders the broker StatefulSet, and the configurator init-container and sidecar image references that it injects into the broker Pod template track the operator version. Users reasonably expect an "operator-only" upgrade to touch only the control plane, so this catches them off guard during production planning.
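As a rough illustration of the mechanism, the following Python sketch models the broker Pod template as a plain dict and treats any change in its digest as a trigger for a rolling restart. Everything here is a hypothetical simplification: the field names, the `operator:<tag>` image strings, the version numbers, and the SHA-256 digest are stand-ins, not the operator's data structures or the hash Kubernetes actually uses for StatefulSet revisions. The one behavior taken from this PR is that the configurator tag defaults to the operator version.

```python
import hashlib
import json
from typing import Optional


def render_broker_pod_template(redpanda_version: str,
                               operator_version: str,
                               configurator_tag: Optional[str] = None) -> dict:
    """Hypothetical, simplified broker Pod template (illustrative fields only)."""
    # Per this PR: --configurator-tag defaults to the operator version.
    tag = configurator_tag or operator_version
    return {
        "initContainers": [{"name": "configurator", "image": f"operator:{tag}"}],
        "containers": [
            {"name": "redpanda", "image": f"redpanda:{redpanda_version}"},
            {"name": "sidecar", "image": f"operator:{tag}"},
        ],
    }


def template_digest(template: dict) -> str:
    """Stable digest of the template; a stand-in for a StatefulSet revision hash."""
    encoded = json.dumps(template, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def triggers_rolling_restart(old: dict, new: dict) -> bool:
    # Any difference in the rendered template yields a new revision,
    # which the StatefulSet then rolls out to every broker Pod.
    return template_digest(old) != template_digest(new)


if __name__ == "__main__":
    # Illustrative versions: Redpanda stays the same, only the operator changes.
    before = render_broker_pod_template("v25.3.6", operator_version="v25.3.6")
    after = render_broker_pod_template("v25.3.6", operator_version="v25.3.7")
    print(triggers_rolling_restart(before, after))  # True
```

Even though `redpanda_version` is identical in both calls, the digest changes because the configurator and sidecar image tags moved with the operator version.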

This came out of a support thread (USE2 prod, operator bump to v25.3.7 rolled every broker with no Redpanda version change) and was confirmed against the operator source and an end-to-end validation on Kubernetes 1.36 (upgrade from v26.1.4, both v1 Cluster and v2 Redpanda operators).

What the section covers

  • Expectation: plan for a rolling restart of every broker on each operator version bump.
  • Why: the configurator/sidecar image tag tracks the operator version (--configurator-tag defaults to the operator version), so an upgrade changes the broker Pod template → rolling restart. Also notes restarts happen when a release changes other parts of the Pod template (args/containers/mounts).
  • It's safe: maintenance mode + wait-for-healthy between brokers.
  • Planning: one cluster at a time, window sized by broker count, test in non-prod first.
  • Advanced staging option: pin the configurator/sidecar image via the chart's additionalCmdFlags so the control-plane upgrade lands without rolling brokers, then bump the pinned image in a separate window.
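The "It's safe" and "Planning" points can be made concrete with a small Python sketch: brokers restart strictly one at a time (maintenance mode, restart, wait for healthy), so the maintenance window grows linearly with broker count. The step wording, the `redpanda-N` broker names, the per-broker duration, and the 25% buffer are assumptions for illustration only; measure the real per-broker time in a non-production cluster first.

```python
from typing import List, Tuple


def plan_rolling_restart(brokers: List[str],
                         minutes_per_broker: float,
                         buffer_fraction: float = 0.25) -> Tuple[List[str], float]:
    """Return the ordered restart steps and an estimated window in minutes.

    Simplified model: the operator handles one broker at a time and does not
    move on until the cluster is healthy again. minutes_per_broker and
    buffer_fraction are planning assumptions, not operator settings.
    """
    steps: List[str] = []
    for broker in brokers:
        steps.append(f"{broker}: enable maintenance mode")
        steps.append(f"{broker}: restart Pod")
        steps.append(f"{broker}: wait for cluster healthy")
    # Sequential restarts: total time scales with the number of brokers.
    window_minutes = len(brokers) * minutes_per_broker * (1 + buffer_fraction)
    return steps, window_minutes


if __name__ == "__main__":
    steps, window = plan_rolling_restart(
        ["redpanda-0", "redpanda-1", "redpanda-2"], minutes_per_broker=4.0)
    for step in steps:
        print(step)
    print(f"estimated window: {window:.0f} minutes")  # estimated window: 15 minutes
```

Because brokers never restart in parallel, doubling the broker count roughly doubles the window, which is why the guidance is to upgrade one cluster at a time and size each window separately.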

For reviewers — please sanity-check the advanced section

The "Stage the broker restart separately" subsection documents pinning --configurator-base-image/--configurator-tag. Two things to confirm before merge:

  1. Support stance. In the originating thread, the answer to "is there a way to upgrade without rolling brokers" was "not currently … no supported mechanism." The pin is real and works (validated), but if the operator team considers it unsupported, we may want to soften/remove the advanced subsection or label it explicitly as unsupported.
  2. Accuracy of caveats. I've scoped it carefully: the pin only avoids the roll when a release's sole broker-Pod-template change is the image tag (true for many patch releases, including the v25.3.7 case), and it temporarily runs a control-plane/sidecar version mismatch (warned against running long-term). Please confirm these match the operator team's guidance on control-plane↔sidecar version skew.
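To make caveat 2 easier to review, here is a hypothetical Python model of when the pin actually prevents a roll. It flattens the broker Pod template into a few illustrative keys (`configurator.image`, `sidecar.image`, `sidecar.args`; not the operator's real field paths) and assumes, as described above, that the pinned tag replaces the operator-version default for both images. The roll is avoided only if the re-rendered template is identical to the current one, which is exactly the "image tag is the sole change" condition. The versions and `--example-arg` values are placeholders.

```python
from typing import Dict, Optional

Template = Dict[str, str]


def render(operator_version: str, sidecar_args: str,
           pinned_tag: Optional[str] = None) -> Template:
    """Hypothetical flattened broker Pod template (illustrative keys only)."""
    # Without a pin, the tag follows the operator version.
    tag = pinned_tag or operator_version
    return {
        "configurator.image": f"operator:{tag}",
        "sidecar.image": f"operator:{tag}",
        "sidecar.args": sidecar_args,
    }


def roll_avoided(current: Template, after_upgrade: Template) -> bool:
    # Any remaining difference still changes the template and rolls brokers.
    return current == after_upgrade


def has_version_skew(operator_version: str, pinned_tag: Optional[str]) -> bool:
    # A pin that differs from the running operator is the temporary
    # control-plane/sidecar mismatch that should not be kept long term.
    return pinned_tag is not None and pinned_tag != operator_version


if __name__ == "__main__":
    current = render("v25.3.6", sidecar_args="--example-arg")
    # Release whose only template change is the image tag: the pin avoids the roll.
    print(roll_avoided(current, render("v25.3.7", "--example-arg", pinned_tag="v25.3.6")))  # True
    # Release that also changes sidecar args: the pin cannot avoid the roll.
    print(roll_avoided(current, render("v25.3.7", "--other-arg", pinned_tag="v25.3.6")))  # False
    print(has_version_skew("v25.3.7", "v25.3.6"))  # True
```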

Happy to adjust framing (e.g., keep only the "expected + plan" guidance and drop the staging technique) based on the operator team's call.

Preview pages

🤖 Generated with Claude Code

Operator-only upgrades restart every broker Pod even when the Redpanda
version does not change, because the operator-injected configurator/sidecar
image references in the broker Pod template track the operator version. This
surprises users who expect an operator-only upgrade to be control-plane-only.

Add a "Broker restarts during operator upgrades" section that:
- sets the expectation (and adds a NOTE in the operator-only section),
- explains why it happens and that the rolling restart is safe (maintenance
  mode + wait-for-healthy between brokers),
- gives window-sizing/planning guidance,
- documents an advanced staging option: pin the configurator/sidecar image via
  the operator chart's additionalCmdFlags so the control-plane upgrade lands
  without rolling brokers, then bump the pinned image separately — with explicit
  caveats (only avoids the roll when the release's pod-template change is
  image-only; don't run a mismatched control-plane/sidecar long term).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
david-yu requested a review from a team as a code owner June 18, 2026 02:14
netlify Bot commented Jun 18, 2026
Deploy Preview for redpanda-docs-preview ready!

🔨 Latest commit: 7efc926
🔍 Latest deploy log: https://app.netlify.com/projects/redpanda-docs-preview/deploys/6a3c6114030487fa8a3498f8
😎 Deploy Preview: https://deploy-preview-1758--redpanda-docs-preview.netlify.app

coderabbitai Bot (Contributor) commented Jun 18, 2026

Review skipped: auto incremental reviews are disabled on this repository.
📝 Walkthrough

The operator upgrade guide (k-upgrade-operator.adoc) gains 51 new lines. A NOTE callout is inserted near the top of the page, stating that operator-only upgrades still trigger a rolling restart of broker Pods and pointing readers to a new section. That new section, "Broker restarts during operator upgrades," explains the mechanism (the Redpanda Operator injects version-tracked configurator init-container and sidecar image tags into the broker Pod template, causing a StatefulSet rollout), provides planning guidance (one cluster at a time, appropriately sized maintenance windows, non-production testing), and documents an advanced staging technique: pinning the configurator and sidecar images to the current operator version via additionalCmdFlags to defer broker restarts, along with associated limitations and safety cautions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • redpanda-data/docs#1625: Updates operator-related upgrade/migration documentation to explain broker Pod rolling restarts triggered by operator-driven StatefulSet template changes, directly overlapping with the same topic covered in this PR.

Suggested reviewers

  • micheleRP
  • kbatuigas
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Description check: ⚠️ Warning. The PR description is detailed, but it doesn't follow the required template and omits the Jira ticket, review deadline, and checks list. Resolution: add the template's Description section with the Jira link and review deadline, plus the Checks checklist and any required page previews.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title accurately describes the main change: documenting broker restarts during operator upgrades, which directly aligns with the primary purpose of the changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Convert the operator-only-upgrade NOTE to block form, replace casual
"bump" wording with "upgrade"/"update", make the broker-restart
sequencing sentence active voice, and start the staging intro with an
imperative.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
micheleRP (Contributor)
Docs-team-standards review

Overall assessment: Strong, well-scoped addition. Clean structure, consistent with established conventions, technically careful. Style edits applied. No breaking issues. The one remaining item before merge is the support-stance question on the advanced section — already flagged by the author for reviewers.

What this PR does

Adds a NOTE to the existing Operator-only upgrades section plus a new H2 Broker restarts during operator upgrades section. It explains that operator-only upgrades roll every broker Pod (because the configurator/sidecar image tag tracks the operator version), confirms the roll is safe (maintenance mode + wait-for-healthy), gives window-sizing guidance, and documents an advanced staging technique to pin the configurator image via additionalCmdFlags. Audience: operators planning Kubernetes upgrades.

Jira ticket alignment

No linked Jira ticket. The PR body cites a support thread (USE2 prod, operator upgrade to v25.3.7) but no DOC-* key — worth confirming whether a tracking ticket should be linked.

Critical issues

None. No broken xrefs, no rendering issues, no terminology errors.

Key decision before merge (accuracy / SME sign-off)

  • [Stage the broker restart separately] The author has explicitly asked reviewers to confirm the support stance: the originating support thread answered "no supported mechanism" for upgrading without rolling brokers, yet this section documents a validated pin technique. Recommendation: get the operator team's call before merge; if it isn't officially supported, add an admonition labeling it a workaround rather than removing it. This is an accuracy/SME matter, not a docs-standards defect.

Style edits (applied in 7efc926d)

  1. ✅ Inline NOTE: → [NOTE] block form (readability of the four-sentence note).
  2. ✅ Casual "bump" → "upgrade"/"update" (all prose instances).
  3. ✅ Passive → active: "Each broker is restarted sequentially and the operator waits..." → "The operator restarts each broker sequentially and waits...".
  4. ✅ Imperative opener: "If you want to apply an operator control-plane fix..." → "To apply an operator control-plane fix...".

Impact on other files

  • nav.adoc — No change needed; new section on an existing page already in the nav.
  • No contradicting statements — No existing page claims operator-only upgrades are control-plane-only / don't restart brokers, so nothing to reconcile.
  • additionalCmdFlags syntax is consistent — brace/comma form matches k-nodewatcher.adoc and k-decommission-brokers.adoc (manage/k8s: document decommission timing settings (--decommission-wait-interval, RequeueAfter) #1761).
  • What's New — Documents existing operator behavior, not a new feature; no release-notes entry required.
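For reviewers checking the brace/comma form, the following Python sketch assembles an `additionalCmdFlags` value from the two pin flags named in this PR (`--configurator-base-image`, `--configurator-tag`) using Helm's `{a,b}` list syntax for `--set`, then shell-quotes it so bash does not brace-expand it into separate words. The `--flag=value` spelling and the placeholder image and tag values are assumptions; check them against the exact example on the preview page before copying anything.

```python
import shlex
from typing import List


def pin_flags(base_image: str, tag: str) -> List[str]:
    """The two pin flags named in the PR; values are caller-supplied placeholders."""
    return [f"--configurator-base-image={base_image}", f"--configurator-tag={tag}"]


def additional_cmd_flags_set_arg(flags: List[str]) -> str:
    """Build a Helm --set argument in brace/comma list form."""
    for flag in flags:
        # Helm splits list items on commas; keep this sketch simple and
        # reject items that would need escaping.
        if "," in flag:
            raise ValueError(f"flag contains a comma: {flag!r}")
    return "additionalCmdFlags={" + ",".join(flags) + "}"


if __name__ == "__main__":
    value = additional_cmd_flags_set_arg(
        pin_flags("example.registry/operator", "v25.3.6"))
    # Quote it so the shell passes {a,b} through to Helm unchanged.
    print("--set " + shlex.quote(value))
```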

CodeRabbit findings

None — CodeRabbit posted no inline comments or review.

What works well

  • Technically deep and validated end-to-end (Kubernetes 1.36, both v1 Cluster and v2 Redpanda operators).
  • Honest, well-scoped caveats in the WARNING block (image-only-change limitation; control-plane/sidecar skew warning).
  • Clean sentence-case heading hierarchy with a correct internal anchor cross-link (<<broker-restarts,...>>); intro text precedes every subheading.
  • control plane (noun) vs control-plane (adjective), placeholders explained on first use, [source,bash] specifiers, spelled-out numbers — all correct.
  • Explains why the restart happens and that it's safe, answering the user's real concern.
