Conversation

jonathanCaamano
Contributor

@jonathanCaamano jonathanCaamano commented Jun 26, 2025

This closes #1876

As discussed with @Zerpet and @mkuratczyk in the issue, this adds logic to allow scaling RabbitMQ to zero replicas.

It also adds logic to prevent a scale-down when opting back out of zero.

A new annotation, rabbitmq.com/before-zero-replicas-configured, stores the replica count that was configured before RabbitMQ was scaled to zero.

With this annotation we verify that the desired replica count after the zero state is equal to or greater than the replica count before the zero state. If the verification fails, the request is treated like a scale-down (see the sketch below).
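
A minimal sketch of that verification, assuming hypothetical names (the PR's actual logic lives in controllers/reconcile_scale_zero.go):

// Illustrative sketch only; the function and constant names are hypothetical.
package controllers

import "strconv"

const beforeZeroReplicasAnnotation = "rabbitmq.com/before-zero-replicas-configured"

// scaleFromZeroIsScaleDown reports whether scaling back up from zero to
// desired replicas would effectively be a scale-down, i.e. desired is lower
// than the replica count saved before the cluster was scaled to zero.
func scaleFromZeroIsScaleDown(annotations map[string]string, desired int32) (bool, error) {
	raw, ok := annotations[beforeZeroReplicasAnnotation]
	if !ok {
		// The cluster was never scaled to zero; nothing to compare against.
		return false, nil
	}
	before, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return false, err
	}
	return desired < int32(before), nil
}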

Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed

Summary Of Changes

Additional Context

Local Testing

Please ensure you run the unit, integration and system tests before approving the PR.

To run the unit and integration tests:

$ make unit-tests integration-tests

You will need to target a k8s cluster and have the operator deployed for running the system tests.

For example, for a Kubernetes context named dev-bunny:

$ kubectx dev-bunny
$ make destroy deploy-dev
# wait for operator to be deployed
$ make system-tests

@mkuratczyk
Contributor

Thanks for the PR. Just FYI, I will certainly test this soon, but I need to finish some other things first.

@mkuratczyk
Contributor

Some initial feedback:

  1. ALLREPLICASREADY shows "true" when all replicas are stopped
# deploy a cluster, set replicas to 0, and then get the cluster:
> kubectl get rmq
NAME   ALLREPLICASREADY   RECONCILESUCCESS   AGE
rmq    True               True               13m

I think it should be set to False when scaled to 0 (a minimal sketch of this rule follows after this list).

  2. Attempting to scale up from zero to a lower number of replicas than before scaling to zero leads to an error:
2025-07-08T17:33:30+02:00	ERROR	Cluster Scale down not supported; tried to scale cluster from 3 nodes to 1 nodes	{"controller": "rabbitmqcluster", "controllerGroup": "rabbitmq.com", "controllerKind": "RabbitmqCluster", "RabbitmqCluster": {"name":"rmq","namespace":"default"}, "namespace": "default", "name": "rmq", "reconcileID": "338516a2-8aeb-447e-97fd-92e1774ae64d", "error": "UnsupportedOperation"}
github.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).recordEventsAndSetCondition
	/Users/mkuratczyk/workspace/cluster-operator/controllers/reconcile_scale_zero.go:90
github.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).scaleDownFromZero
	/Users/mkuratczyk/workspace/cluster-operator/controllers/reconcile_scale_zero.go:57
github.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).Reconcile
	/Users/mkuratczyk/workspace/cluster-operator/controllers/rabbitmqcluster_controller.go:216
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
	/Users/mkuratczyk/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
	/Users/mkuratczyk/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:340
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
	/Users/mkuratczyk/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:300
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
	/Users/mkuratczyk/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:202

(as we discussed, this is not expected to work, but the stack trace shouldn't be there unless there's a good reason for it)

  3. Attempting to scale from zero up to a number of replicas higher than before scaling down to zero works, which surprised me.
    Steps: deploy a 3-node cluster, set replicas to 0, then set replicas to 5. I don't see any reason for this to cause problems on 4.1+ thanks to the new peer discovery mechanism, but I guess it could cause issues with older RabbitMQ versions. Not sure what to do about this one yet. Perhaps we should keep it as is and just warn that using it with older RabbitMQ versions is risky.
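
A minimal sketch of the condition rule from point 1, with a simplified signature (the operator's real status handling lives in its own status package; this is not its actual API):

package status

import corev1 "k8s.io/api/core/v1"

// allReplicasReady is illustrative only: a cluster deliberately scaled to
// zero has no ready replicas, so AllReplicasReady should read False rather
// than vacuously True.
func allReplicasReady(specReplicas, readyReplicas int32) corev1.ConditionStatus {
	if specReplicas == 0 {
		return corev1.ConditionFalse
	}
	if readyReplicas == specReplicas {
		return corev1.ConditionTrue
	}
	return corev1.ConditionFalse
}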

@jonathanCaamano
Contributor Author

Hello @mkuratczyk,

  1. Sure, I'll make some changes so that ALLREPLICASREADY is set to false.

  2. Here I follow the same flow as a regular scale-down, as already defined in the code (if the previously configured replica count is 3 and you now try to set 1, that represents a scale-down). So, if you'd like, I can change it and remove the stack trace.

  3. About the RabbitMQ versions: the version in my cluster is 3.13 and it's working properly. Maybe you see something I don't?

Thank you for the feedback

@mkuratczyk
Contributor

  1. If there's a stack trace when a scale-down is attempted (without scaling down to zero), then I think ideally we should just fix that for both cases. Alternatively, you can ignore it and we can deal with this separately.

  2. I'm not saying it will never work, more that it could lead to random problems. Say we have 1 node, scale to zero and then scale to 3. What if the two new nodes start first for some reason? I think they could form a new cluster, at least in some cases. With 4.1+, that should not happen, since all nodes will wait for the node/pod with -0 suffix:
    https://www.rabbitmq.com/blog/2025/04/04/new-k8s-peer-discovery

@jonathanCaamano
Contributor Author

jonathanCaamano commented Jul 14, 2025

Hello @mkuratczyk!

I made some changes:

1. ALLREPLICASREADY is now false when the cluster is scaled to zero.
2. I tried to change this, but it needs further analysis and possibly a change to how the global logger works.
3. We changed the approach: when you scale RabbitMQ back up from zero, the replica count has to match the count from before the zero state; if you want to scale up further, you first have to restore the previously configured replicas. This avoids the problems you mentioned and always respects the annotation (see the sketch below).
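
A minimal sketch of the restore-only rule from point 3, again with hypothetical names; note that it tightens the earlier "equal or greater" check to strict equality:

// Hypothetical sketch, not the PR's actual code.
package controllers

import "strconv"

// scaleFromZeroAllowed permits scaling up from zero only back to the replica
// count recorded in the before-zero annotation.
func scaleFromZeroAllowed(annotations map[string]string, desired int32) (bool, error) {
	raw, ok := annotations["rabbitmq.com/before-zero-replicas-configured"]
	if !ok {
		return true, nil // never scaled to zero, nothing to enforce
	}
	before, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return false, err
	}
	return desired == int32(before), nil
}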

Kind regards

@mkuratczyk
Contributor

Thanks. My only additional feedback is that the error message is a bit cryptic ("Cluster Scale from zero to other replicas than before configured not supported; tried to scale cluster from 3 nodes to 5 nodes"). Perhaps "unsupported operation: when scaling from zero, you can only restore the previous number of replicas (3)"?

@Zerpet @ansd @MirahImage any thoughts about this PR?

@jonathanCaamano
Contributor Author

Hello,

I changed the logger.

@Zerpet Zerpet self-requested a review July 18, 2025 12:08
Member

@Zerpet Zerpet left a comment


Thank you for contributing this PR! I left some comments with feedback that I would like to be addressed before merging.

@jonathanCaamano jonathanCaamano requested a review from Zerpet July 30, 2025 07:38
Member

@Zerpet Zerpet left a comment


This is looking good 👍 I'm going to do some manual QA and I will merge afterwards. Thank you!

@jonathanCaamano
Contributor Author

Thanks!
I'll wait for your QA feedback.

Member

@Zerpet Zerpet left a comment


Nothing to report from the QA; it all worked as expected. Perhaps some may find it surprising that the AllReplicasReady condition is set to false when scaled to zero, but I'm not against this behaviour.

@Zerpet Zerpet modified the milestones: 2.15.0, 2.16.0 Jul 30, 2025
@Zerpet Zerpet merged commit ee0f974 into rabbitmq:main Jul 30, 2025
13 checks passed
Zerpet added a commit to Zerpet/rabbitmq-website that referenced this pull request Aug 5, 2025