Kubernetes: access to vhost '/' refused for user '': vhost '/' is down #14208
-
Describe the bug

Hi,

We are using two versions of RabbitMQ, 4.1.0 and 3.13.1, and this problem happened with both. To recover RabbitMQ, we need to delete the vhost data file. I have found some issues related to this problem but no definitive solution, only this workaround. It often happens when we update our application and RabbitMQ restarts.

rabbitmq:
  image: registry.local:5000/test/docker-images/rabbitmq:3.13.1-management-alpine
  hostname: "dashboard_rabbit"
  healthcheck:
    test: ['CMD', 'rabbitmq-diagnostics', '-q', 'ping']
    interval: 60s
    timeout: 5s
    retries: 3
  environment:
    - RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS=-rabbit log_levels [{connection,error}]
    - RABBITMQ_NODENAME=rabbit@dashboard_rabbit
  networks:
    test-dashboard:
      aliases:
        - test-dashboard_rabbitmq
    traefik-local:
      aliases:
        - test-dashboard_rabbitmq
    supervision:
      aliases:
        - test-dashboard_rabbitmq
  extra_hosts:
    - "dashboard_rabbit:127.0.0.1"
  volumes:
    - /mnt/test-data/int/data/test/ins-dashboard/dashboard/2.7.0-rc1/11-default-dashboard.conf:/etc/rabbitmq/conf.d/11-default-dashboard.conf
    - /mnt/test-data/int/data/test/ins-dashboard/dashboard/2.7.0-rc1/rabbitmq/queues/:/var/lib/rabbitmq/mnesia/
  labels:
    - test.client=inf
  deploy:
    update_config:
      order: start-first
      failure_action: rollback
      delay: 10s
    rollback_config:
      parallelism: 0
      order: stop-first
    restart_policy:
      condition: any
      delay: 5s
      max_attempts: 0
      window: 120s
    mode: replicated
    replicas: 1
    placement:
      constraints:
        - node.role==worker
        - node.labels.infra == ovh
        - node.labels.swarmtype == swarmvip
    resources:
      limits:
        memory: 10gb
        cpus: '0'
      reservations:
        memory: 256M
        cpus: '0'
    labels:
      - traefik.enable=true
      - traefik.internal=true
      - traefik.http.routers.test_dashboard_xeyn5mx8efdq-rabbitmq-router.rule=Host("dashboard-rabbitmq.int.test.local")
      - traefik.http.routers.test_dashboard_xeyn5mx8efdq-rabbitmq-router.entrypoints=internalweb
      - traefik.http.services.test_dashboard_xeyn5mx8efdq-rabbitmq-services.loadbalancer.server.port=15672

The RabbitMQ data is located on an NFS share and we did not have any problem before.

Reproduction steps
Expected behavior

Understand why this happens and how to fix it definitively.

Additional context

Feel free to ask.
Replies: 2 comments
-
RabbitMQ 3.13 has been out of community support for close to a year now. Duplicate of #10052. Time to upgrade to 4.1.x, the only series covered by community support.
-
I recall discussing this with another core team member and our conclusion was the following: if the mounted volume is not yet ready for writes by the time the node boots, it will fail to seed the data and the virtual host will then fail to start. This is pretty clearly hinted at by one of the function names: rabbit_variable_queue:do_start_msg_store/4.

We have never seen this behavior outside of Kubernetes, and RabbitMQ nodes do not do anything creative when it comes to initializing the schema data store or the CQ message store, so there is nothing to "fix once and for all" in RabbitMQ.

A while ago we considered adding optional delays before and after the node boots, for very different reasons. The former might help here. Alternatively, you can inject a startup pause using a Kubernetes-specific method, for example an init container that verifies that the volume is ready (writable); a sketch follows below.

To our knowledge, this behavior was never reported by those who use our Kubernetes Cluster Operator, most likely because the Operator introduces a startup delay to work around a widely known, unfortunate CoreDNS caching default. A similar delay will likely also help with volumes that are not ready early enough.
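To make the init container idea concrete, here is a minimal sketch of a pod spec fragment, assuming a StatefulSet-style deployment with a volume named data mounted at /var/lib/rabbitmq/mnesia. The container name, image, and probe file are hypothetical illustrations, not taken from this thread or from the Cluster Operator's manifests.

```yaml
# Hypothetical pod spec fragment (not the Cluster Operator's manifest): the init
# container delays RabbitMQ's boot until the data volume accepts writes, which is
# exactly the condition discussed above.
spec:
  initContainers:
    - name: wait-for-data-volume            # assumed name
      image: busybox:1.36                   # any image with a POSIX shell works
      command:
        - sh
        - -c
        - |
          # Keep probing until a test write on the mounted volume succeeds.
          until touch /var/lib/rabbitmq/mnesia/.volume-ready 2>/dev/null; do
            echo "data volume not writable yet, retrying..."
            sleep 2
          done
          rm -f /var/lib/rabbitmq/mnesia/.volume-ready
      volumeMounts:
        - name: data                        # assumed volume (PVC) name
          mountPath: /var/lib/rabbitmq/mnesia
  containers:
    - name: rabbitmq
      image: rabbitmq:4.1-management
      volumeMounts:
        - name: data
          mountPath: /var/lib/rabbitmq/mnesia
```

In the Docker Swarm setup above, the same write probe could be approximated with an entrypoint wrapper script that retries a test write against the NFS mount before exec'ing the stock RabbitMQ entrypoint.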