[Questions] Classic queue conversion takes very long time after upgrade to 4.0 #12848
---
For reference: this issue (I assume it's the same) was reported once before: https://groups.google.com/g/rabbitmq-users/c/8Ag1jnnLhWw/m/2ogIiQC5AgAJ. Unfortunately we never managed to identify the root cause. I'm relatively sure the problem is that CQv1 had a bug (perhaps fixed years ago?) that under some circumstances left files with no valid messages behind. CQv1 queues affected by this issue might have no messages whatsoever, while still having lots and lots of files that the CQv1->CQv2 migration process needs to go through.

Was 3.12.6 the initial version for this environment, or was it upgraded before, from some older 3.x version? Given that we know about quite a lot of successful CQv1->CQv2 migrations (and obviously we ran lots of tests before shipping this), my assumption is that this CQv1 issue was likely fixed years ago (perhaps unknowingly), and only environments that used to run some old 3.x version and were upgraded many times over the years could have queues with those unnecessary files. But if this can happen on 3.12, that'd be useful info.

What would be helpful is to try to identify affected queues (in this or other environments) that you have NOT yet tried to migrate, and to analyse why they have so many files while they are still "happily" running CQv1. Would it be possible for you to try to find such queues? Basically, we are looking for CQv1 queues that have a disproportionately large number of files relative to the number of messages ready in those queues. If you can find queues like that, can you find any commonalities between them (some particular policies or something)? Perhaps if we understand what led to these files not being deleted, we could special-case them for the conversion?
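To help hunt for such queues, here is a minimal sketch of a script that flags queue directories containing an unusually large number of index segment files. The directory layout (one subdirectory per queue under a `queues/` root, e.g. `<data-dir>/msg_stores/vhosts/<hash>/queues/`) and the `.idx` extension for CQv1 index segments are assumptions about a typical RabbitMQ data directory; verify both against your installation before relying on the output.

```python
#!/usr/bin/env python3
"""Flag queue directories with many index segment files.

ASSUMPTIONS (verify on your node): CQv1 index segments use the
``.idx`` extension, and each queue has its own subdirectory under
a ``queues/`` root inside the node's data directory.
"""
import sys
from pathlib import Path


def count_segment_files(queue_dir: Path, ext: str = ".idx") -> int:
    """Count index segment files directly inside one queue directory."""
    return sum(1 for p in queue_dir.iterdir() if p.suffix == ext)


def flag_suspicious(queues_root: Path, threshold: int = 50):
    """Yield (queue_dir, count) for directories at or above the threshold."""
    for queue_dir in sorted(queues_root.iterdir()):
        if not queue_dir.is_dir():
            continue
        n = count_segment_files(queue_dir)
        if n >= threshold:
            yield queue_dir, n


if __name__ == "__main__" and len(sys.argv) > 1:
    for d, n in flag_suspicious(Path(sys.argv[1])):
        print(f"{d}: {n} segment files")
```

Flagged directories can then be cross-checked against `rabbitmqctl list_queues name messages_ready` to see whether the file count is disproportionate to the number of ready messages; mapping a hashed directory name back to a queue name still has to be done manually.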
---
We've seen this issue again and I managed to gather some evidence. A single-node cluster (with many years of history) was upgraded from 3.13.7 to 4.0.7. One queue was taking forever to start during startup. After 2 hours the queue still had an empty queue folder, so I killed the queue process, after which it started up fine.
Logs
The queue was looping in
First I noticed the
Some important parts of the state, pretty printed:
My assumption is that if there is a queue with only transient messages, then after a clean RabbitMQ shutdown and startup the transient messages, and hence all segment files, will be deleted. But then I was able to observe this behaviour on main as well (although with slight differences, e.g. if the queue is empty then ). I believe @lhoguin can confirm or deny this theory and would know the remedy immediately.
---
RabbitMQ version used: 4.0.3
Erlang version used: 26.2.x
Operating system (distribution) used: linux
How is RabbitMQ deployed? Debian package
rabbitmq-diagnostics status output
See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster
Steps to reproduce the behavior in question
advanced.config
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code
# PASTE CODE HERE, BETWEEN BACKTICKS
Kubernetes deployment file
What problem are you trying to solve?
A single-node cluster was upgraded to 4.0.3 (in multiple steps: 3.12.6 -> 3.12.14 -> 3.13.7 -> 4.0.3).
During startup after the upgrade, the classic queues were converted from v1 to v2.
However, some queues got into an infinite (or at least very long) loop.
Two queues were still looping, logging the message below, after 50 minutes:
Before the upgrade there were fewer than 20 messages across the queues, and about 300 queues on the cluster.
observer output
This is the process info of the first queue, called multiple times (I have only repeated the relevant changing sections).
The stack trace remained the same, always in delete_segment_file. However, reductions were increasing (so the process was not hanging), and memory was fluctuating slightly around 1 GB.
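The diagnostic reasoning here (reductions still climbing means the process is busy-looping, not blocked) can be codified into a small helper. A minimal sketch, assuming a `sample` callable (hypothetical; e.g. something that fetches the reduction count via `erlang:process_info/2` through `rabbitmqctl eval`, which is deliberately left abstract here):

```python
import time
from typing import Callable


def is_making_progress(sample: Callable[[], int],
                       interval_s: float = 1.0) -> bool:
    """Sample a monotonically increasing work counter (e.g. Erlang
    reductions) twice; if it grew, the process is busy-looping rather
    than hung on a blocking call."""
    first = sample()
    time.sleep(interval_s)
    return sample() > first
```

In this incident the counter kept growing between samples, which is what ruled out a deadlock and pointed at a very long (or unbounded) loop instead.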
process info
Unfortunately the node then ran out of memory (probably because of the tracing), after which it started up successfully.
Before the OOM there was only a journal.jif file in the queue directory of each of the 2 queues. All the entries in it are ACK-ed and DEL-ed.
I wonder if anything can be deduced from this much information about why the conversion took so long (and whether it would have finished eventually).