Obtain Re-Indexing Benchmark from Metacat 2.19 #2043

artntek · 2024-12-17T21:24:19Z

Obtain Re-Indexing Benchmark from Metacat 2.19 to Compare With 3.1 for ADC NSF Site Visit

Already have benchmarks for reindexing 3.1.0 on k8s:

test.adc on dev cluster (see: Move test.arcticdata.io to Kubernetes #1932)

~2.7 objects/second with 25 index workers
(64 hours 22 minutes for approx. [622000 objects](https://test.arcticdata.io/metacat/d1/mn/v2/object))
(13.5x speed increase over PISCO numbers below)

Processors:

```
tomcat@metacatarctic-0:/usr/local/tomcat/bin$ nproc --all
32

d1indexer@metacatarctic-d1index-68dcf4cb5f-8djk8:/var/lib/dataone-indexer$ nproc --all
32
```

Memory:

```
tomcat@metacatarctic-0:/usr/local/tomcat/bin$ free -h | grep "Mem" | awk '{print $2}'
125Gi

d1indexer@metacatarctic-d1index-68dcf4cb5f-8djk8:/var/lib/dataone-indexer$ free -h | grep "Mem" | awk '{print $2}'
125Gi
```

copy of live adc on prod cluster (see: Move arcticdata.io (Production) to Kubernetes #1954)

~3.3 objects/second with 50 index workers
(94 hours 22 minutes for approx. 1116383 objects)
(16.5x speed increase over PISCO numbers below)

Processors:

```
tomcat@metacatarctic-0:/usr/local/tomcat/bin$ nproc --all
256

d1indexer@metacatarctic-d1index-84ff5f9dbb-2qb67:/var/lib/dataone-indexer$ nproc --all
32
```

Memory:

```
tomcat@metacatarctic-0:/usr/local/tomcat/bin$ free -h | grep "Mem" | awk '{print $2}'
472Gi

d1indexer@metacatarctic-d1index-84ff5f9dbb-2qb67:/var/lib/dataone-indexer$ free -h | grep "Mem" | awk '{print $2}'
125Gi
```

For comparison, PISCO reindex on Metacat 3.0.0, not in k8s (i.e. one index worker):

~0.2 objects/second
("about 12 days" for 187,081 objects -- from ~21K data packages)
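The rates and speed-ups quoted above are object count divided by elapsed seconds, and then a ratio of two rates. A minimal sketch of that arithmetic, assuming `awk` is available; the helper names `rate_per_sec` and `speedup` are illustrative only and not part of Metacat:

```shell
#!/bin/sh
# rate_per_sec OBJECTS HOURS MINUTES -> objects per second, one decimal place
rate_per_sec() {
  awk -v n="$1" -v h="$2" -v m="$3" \
    'BEGIN { printf "%.1f\n", n / (h * 3600 + m * 60) }'
}

# speedup RATE BASELINE_RATE -> ratio of two rates, one decimal place
speedup() {
  awk -v r="$1" -v b="$2" 'BEGIN { printf "%.1f\n", r / b }'
}

rate_per_sec 622000 64 22     # test.adc on dev cluster, 25 index workers
rate_per_sec 1116383 94 22    # copy of live adc on prod cluster, 50 index workers
rate_per_sec 187081 288 0     # PISCO: "about 12 days" taken as 288 hours
speedup 2.7 0.2               # test.adc vs PISCO
speedup 3.3 0.2               # live adc copy vs PISCO
```

With the figures listed above this gives 2.7, 3.3 and 0.2 objects/second, and speed-ups of 13.5 and 16.5.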

Possibilities for 2.19 benchmark:

Could start the legacy deployment of test.adc up again (2.19 deployed but not running on knbvm)
- total 622308 objects on new test.adc; probably almost as many on the old one
- Processors:
```
brooke@knbvm:~$ nproc --all
8
```
- Memory:
```
brooke@knbvm:~$ free -h | grep "Mem" | awk '{print $2}'
15Gi
```

Jing suggested re-indexing mn-demo-6.test.dataone.org

total 35925 objects
Processors:
```
brooke@mn-demo-6:~$ nproc --all
4
```

Memory:

```
brooke@mn-demo-6:~$ free -h | grep "Mem" | awk '{print $2}'
7.8Gi
```


artntek · 2024-12-18T01:06:57Z

Nick created a temporary clone of the mn-demo-6 VM for benchmarking, and set it up with 32 cores and 125GB RAM, to mimic the test.ADC metacat pod on the dev cluster. Its hostname is generous-boar.test.dataone.org

https://generous-boar.test.dataone.org/metacatui

artntek · 2024-12-18T19:15:43Z

Copy of `mn-demo-6` vm (metacat 2.19.1):

https://generous-boar.test.dataone.org/knb/d1/mn/v2/object shows 35,932 objects

First run: 12/18/2024

reindex started 2024-12-18 19:14:01 UTC

```
[2024-12-18 19:14:01] [info] metacat 20241218-19:14:01: [INFO]: MetaCatServlet.handleGetOrPost -
Action is: reindexall [edu.ucsb.nceas.metacat.MetaCatServlet:handleGetOrPost:825]
```

finished at 2024-12-19 01:29:14 UTC

(determined by the time when https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status showed zero queue entries)

Time taken: 6h 15m => 1.6 objects per second
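The elapsed time and rate for this run can be recomputed from the two timestamps. This is only a sketch: it assumes a `date` that accepts `-d` with a `YYYY-MM-DD HH:MM:SS` string (GNU date does; BSD/macOS `date` uses different flags), and the helper names are illustrative:

```shell
#!/bin/sh
# elapsed_seconds START END -> seconds between two "YYYY-MM-DD HH:MM:SS" UTC timestamps
# Assumes a date(1) that supports -d (e.g. GNU date).
elapsed_seconds() {
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( end - start ))
}

# format_hm SECONDS -> "Nh Mm", dropping any leftover seconds
format_hm() {
  echo "$(( $1 / 3600 ))h $(( ($1 % 3600) / 60 ))m"
}

secs=$(elapsed_seconds "2024-12-18 19:14:01" "2024-12-19 01:29:14")
format_hm "$secs"
# 35932 is the object count reported by the /object endpoint above
awk -v n=35932 -v s="$secs" 'BEGIN { printf "%.1f objects per second\n", n / s }'
```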

Second run: 12/20/2024

Log level [info] so we can see when it finishes

reindex started 17:56:42 UTC

```
[2024-12-20 17:56:42] [info] metacat 20241220-17:56:42: [INFO]: MetaCatServlet.handleGetOrPost -
Action is: reindexall [edu.ucsb.nceas.metacat.MetaCatServlet:handleGetOrPost:825]
```

Queue Population/Depletion:

monitor https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status

```
while [ $(curl -ks https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status | grep -c ">0<") -ne 1 ]; do \
  echo $(curl -ks https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status | \
    sed -e 's/.*<sizeOfQueue>//g' | sed -e 's/<\/sizeOfQueue>.*//g'); \
  sleep 5; \
done; \
echo "* * * * FINISHED AT: $(date) * * * *";
```

queue population/depletion finished at: 01:40 UTC
Time taken: 7h 44m
Rate: 1.3 obj/sec
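The polling loop above fetches the status document twice per iteration (once for the exit test, once to print the queue size). A possible variant parses the queue size once per poll. This is a sketch only: the sample XML is a hypothetical stand-in for the real `monitor/status` response (only the `<sizeOfQueue>` element name is taken from the loop above), and `queue_size` and `STATUS_URL` are illustrative names:

```shell
#!/bin/sh
# queue_size: read a status document on stdin and print the <sizeOfQueue> value
queue_size() {
  sed -n 's/.*<sizeOfQueue>\([0-9][0-9]*\)<\/sizeOfQueue>.*/\1/p'
}

# Hypothetical sample response; the real document may contain other elements.
sample='<status><index><sizeOfQueue>128</sizeOfQueue></index></status>'
printf '%s\n' "$sample" | queue_size

# In a live poll (not run here), one fetch per iteration could look like:
#   while [ "$(curl -ks "$STATUS_URL" | queue_size)" != "0" ]; do sleep 5; done
```

Matching on the element itself avoids depending on `>0<` appearing exactly once in the response.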

Indexing Complete:

(grep for last occurrence of `Total time to process indexer`)
Indexing completely finished at: 01:42 UTC
Time taken: 7h 46m
Rate: 1.3 obj/sec
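Finding the completion time by grepping for the last occurrence of a log message can be sketched as follows. The log file and its lines are made up for illustration (only the message text `Total time to process indexer` comes from the note above), and `last_occurrence` is a hypothetical helper:

```shell
#!/bin/sh
# last_occurrence PATTERN FILE -> print the final line in FILE containing PATTERN
last_occurrence() {
  grep -F -- "$1" "$2" | tail -n 1
}

# Build a small stand-in log; these lines are NOT real Metacat output.
log=$(mktemp)
printf '%s\n' \
  '[2024-01-01 10:00:00] [info] sample: reindex requested' \
  '[2024-01-01 10:05:00] [info] sample: Total time to process indexer' \
  '[2024-01-01 10:07:30] [info] sample: Total time to process indexer' > "$log"

# Keep only the leading "[date time]" stamp of the last matching line.
last_occurrence "Total time to process indexer" "$log" | cut -d']' -f1 | tr -d '['
rm -f "$log"
```

The timestamp printed for the last match is then the "indexing completely finished" time used in the rate calculation.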

artntek · 2024-12-19T21:25:55Z

dev cluster

Copied same data files and DB to dev cluster/ceph (32 processors, 125Gi), 35932 objects
Reindex all with 50 workers:

First run: 12/19/24

Queue population and consumption:

start population: 22:45 UTC (12/19/24)
end consumption: 22:57 UTC
Time to empty queue: 12 minutes => 42.8 objects per second

Entire Reindex(?) NOTE: assumes last log entry (an `ERROR`) is end of indexing

last log entry: 23:52
Time to last log entry: 67 minutes => 8.9 objects per second

Second run: 12/20/24

Log level [info] so we can see when it finishes

reindex started 19:33 UTC

```
192.168.192.169 - - [20/Dec/2024:19:32:45 +0000]
    "PUT /metacat/d1/mn/v2/index?all=true HTTP/1.1" 200 73
```

Queue Population/Depletion:

monitor rabbitmq admin via kubectl port-forward

queue population/depletion finished at: ~19:46 UTC
Time taken: 13m
Rate: 46 obj/sec

Indexing Complete:

(grep for last occurrence of `Completed the index task from the index queue`)

Indexing completely finished at: 21:48
Time taken: 2h 15m
Rate: 4.4 obj/sec

artntek · 2024-12-19T23:15:42Z

Benchmark Summary

[see note 1]

FIRST BENCHMARK:

Metacat 2.19.1 (vm), 1 index worker (12/18/24):

Time to populate and then empty the queue:
- 6h 15m => 1.6 objects per second
Time for entire reindex (time to last log entry):
- unknown - assumed same as queue empty

Metacat 3.1.0 (k8s), 50 index workers (12/19/24):

Time to populate and then empty the queue:
- 12 minutes => 42.8 objects per second
- 26.75x faster
Time for entire reindex [see note 2]:
- 67 minutes => 8.9 objects per second
- (Worst-case scenario: 5.6 times faster, assuming mc2.19 indexing completes as soon as queue is empty)

NOTES:

1. Total 35,932 objects. Each Metacat host had 32 processors & 125Gi RAM, with similar speed/specs, although the machine hosting the vm was slightly newer than the k8s machine.
2. The first benchmark above assumes the last log entry (an `ERROR`) marks the end of indexing. This is possibly not true.

SECOND BENCHMARK:

Metacat 2.19.1 (vm), 1 index worker (12/20/24):

Time to populate and then empty the queue:
- 7h 44m
- 1.3 obj/sec
Time for entire reindex (time to last log entry: `Total time to process indexer`):
- 7h 46m
- 1.3 obj/sec

Metacat 3.1.0 (k8s), 50 index workers (12/20/24):

Time to populate and then empty the queue:
- 13m
- Rate: 46 obj/sec
Time for entire reindex (time to last log entry: `Completed the index task from the index queue`):
- 2h 15m
- Rate: 4.4 obj/sec
- 3.5x faster
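The "entire reindex" speed-up factors in this summary are consistent with dividing the 2.19.1 elapsed time by the 3.1.0 elapsed time. A minimal sketch of that calculation, assuming `awk` is available; `to_minutes` and `speedup_from_minutes` are illustrative helper names:

```shell
#!/bin/sh
# to_minutes HOURS MINUTES -> total minutes
to_minutes() {
  echo $(( $1 * 60 + $2 ))
}

# speedup_from_minutes BASELINE_MIN NEW_MIN -> baseline / new, one decimal place
speedup_from_minutes() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", b / n }'
}

# First benchmark, entire reindex: 6h 15m (2.19.1) vs 67 minutes (3.1.0)
speedup_from_minutes "$(to_minutes 6 15)" 67
# Second benchmark, entire reindex: 7h 46m (2.19.1) vs 2h 15m (3.1.0)
speedup_from_minutes "$(to_minutes 7 46)" "$(to_minutes 2 15)"
```

These print 5.6 and 3.5, matching the factors quoted for the first and second benchmarks.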

artntek · 2025-01-06T18:54:20Z

@jeanetteclark - when you get a chance, please lmk if you have all the info you need, and whether I can close this issue

artntek self-assigned this Dec 17, 2024
