Obtain Re-Indexing Benchmark from Metacat 2.19 #2043

Open
artntek opened this issue Dec 17, 2024 · 5 comments

artntek commented Dec 17, 2024

Obtain Re-Indexing Benchmark from Metacat 2.19 to Compare With 3.1 for ADC NSF Site Visit

Already have benchmarks for reindexing 3.1.0 on k8s:

  • test.adc on dev cluster (see: Move test.arcticdata.io to Kubernetes #1932)
    • ~2.7 objects/second with 25 index workers
    • (64 hours 22 minutes for approx. 622000 objects, per https://test.arcticdata.io/metacat/d1/mn/v2/object)
    • (13.5x speed increase over PISCO numbers below)
    • Processors:
      tomcat@metacatarctic-0:/usr/local/tomcat/bin$ nproc --all
      32
      
      d1indexer@metacatarctic-d1index-68dcf4cb5f-8djk8:/var/lib/dataone-indexer$ nproc --all
      32
    • Memory:
      tomcat@metacatarctic-0:/usr/local/tomcat/bin$ free -h | grep "Mem" | awk '{print $2}'
      125Gi
      
      d1indexer@metacatarctic-d1index-68dcf4cb5f-8djk8:/var/lib/dataone-indexer$ free -h | grep "Mem" | awk '{print $2}'
      125Gi
      
  • copy of live adc on prod cluster (see: Move arcticdata.io (Production) to Kubernetes #1954)
    • ~3.3 objects/second with 50 index workers
    • (94 hours 22 minutes for approx. 1116383 objects)
    • (16.5x speed increase over PISCO numbers below)
    • Processors:
      tomcat@metacatarctic-0:/usr/local/tomcat/bin$ nproc --all
      256
      
      d1indexer@metacatarctic-d1index-84ff5f9dbb-2qb67:/var/lib/dataone-indexer$ nproc --all
      32
    • Memory:
      tomcat@metacatarctic-0:/usr/local/tomcat/bin$ free -h | grep "Mem" | awk '{print $2}'
      472Gi
      
      d1indexer@metacatarctic-d1index-84ff5f9dbb-2qb67:/var/lib/dataone-indexer$ free -h | grep "Mem" | awk '{print $2}'
      125Gi

For comparison, PISCO reindex on Metacat 3.0.0, not in k8s (i.e. one index worker):

  • ~0.2 objects/second
  • ("about 12 days" for 187,081 objects -- from ~21K data packages)
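The objects/second figures above follow from dividing the object count by the elapsed time. A minimal sketch of that arithmetic, assuming `awk` is available; the `rate` helper is hypothetical (not part of Metacat), and "about 12 days" is taken as exactly 288 hours:

```shell
# rate OBJECTS HOURS MINUTES -> objects per second, rounded to one decimal place
rate() {
  awk -v n="$1" -v h="$2" -v m="$3" \
    'BEGIN { printf "%.1f\n", n / (h * 3600 + m * 60) }'
}

rate 622000 64 22     # test.adc on dev cluster, 25 index workers
rate 1116383 94 22    # copy of live adc on prod cluster, 50 index workers
rate 187081 288 0     # PISCO, assuming "about 12 days" = 288 hours
```

Dividing either k8s rate by the PISCO rate gives the speed increases quoted above.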

Possibilities for 2.19 benchmark:

  1. Could start the legacy deployment of test.adc up again (2.19 deployed but not running on knbvm)
    • total 622308 objects on new test.adc; probably almost as many on the old one
    • Processors:
      brooke@knbvm:~$ nproc --all
      8
    • Memory:
      brooke@knbvm:~$ free -h | grep "Mem" | awk '{print $2}'
      15Gi
  2. Jing suggested re-indexing mn-demo-6.test.dataone.org
    • total 35925 objects
    • Processors:
      brooke@mn-demo-6:~$ nproc --all
      4
    • Memory:
      brooke@mn-demo-6:~$ free -h | grep "Mem" | awk '{print $2}'
      7.8Gi
artntek self-assigned this Dec 17, 2024

artntek commented Dec 18, 2024

Nick created a temporary clone of the mn-demo-6 VM for benchmarking, and set it up with 32 cores and 125GB RAM, to mimic the test.ADC metacat pod on the dev cluster. Its hostname is generous-boar.test.dataone.org

https://generous-boar.test.dataone.org/metacatui


artntek commented Dec 18, 2024

Copy of mn-demo-6 vm (metacat 2.19.1):

https://generous-boar.test.dataone.org/knb/d1/mn/v2/object shows 35,932 objects

First run: 12/18/2024

reindex started 2024-12-18 19:14:01 UTC

[2024-12-18 19:14:01] [info] metacat 20241218-19:14:01: [INFO]: MetaCatServlet.handleGetOrPost -
Action is: reindexall [edu.ucsb.nceas.metacat.MetaCatServlet:handleGetOrPost:825]

finished at 2024-12-19 1:29:14 UTC

Time taken: 6h 15m => 1.6 objects per second
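The elapsed time can be derived from the two log timestamps rather than worked out by hand. A rough sketch, assuming GNU `date` (the `-d` option is not portable to BSD/macOS); the `elapsed` helper is hypothetical and drops leftover seconds:

```shell
# elapsed START END -> "Hh Mm" between two UTC timestamps (GNU date syntax assumed)
elapsed() {
  local start end secs
  start=$(date -u -d "$1" +%s)   # seconds since epoch for the start timestamp
  end=$(date -u -d "$2" +%s)     # seconds since epoch for the end timestamp
  secs=$((end - start))
  printf '%dh %dm\n' $((secs / 3600)) $(((secs % 3600) / 60))
}

elapsed "2024-12-18 19:14:01" "2024-12-19 01:29:14"   # first run on the VM
```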


Second run: 12/20/2024

Log level [info] so we can see when it finishes

reindex started 17:56:42 UTC

[2024-12-20 17:56:42] [info] metacat 20241220-17:56:42: [INFO]: MetaCatServlet.handleGetOrPost -
Action is: reindexall [edu.ucsb.nceas.metacat.MetaCatServlet:handleGetOrPost:825]

Queue Population/Depletion:

  • monitor https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status
    while [ $(curl -ks https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status | grep -c ">0<") -ne 1 ]; do \
      echo $(curl -ks https://generous-boar.test.dataone.org/knb/d1/mn/v2/monitor/status | \
        sed -e 's/.*<sizeOfQueue>//g' | sed -e 's/<\/sizeOfQueue>.*//g'); \
      sleep 5; \
    done; \
    echo "* * * * FINISHED AT: $(date) * * * *"; 
  • queue population/depletion finished at: 01:40 UTC
  • Time taken: 7h 44m
  • Rate: 1.3 obj/sec

Indexing Complete:

  • (grep for last occurrence of Total time to process indexer)
    Indexing completely finished at: 01:42 UTC
  • Time taken: 7h 46m
  • Rate: 1.3 obj/sec
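Finding the "indexing complete" time comes down to pulling the last log line that contains a known message. A minimal sketch; the `last_occurrence` helper is hypothetical, and the log path in the usage comment is a placeholder, not the real location on the VM:

```shell
# last_occurrence PATTERN FILE -> last line in FILE containing the fixed string PATTERN
last_occurrence() {
  grep -F -- "$1" "$2" | tail -n 1
}

# Hypothetical usage (log path is a placeholder):
# last_occurrence "Total time to process indexer" /path/to/indexer.log
```

Using `grep -F` treats the message as a literal string, so punctuation in the log text is not interpreted as a regular expression.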


artntek commented Dec 19, 2024

dev cluster

Copied same data files and DB to dev cluster/ceph (32 processors, 125Gi), 35932 objects
Reindex all with 50 workers:

First run: 12/19/24

Queue population and consumption:

  • start population: 22:45 UTC (12/19/24)
  • end consumption: 22:57 UTC
  • Time to empty queue: 12 minutes => 49.9 objects per second (35932 objects / 720 s)

Entire Reindex(?) NOTE: assumes last log entry (an ERROR) is end of indexing

  • last log entry: 23:52
  • Time to last log entry: 67 minutes => 8.9 objects per second

Second run: 12/20/24

Log level [info] so we can see when it finishes

reindex started 19:33 UTC

192.168.192.169 - - [20/Dec/2024:19:32:45 +0000]
    "PUT /metacat/d1/mn/v2/index?all=true HTTP/1.1" 200 73

Queue Population/Depletion:

monitor rabbitmq admin via kubectl port-forward

[Screenshot, 2024-12-20 11:59 AM]

  • queue population/depletion finished at: ~19:46 UTC
  • Time taken: 13m
  • Rate: 46 obj/sec

Indexing Complete:

(grep for last occurrence of Completed the index task from the index queue)

  • Indexing completely finished at: 21:48
  • Time taken: 2h 15m
  • Rate: 4.4 obj/sec


artntek commented Dec 19, 2024

Benchmark Summary

[see note 1]

FIRST BENCHMARK:

Metacat 2.19.1 (vm), 1 index worker (12/18/24):

  • Time to populate and then empty the queue:
    • 6h 15m => 1.6 objects per second
  • Time for entire reindex (time to last log entry):
    • unknown - assumed same as queue empty

Metacat 3.1.0 (k8s), 50 index workers (12/19/24):

  • Time to populate and then empty the queue:
    • 12 minutes => 49.9 objects per second (35,932 objects / 720 s)
    • ~31x faster
  • Time for entire reindex [note 2]:
    • 67 minutes => 8.9 objects per second
    • (Worst-case scenario: 5.6 times faster, assuming Metacat 2.19 indexing completes as soon as the queue is empty)

NOTES:

  1. Total 35,932 objects. Each Metacat host had 32 processors & 125Gi RAM, with similar speed/specs, although the machine hosting the VM was slightly newer than the k8s machine
  2. First benchmark above assumes last log entry (an ERROR) marks the end of indexing. This is possibly not true.

SECOND BENCHMARK:

Metacat 2.19.1 (vm), 1 index worker (12/20/24):

  • Time to populate and then empty the queue:
    • 7h 44m
    • 1.3 obj/sec
  • Time for entire reindex (time to last log entry: Total time to process indexer):
    • 7h 46m
    • 1.3 obj/sec

Metacat 3.1.0 (k8s), 50 index workers (12/20/24):

  • Time to populate and then empty the queue:
    • 13m
    • Rate: 46 obj/sec
  • Time for entire reindex (time to last log entry: Completed the index task from the index queue):
    • 2h 15m
    • Rate: 4.4 obj/sec
    • 3.5x faster
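The speedup factor for the second benchmark is the ratio of the two elapsed times. A small sketch of that calculation, assuming `awk` is available; the `speedup` helper is hypothetical, and durations are given in minutes:

```shell
# speedup BASELINE_MINUTES NEW_MINUTES -> ratio rounded to one decimal place
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", base / new }'
}

speedup $((7 * 60 + 46)) $((2 * 60 + 15))   # entire reindex: 7h 46m vs 2h 15m
speedup $((7 * 60 + 44)) 13                  # queue populate + empty: 7h 44m vs 13m
```

The first call reproduces the 3.5x figure; the second shows how much larger the gap is when only queue depletion is compared.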


artntek commented Jan 6, 2025

@jeanetteclark - when you get a chance, please lmk if you have all the info you need, and whether I can close this issue
