John Wieczorek edited this page Oct 10, 2015 · 35 revisions

Index Workflow Wiki: https://github.com/VertNet/dwc-indexer/wiki/Index-Workflow

Up-to-date information about a given index can be found at

http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=[index namespace]

For example:

http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-11a
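
As a minimal sketch, the URL pattern above can be built from a namespace string. The helper name `list_indexes_url` is an assumption made for illustration and is not part of the indexer code.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "http://indexer.vertnet-portal.appspot.com/list-indexes"


def list_indexes_url(namespace):
    """Return the list-indexes URL for one index namespace (hypothetical helper)."""
    return BASE_URL + "?" + urlencode({"namespace": namespace})


print(list_indexes_url("index-2014-02-11a"))
```

Using `urlencode` rather than string concatenation keeps the query string valid if a namespace ever contains characters that need escaping.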

Namespace: index-2014-02-11a (New index for VN portal since 2014-12-22)

Comments: Originally a 10G index; Google later increased the quota. The index was cleaned with index-clean, then records were loaded for testing and the index was found to be responsive.

Schema: {u'family': ['TEXT'], u'stateprovince': ['ATOM', 'TEXT'], u'hastypestatus': ['NUMBER'], u'rank': ['NUMBER'], u'county': ['TEXT'], u'tissue': ['NUMBER'], u'year': ['TEXT'], u'specificepithet': ['TEXT'], u'media': ['NUMBER'], u'institutioncode': ['TEXT'], u'class': ['TEXT'], u'location': ['GEO_POINT'], u'collectorname': ['TEXT'], u'type': ['TEXT'], u'recordedby': ['TEXT'], u'verbatim_record': ['TEXT'], u'catalognumber': ['TEXT'], u'url': ['TEXT'], u'country': ['ATOM', 'TEXT'], u'mappable': ['NUMBER'], u'record': ['TEXT'], u'genus': ['TEXT'], u'eventdate': ['DATE']}

Namespace: index-2014-03-12 (http://portal.vertnet.org/ up to 2014-12-22, now obsolete)

Comments: The first attempt to load resulted in a quota overrun at 100% of the 10G originally granted. The quota was increased to 250G, but loading still hit quota overruns for a couple of days. Once records could be loaded again without a quota overrun, the 3038934 records were cleaned from the index. The index was redesigned, and loading started again on 25 Mar 2014 with the largest data sets first. Roughly 3M records were loaded before quota errors appeared again, but these were errors on document inserts per minute, not on the storage_quota for the index. Loading then continued more conservatively, with no more than a couple of indexer jobs running simultaneously.
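
The per-minute insert errors described above suggest pacing the load. The following is only a sketch of that idea, not the indexer's actual implementation: `load_in_batches`, `put_batch`, and the default numbers are assumptions chosen for illustration, and the real quota values are not stated on this page.

```python
import time


def load_in_batches(documents, put_batch, batch_size=200,
                    max_batches_per_minute=30, sleep=time.sleep):
    """Send documents in fixed-size batches, pausing after each full batch.

    put_batch is a caller-supplied function (hypothetical here) that writes
    one list of documents to the index. The defaults are placeholders, not
    documented quota values.
    """
    pause = 60.0 / max_batches_per_minute
    batch, sent = [], 0
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            put_batch(batch)
            sent += len(batch)
            batch = []
            sleep(pause)  # keep the insert rate under the chosen cap
    if batch:  # flush the final partial batch
        put_batch(batch)
        sent += len(batch)
    return sent
```

Passing `sleep` as a parameter lets the pacing be disabled in tests. Limiting the number of indexer jobs running at once, as the comment describes, is a separate control that this sketch does not cover.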

  • Schema: {u'family': ['TEXT'], u'stateprovince': ['TEXT'], u'hastypestatus': ['NUMBER'], u'rank': ['NUMBER'], u'county': ['TEXT'], u'occurrenceid': ['TEXT'], u'tissue': ['NUMBER'], u'year': ['TEXT', 'NUMBER'], u'specificepithet': ['TEXT'], u'continent': ['TEXT'], u'resource': ['TEXT'], u'hashid': ['NUMBER'], u'pubdate': ['TEXT'], u'media': ['NUMBER'], u'institutioncode': ['TEXT'], u'class': ['TEXT'], u'location': ['GEO_POINT'], u'fossil': ['NUMBER'], u'type': ['TEXT'], u'islandgroup': ['TEXT'], u'recordedby': ['TEXT'], u'verbatim_record': ['TEXT'], u'catalognumber': ['TEXT'], u'url': ['TEXT'], u'country': ['TEXT'], u'mappable': ['NUMBER'], u'order': ['TEXT'], u'record': ['TEXT'], u'island': ['TEXT'], u'genus': ['TEXT'], u'coordinateuncertaintyinmeters': ['NUMBER'], u'eventdate': ['DATE']}
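
The two schemas listed above are printed as Python dict literals, so they can be compared programmatically. This sketch uses short excerpts of the index-2014-02-11a and index-2014-03-12 schemas; the helper names `parse_schema` and `diff_schemas` are assumptions for illustration.

```python
import ast


def parse_schema(text):
    """Parse a schema printed as a Python dict literal into {field: set of types}."""
    return {name: set(types) for name, types in ast.literal_eval(text).items()}


def diff_schemas(a, b):
    """Return fields only in a, fields only in b, and fields whose types differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {f: (sorted(a[f]), sorted(b[f]))
               for f in sorted(set(a) & set(b)) if a[f] != b[f]}
    return only_a, only_b, changed


# Excerpts only; paste the full schema strings to compare every field.
schema_11a = parse_schema(
    "{u'stateprovince': ['ATOM', 'TEXT'], u'year': ['TEXT'], "
    "u'collectorname': ['TEXT'], u'genus': ['TEXT']}")
schema_0312 = parse_schema(
    "{u'stateprovince': ['TEXT'], u'year': ['TEXT', 'NUMBER'], "
    "u'occurrenceid': ['TEXT'], u'genus': ['TEXT']}")

only_11a, only_0312, changed = diff_schemas(schema_11a, schema_0312)
print(only_11a)   # fields present only in index-2014-02-11a
print(only_0312)  # fields present only in index-2014-03-12
print(changed)    # fields whose type lists differ
```

`ast.literal_eval` accepts the `u'...'` prefixes in Python 3 and evaluates only literals, so it is safer than `eval` for pasted text.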

Namespace: index-2014-02-11

Namespace: index-2014-02-06

Comments: Was 5.3% full with 14324192556L usage. index-cleaned but not responsive. Here is the final output from the cleaning run:

2014-03-21 08:07:16.902 /index-clean 200 736ms 27kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=indexer
0.1.0.2 - - [21/Mar/2014:04:07:16 -0700] "POST /index-clean HTTP/1.1" 200 28408 "http://indexer.vertnet-portal.appspot.com/index-clean" "AppEngine-Google; (+http://code.google.com/appengine)" "indexer.vertnet-portal.appspot.com" ms=736 cpu_ms=86 cpm_usd=4.013175 queue_name=index-clean task_name=2572243230419474568 pending_ms=20 app_engine_release=1.9.1 instance=00c61b117cb0ac3e476edb20e488397bef46c4
I 2014-03-21 08:07:16.898 Queuing index-clean task with params {'ndeleted': 8335200, 'max_delete': u'', 'namespace': u'index-2014-02-06', 'index_name': u'dwc', 'id': u'university-of-texas-at-arlington-amphibian-and-reptile-diversity-research-center/uta-herpetology/ffefa851-4c5f-4322-a8ce-6eaa23bd7e04', 'batch_size': u''}

2014-03-21 08:07:18.031 /index-clean 200 1083ms 4kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=indexer
0.1.0.2 - - [21/Mar/2014:04:07:18 -0700] "POST /index-clean HTTP/1.1" 200 4155 "http://indexer.vertnet-portal.appspot.com/index-clean" "AppEngine-Google; (+http://code.google.com/appengine)" "indexer.vertnet-portal.appspot.com" ms=1084 cpu_ms=21 cpm_usd=0.560464 queue_name=index-clean task_name=10355228373950732507 app_engine_release=1.9.1 instance=00c61b117cb0ac3e476edb20e488397bef46c4
I 2014-03-21 08:07:18.030 Finished index-clean on index index-2014-02-06.dwc. Removed 8335228 documents.
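
The closing message of the cleaning run above can be pulled out of a log with a regular expression. This is a sketch that assumes the message keeps exactly the wording shown in that log, and that the index is printed as `namespace.index_name`, which matches the `namespace` and `index_name` parameters in the queued task.

```python
import re

# Pattern assumes the exact message wording seen in the log above.
FINISHED = re.compile(
    r"Finished index-clean on index (\S+)\. Removed (\d+) documents\.")

line = ("I 2014-03-21 08:07:18.030 Finished index-clean on index "
        "index-2014-02-06.dwc. Removed 8335228 documents.")

match = FINISHED.search(line)
namespace, index_name = match.group(1).rsplit(".", 1)
removed = int(match.group(2))
print(namespace, index_name, removed)
```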

Namespace: index-2014-01-10

Namespace: index-2013-08-08

Namespace: index-2014-02-05a (http://amazoniabiodiversity.vertnet-portal.appspot.com/ as of 2014-12-22)

Namespace: index-2014-02-06t

Namespace: index-2014-02-06t2

Namespace: index000001

Namespace: (None)

  • Name: dwc_search
  • Date: 2014-03-26 11:26
  • Storage usage: 684574297
  • Storage limit: 10737418240
  • Original limit: 10737418240
  • Usage: 6.4%
  • Status: responsive
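
The usage figure above follows from the storage usage and storage limit. A minimal sketch of the arithmetic, with `usage_percent` as an illustrative helper name:

```python
def usage_percent(storage_usage, storage_limit):
    """Return storage usage as a percentage of the limit, to one decimal place."""
    return round(100.0 * storage_usage / storage_limit, 1)


# Values from the dwc_search entry above; the limit equals 10 * 2**30.
print(usage_percent(684574297, 10737418240))  # 6.4
```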