Index List
Index Workflow Wiki: https://github.com/VertNet/dwc-indexer/wiki/Index-Workflow
Up-to-date information about a given index can be found at:
http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=[index namespace]
For example:
http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-11a
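The same URL can be assembled in code when checking several namespaces in a row. This is a minimal Python sketch that assumes only the endpoint and the `namespace` parameter shown above; the helper name `list_indexes_url` is hypothetical and is not part of the indexer.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL pattern above.
LIST_INDEXES = "http://indexer.vertnet-portal.appspot.com/list-indexes"


def list_indexes_url(namespace):
    """Return the list-indexes URL for one index namespace."""
    return LIST_INDEXES + "?" + urlencode({"namespace": namespace})


print(list_indexes_url("index-2014-02-11a"))
```

The sketch only builds the string; it does not check whether the endpoint still responds.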
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-11a
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-02-11a
- Name: dwc
- Text search page: http://goo.gl/99mhgL
- Date: 2014-03-26 11:26
- Storage usage: 107118
- Storage limit: 268435456000
- Original limit: 10737418240
- Usage: 0% (17 records)
- Status: responsive
- 2014-10-16
- 11:20 107118L (17 records)
- 11:30 107118L (0 records)
- 2014-12-22 Search shows no records in index.
- 2015-05-22 Search shows no records in index. "No documents meet these criteria."
- 2015-05-28 Loaded dwc2015 index schema with http://indexer.vertnet-portal.appspot.com/index-gcs-path?namespace=index-2014-02-06t2&index_name=dwc&gcs_path=vertnet-harvesting/data/2015-05-22/mvz_hild-1627c464-1106-4d3c-bf3e-033b3f9d0fcc/*&shard_count=10. Index shows record. Commencing indexing for dwc2015 schema using this namespace.
Comments: Was originally a 10G index. Quota increased by Google. Was index-cleaned, then records loaded for testing and found responsive.
Schema: {u'family': ['TEXT'], u'stateprovince': ['ATOM', 'TEXT'], u'hastypestatus': ['NUMBER'], u'rank': ['NUMBER'], u'county': ['TEXT'], u'tissue': ['NUMBER'], u'year': ['TEXT'], u'specificepithet': ['TEXT'], u'media': ['NUMBER'], u'institutioncode': ['TEXT'], u'class': ['TEXT'], u'location': ['GEO_POINT'], u'collectorname': ['TEXT'], u'type': ['TEXT'], u'recordedby': ['TEXT'], u'verbatim_record': ['TEXT'], u'catalognumber': ['TEXT'], u'url': ['TEXT'], u'country': ['ATOM', 'TEXT'], u'mappable': ['NUMBER'], u'record': ['TEXT'], u'genus': ['TEXT'], u'eventdate': ['DATE']}
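Schema strings like the one above are Python dictionary reprs, so they can be read back for inspection. The sketch below is one possible way to group fields by type; it uses a short excerpt of the schema rather than the full string, and the variable names are illustrative only.

```python
import ast
from collections import defaultdict

# Excerpt of the schema string above. The u'' prefixes come from a
# Python 2 repr and are still valid literal syntax in Python 3.
schema_text = (
    "{u'family': ['TEXT'], u'stateprovince': ['ATOM', 'TEXT'], "
    "u'location': ['GEO_POINT'], u'mappable': ['NUMBER'], "
    "u'eventdate': ['DATE']}"
)

# literal_eval parses the literal without executing arbitrary code.
schema = ast.literal_eval(schema_text)

# Map each field type to the sorted list of fields that use it.
fields_by_type = defaultdict(list)
for field, types in sorted(schema.items()):
    for field_type in types:
        fields_by_type[field_type].append(field)

for field_type, fields in sorted(fields_by_type.items()):
    print(field_type, fields)
```

A field that appears under more than one type, such as `stateprovince` here, is listed under each of them.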
Namespace: index-2014-03-12 (http://portal.vertnet.org/ up to 2014-12-22, now obsolete)
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-03-12
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-03-12
- Name: dwc
- Text search page: http://goo.gl/w0MOK7
- Date: 2014-03-26 11:26
- Storage usage: 11639596921
- Storage limit: 268435456000
- Original limit: 10737418240
- Usage:
- Status: responsive
- 2014-03-26
- 11:00 11639596921L
- 11:26 12761579749L
- 11:48 13646012508L
- 12:59 15612212751L
- 14:20 19762280254L (document put rate quota errors)
- 15:26 21527682718L (still increasing, but indexing has failed)
- 13:56 45579236910L (nearing the end of loading index with 12M records from 2014-03-12, 2014-03-13, and 2014-03-27 harvests)
- 2014-10-17 11:29 62457510392L
- 2014-12-22 67333252006L (14,569,231 records)
- 2015-05-25 81353850720L (17,723,735 records)
Comments: The first attempt to load resulted in a quota overrun at 100% capacity of the 10G originally granted. The quota was increased to 250G, but loading still hit quota overruns for a couple of days. Once records could be loaded again without quota overrun, cleaned the 3038934 records. Redesigned the index, then started loading again on 25 Mar 2014 with the largest data sets first. Loaded roughly 3M records before quota errors were emitted again, but these were errors based on document inserts per minute, not the storage_quota for the index. Continued to load the index more conservatively, with no more than a couple of indexer jobs running simultaneously.
- Schema: {u'family': ['TEXT'], u'stateprovince': ['TEXT'], u'hastypestatus': ['NUMBER'], u'rank': ['NUMBER'], u'county': ['TEXT'], u'occurrenceid': ['TEXT'], u'tissue': ['NUMBER'], u'year': ['TEXT', 'NUMBER'], u'specificepithet': ['TEXT'], u'continent': ['TEXT'], u'resource': ['TEXT'], u'hashid': ['NUMBER'], u'pubdate': ['TEXT'], u'media': ['NUMBER'], u'institutioncode': ['TEXT'], u'class': ['TEXT'], u'location': ['GEO_POINT'], u'fossil': ['NUMBER'], u'type': ['TEXT'], u'islandgroup': ['TEXT'], u'recordedby': ['TEXT'], u'verbatim_record': ['TEXT'], u'catalognumber': ['TEXT'], u'url': ['TEXT'], u'country': ['TEXT'], u'mappable': ['NUMBER'], u'order': ['TEXT'], u'record': ['TEXT'], u'island': ['TEXT'], u'genus': ['TEXT'], u'coordinateuncertaintyinmeters': ['NUMBER'], u'eventdate': ['DATE']}
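The Usage percentages in these entries can be derived from the storage usage and storage limit figures. The sketch below assumes both figures are byte counts (the limit 268435456000 equals 250 * 2**30, which matches the 250G quota mentioned in the comments); the function name is hypothetical.

```python
# Storage limit listed for this index; equal to 250 * 2**30.
STORAGE_LIMIT = 268435456000


def usage_percent(storage_usage, storage_limit=STORAGE_LIMIT):
    """Storage usage as a percentage of the limit, to one decimal place."""
    return round(100.0 * storage_usage / storage_limit, 1)


# Readings copied from the log above, with the trailing L dropped.
print(usage_percent(11639596921))  # reading from 2014-03-26 11:00
print(usage_percent(81353850720))  # reading from 2015-05-25

# Rough average storage per record at the 2015-05-25 reading.
print(81353850720 // 17723735)
```

The per-record figure is only an average over the whole index and says nothing about individual document sizes.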
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-11
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-02-11
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 32023049013
- Storage limit: 268435456000
- Original limit: 268435456000
- Status: responsive
- 2014-03-26 15:39:02.603, started index-clean to clean out the index for re-use since index-2014-03-12 failed with quota errors at around 3M records again.
- 2014-03-26 22:44:40.281 Finished index-clean on index index-2014-02-11.dwc. Removed 6254379 documents.
- 2015-05-22. Search shows no records in index. "No documents meet these criteria."
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-06
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-02-06
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 7791101 (emptied of documents)
- Storage limit: 268435456000
- Original limit: 268435456000
- Usage: 0%
- Status: not responsive
- 2015-05-22 Index found to have documents. Removed 1948 records from ccber. No records found after. "No documents meet these criteria."
Comments: Was 5.3% full with 14324192556L usage. index-cleaned but not responsive. Here is the final output from the cleaning run:
2014-03-21 08:07:16.902 /index-clean 200 736ms 27kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=indexer 0.1.0.2 - - [21/Mar/2014:04:07:16 -0700] "POST /index-clean HTTP/1.1" 200 28408 "http://indexer.vertnet-portal.appspot.com/index-clean" "AppEngine-Google; (+http://code.google.com/appengine)" "indexer.vertnet-portal.appspot.com" ms=736 cpu_ms=86 cpm_usd=4.013175 queue_name=index-clean task_name=2572243230419474568 pending_ms=20 app_engine_release=1.9.1 instance=00c61b117cb0ac3e476edb20e488397bef46c4
I 2014-03-21 08:07:16.898 Queuing index-clean task with params {'ndeleted': 8335200, 'max_delete': u'', 'namespace': u'index-2014-02-06', 'index_name': u'dwc', 'id': u'university-of-texas-at-arlington-amphibian-and-reptile-diversity-research-center/uta-herpetology/ffefa851-4c5f-4322-a8ce-6eaa23bd7e04', 'batch_size': u''}
2014-03-21 08:07:18.031 /index-clean 200 1083ms 4kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=indexer 0.1.0.2 - - [21/Mar/2014:04:07:18 -0700] "POST /index-clean HTTP/1.1" 200 4155 "http://indexer.vertnet-portal.appspot.com/index-clean" "AppEngine-Google; (+http://code.google.com/appengine)" "indexer.vertnet-portal.appspot.com" ms=1084 cpu_ms=21 cpm_usd=0.560464 queue_name=index-clean task_name=10355228373950732507 app_engine_release=1.9.1 instance=00c61b117cb0ac3e476edb20e488397bef46c4
I 2014-03-21 08:07:18.030 Finished index-clean on index index-2014-02-06.dwc. Removed 8335228 documents.
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-01-10
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-01-10
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 26139374438
- Storage limit: 268435456000
- Original limit: 268435456000
- Usage: 9.7%
- Status: not responsive
- 2015-05-22 Found to have documents. Ran index-clean. Removed 17204527 documents. Search after does not complete.
- 2015-05-28 Search showed: "No documents meet these criteria." Loaded dwc2015 index schema with http://indexer.vertnet-portal.appspot.com/index-gcs-path?namespace=index000001&index_name=dwc&gcs_path=vertnet-harvesting/data/2015-05-22/harvesttest-9fbe6712-cf12-4c0f-9a73-f60967ebb485/*&shard_count=10 and the index loaded with the one record and is apparently functional.
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2013-08-08
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2013-08-08
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 32455111726 (~8.5M documents)
- Storage limit: 268435456000
- Original limit: 268435456000
- Usage: 12.1%
- Status: responsive
- 2015-05-22 Found to have documents. Ran index-clean. Removed 8196723 documents. Search after does not complete.
Namespace: index-2014-02-05a (http://amazoniabiodiversity.vertnet-portal.appspot.com/ as of 2014-12-22)
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-05a
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-02-05a
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 25560245
- Storage limit: 10737418240
- Original limit: 10737418240
- Usage: 0.2%
- Status: responsive
- 2015-05-22. In use for Amazonian Biodiversity Portal. Removed resources up through NYBG in error. Needs repopulating.
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-06t
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-02-06t
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 400726362 (emptied of documents)
- Storage limit: 10737418240
- Original limit: 10737418240
- Usage: 0%
- Status: responsive
- 2015-05-22 Has documents, including NYBG. Ran index-clean. Removed 148565 documents. Search after shows "No documents meet these criteria."
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-02-06t2
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index-2014-02-06t2
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 22070956 (emptied of documents)
- Storage limit: 10737418240
- Original limit: 10737418240
- Usage: 0%
- Status: responsive
- 2015-05-22 Has documents. Ran index-clean. Removed 10200 documents. Search after shows "No documents meet these criteria."
- 2015-05-28 Loaded dwc2015 index schema with http://indexer.vertnet-portal.appspot.com/index-gcs-path?namespace=index-2014-02-06t2&index_name=dwc&gcs_path=vertnet-harvesting/data/2015-05-22/mvz_hild-1627c464-1106-4d3c-bf3e-033b3f9d0fcc/*&shard_count=10. Tested queries on the index and in the test portal. All functional.
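The load request logged above follows a fixed parameter pattern (`namespace`, `index_name`, `gcs_path`, `shard_count`). As a hedged illustration, the following Python sketch rebuilds that URL from its parts; the helper name `index_gcs_path_url` and its default values are assumptions taken from the logged example, not from the indexer's code.

```python
from urllib.parse import urlencode

# Endpoint copied from the logged request above.
INDEX_GCS_PATH = "http://indexer.vertnet-portal.appspot.com/index-gcs-path"


def index_gcs_path_url(namespace, gcs_path, index_name="dwc", shard_count=10):
    """Build an index-gcs-path URL with the parameters seen in the log."""
    params = {
        "namespace": namespace,
        "index_name": index_name,
        "gcs_path": gcs_path,
        "shard_count": shard_count,
    }
    # Leave "/" and "*" unescaped so the result matches the logged URL.
    return INDEX_GCS_PATH + "?" + urlencode(params, safe="/*")


url = index_gcs_path_url(
    "index-2014-02-06t2",
    "vertnet-harvesting/data/2015-05-22/"
    "mvz_hild-1627c464-1106-4d3c-bf3e-033b3f9d0fcc/*",
)
print(url)
```

Building the URL from named parts makes it harder to pair a namespace with the wrong harvest path when the same request is repeated for several indexes.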
- http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index000001
- https://console.developers.google.com/project/vertnet-portal/appengine/search/index/dwc?namespace=index000001
- Name: dwc
- Date: 2014-03-26 11:26
- Storage usage: 32058766453
- Storage limit: 268435456000
- Original limit: 268435456000
- Usage: 11.9%
- Status: responsive
- 2015-05-22 Has documents. Ran index-clean. Removed 8166827 documents. Search after does not complete.
- 2015-05-28 Search does not complete. Loaded dwc2015 index schema with http://indexer.vertnet-portal.appspot.com/index-gcs-path?namespace=index000001&index_name=dwc&gcs_path=vertnet-harvesting/data/2015-05-22/harvesttest-9fbe6712-cf12-4c0f-9a73-f60967ebb485/*&shard_count=10 and the index loaded with the one record and is apparently functional.
- Name: dwc_search
- Date: 2014-03-26 11:26
- Storage usage: 684574297
- Storage limit: 10737418240
- Original limit: 10737418240
- Usage: 6.4%
- Status: responsive