tucotuco edited this page Oct 17, 2014 · 35 revisions

Index Workflow Wiki: https://github.com/VertNet/dwc-indexer/wiki/Index-Workflow

Up-to-date information about a given index can be found at:

http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=[index namespace]

For example:

http://indexer.vertnet-portal.appspot.com/list-indexes?namespace=index-2014-03-12
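The URL pattern above can also be assembled programmatically. The following is a minimal sketch, assuming Python 3 and only the host, path, and `namespace` parameter shown on this page; the names `INDEXER_BASE` and `list_indexes_url` are hypothetical and not part of the indexer code.

```python
from urllib.parse import urlencode

# Host taken from the URLs on this page.
INDEXER_BASE = "http://indexer.vertnet-portal.appspot.com"


def list_indexes_url(namespace):
    """Build the list-indexes URL for one index namespace (hypothetical helper)."""
    return INDEXER_BASE + "/list-indexes?" + urlencode({"namespace": namespace})


print(list_indexes_url("index-2014-03-12"))
```

Using `urlencode` rather than string concatenation keeps the query string valid if a namespace ever contains characters that need escaping.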

Namespace: index-2014-03-12

  • Name: dwc
  • Text search page: http://goo.gl/w0MOK7
  • Date: 2014-03-26 11:26
  • Storage usage: 11639596921
  • Storage limit: 268435456000
  • Original limit: 10737418240
  • Status: responsive
  • Usage:
      • 11:00 11639596921L (while loading 26 Mar 2014)
      • 11:26 12761579749L
      • 11:48 13646012508L
      • 12:59 15612212751L
      • 14:20 19762280254L (document put rate quota errors)
      • 15:26 21527682718L (still increasing, but indexing has failed)
      • 13:56 45579236910L (nearing the end of loading index with 12M records from 2014-03-12, 2014-03-13, and 2014-03-27 harvests)

Comments: First attempt to load resulted in a quota overrun at 100% capacity of the 10G originally granted. Quota increased to 250G, but loading still had quota overruns for a couple of days. Once records could be loaded again without quota overrun, cleaned the 3038934 records. Redesigned index, then started loading again 25 Mar 2014 with largest data sets first. Loaded somewhere in the neighborhood of 3M records before emitting quota errors again, but these were errors based on the document inserts per minute, not the storage_quota for the index. Continued to load the index more conservatively, with no more than a couple of indexer jobs running simultaneously.
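One way to load "more conservatively" is to pace document inserts so they stay under a per-minute cap. The sketch below is only an illustration of that idea, not the indexer's actual loading code: `put_batch` is a hypothetical callable standing in for whatever writes a batch to the index, and the batch size and per-minute limit are assumed values that this page does not state.

```python
import time


def min_seconds_between_batches(batch_size, docs_per_minute):
    """Smallest pause between batches that keeps inserts under the per-minute cap."""
    if batch_size <= 0 or docs_per_minute <= 0:
        raise ValueError("batch_size and docs_per_minute must be positive")
    return 60.0 * batch_size / docs_per_minute


def put_throttled(docs, put_batch, batch_size, docs_per_minute, sleep=time.sleep):
    """Send docs in fixed-size batches via put_batch, pausing between batches.

    put_batch is a placeholder for the real index write; sleep is injectable
    so the pacing can be tested without actually waiting.
    """
    pause = min_seconds_between_batches(batch_size, docs_per_minute)
    sent = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        put_batch(batch)
        sent += len(batch)
        if start + batch_size < len(docs):
            sleep(pause)  # no pause needed after the final batch
    return sent
```

With several indexer jobs running at once, each job would need its own share of the cap, which fits the note about limiting the number of simultaneous jobs.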

Namespace: index-2014-02-11a

  • Name: dwc
  • Text search page: http://goo.gl/99mhgL
  • Date: 2014-03-26 11:26
  • Storage usage: 107118
  • Storage limit: 268435456000
  • Original limit: 10737418240
  • Usage: 0% (17 records)
  • Status: responsive

Comments: Was originally a 10G index. Quota increased by Google. Was index-cleaned, then records loaded for testing and found responsive.

Namespace: index-2014-02-11

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 32023049013
  • Storage limit: 268435456000
  • Original limit: 268435456000
  • Status: responsive

Comments: At 2014-03-26 15:39:02.603, started http://indexer.vertnet-portal.appspot.com/index-clean?index_name=dwc&namespace=index-2014-02-11 to clean out the index for re-use since index-2014-03-12 failed with quota errors at around 3M records again. 2014-03-26 22:44:40.281 Finished index-clean on index index-2014-02-11.dwc. Removed 6254379 documents.

Namespace: index-2014-02-06

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 7791101 (emptied of documents)
  • Storage limit: 268435456000
  • Original limit: 268435456000
  • Usage: 0%
  • Status: not responsive

Comments: Was 5.3% full with 14324192556L usage. index-cleaned but not responsive. Here is the final output from the cleaning run:

2014-03-21 08:07:16.902 /index-clean 200 736ms 27kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=indexer
0.1.0.2 - - [21/Mar/2014:04:07:16 -0700] "POST /index-clean HTTP/1.1" 200 28408 "http://indexer.vertnet-portal.appspot.com/index-clean" "AppEngine-Google; (+http://code.google.com/appengine)" "indexer.vertnet-portal.appspot.com" ms=736 cpu_ms=86 cpm_usd=4.013175 queue_name=index-clean task_name=2572243230419474568 pending_ms=20 app_engine_release=1.9.1 instance=00c61b117cb0ac3e476edb20e488397bef46c4
I 2014-03-21 08:07:16.898 Queuing index-clean task with params {'ndeleted': 8335200, 'max_delete': u'', 'namespace': u'index-2014-02-06', 'index_name': u'dwc', 'id': u'university-of-texas-at-arlington-amphibian-and-reptile-diversity-research-center/uta-herpetology/ffefa851-4c5f-4322-a8ce-6eaa23bd7e04', 'batch_size': u''}

2014-03-21 08:07:18.031 /index-clean 200 1083ms 4kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=indexer
0.1.0.2 - - [21/Mar/2014:04:07:18 -0700] "POST /index-clean HTTP/1.1" 200 4155 "http://indexer.vertnet-portal.appspot.com/index-clean" "AppEngine-Google; (+http://code.google.com/appengine)" "indexer.vertnet-portal.appspot.com" ms=1084 cpu_ms=21 cpm_usd=0.560464 queue_name=index-clean task_name=10355228373950732507 app_engine_release=1.9.1 instance=00c61b117cb0ac3e476edb20e488397bef46c4
I 2014-03-21 08:07:18.030 Finished index-clean on index index-2014-02-06.dwc. Removed 8335228 documents.
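The closing "Finished index-clean" message in the output above carries the namespace, index name, and removed-document count. A minimal sketch for pulling those out of a log line follows; the regular expression is inferred only from the two messages quoted on this page, and `parse_index_clean_result` is a hypothetical helper name.

```python
import re

# Pattern inferred from the "Finished index-clean ..." messages quoted on this page.
FINISHED_RE = re.compile(
    r"Finished index-clean on index (?P<namespace>\S+)\.(?P<index_name>\w+)\. "
    r"Removed (?P<removed>\d+) documents\."
)


def parse_index_clean_result(log_text):
    """Return (namespace, index_name, removed_count), or None if no match is found."""
    match = FINISHED_RE.search(log_text)
    if match is None:
        return None
    return (match.group("namespace"), match.group("index_name"), int(match.group("removed")))


line = ("I 2014-03-21 08:07:18.030 Finished index-clean on index "
        "index-2014-02-06.dwc. Removed 8335228 documents.")
print(parse_index_clean_result(line))
```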

Namespace: index-2014-01-10

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 26139374438
  • Storage limit: 268435456000
  • Original limit: 268435456000
  • Usage: 9.7%
  • Status: not responsive

Namespace: index-2013-08-08

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 32455111726 (~8.5M documents)
  • Storage limit: 268435456000
  • Original limit: 268435456000
  • Usage: 12.1%
  • Status: responsive

Namespace: index-2014-02-05a

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 25560245
  • Storage limit: 10737418240
  • Original limit: 10737418240
  • Usage: 0.2%
  • Status: responsive

Namespace: index-2014-02-06t

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 400726362 (emptied of documents)
  • Storage limit: 10737418240
  • Original limit: 10737418240
  • Usage: 0%
  • Status: responsive

Namespace: index-2014-02-06t2

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 22070956 (emptied of documents)
  • Storage limit: 10737418240
  • Original limit: 10737418240
  • Usage: 0%
  • Status: responsive

Namespace: index000001

  • Name: dwc
  • Date: 2014-03-26 11:26
  • Storage usage: 32058766453
  • Storage limit: 268435456000
  • Original limit: 268435456000
  • Usage: 11.9%
  • Status: responsive

Namespace: (None)

  • Name: dwc_search
  • Date: 2014-03-26 11:26
  • Storage usage: 684574297
  • Storage limit: 10737418240
  • Original limit: 10737418240
  • Usage: 6.4%
  • Status: responsive
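
The Usage percentages listed above are consistent with storage usage divided by storage limit, rounded to one decimal place. The sketch below reproduces that arithmetic for a few of the entries; `usage_percent` is a hypothetical helper, and the rounding rule is inferred from the listed figures rather than taken from the indexer code.

```python
def usage_percent(storage_usage, storage_limit):
    """Storage usage as a percentage of the storage limit, to one decimal place."""
    if storage_limit <= 0:
        raise ValueError("storage_limit must be positive")
    return round(100.0 * storage_usage / storage_limit, 1)


# Figures copied from the entries on this page.
print(usage_percent(32455111726, 268435456000))  # index-2013-08-08
print(usage_percent(26139374438, 268435456000))  # index-2014-01-10
print(usage_percent(684574297, 10737418240))     # dwc_search
```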