Commit d014942

updated readme with JGI info
1 parent 7cecf46 commit d014942

1 file changed (+69, -1 lines)

Diff for: README.md

@@ -142,4 +142,72 @@ Kegg.version #returns info from http://rest.kegg.jp/info/kegg
|-Kegg.version["original"]["lists"]["enzyme"] = list()
|- Kegg.version["updates"] = list()
|- Kegg.version["current"]
```

## Downloading JGI data

JGI data can be downloaded either by importing the `ecg` package in a script or through a command-line interface (CLI).

### Using import

#### Downloading and running the pipeline

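```python
from ecg import jgi
import os

# Path to the chromedriver executable (the CLI exposes this as --chromedriver_path).
# os.path.expanduser turns "~" into the home directory; Python does not expand
# "~" in plain strings on its own.
chromedriver_path = os.path.expanduser("~/chromedriver")
path = "myjgi"  # directory the JGI data will be downloaded to

J = jgi.Jgi()
J.scrape_domain(path, "Eukaryota")

# Built-in public methods:
#   J.scrape_domain(path, domain)   download an entire domain and run the pipeline
#   J.scrape_urls(organism_urls)    organism_urls should be a list of full (meta)genome URLs
```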
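
Since the snippet above builds `chromedriver_path` by hand, it can help to confirm that the file exists and is executable before starting a long scrape. The following is a minimal sketch using only the standard library. `resolve_chromedriver` is a hypothetical helper, not part of `ecg`; it assumes a Unix-like system (an executable file named `chromedriver`, no `.exe`) and mirrors the CLI's documented fallback to the current directory when no path is given.

```python
import os
from typing import Optional


def resolve_chromedriver(path: Optional[str] = None) -> Optional[str]:
    """Return an absolute path to an executable chromedriver, or None.

    Hypothetical helper (not part of ecg). Assumes a Unix-like system where
    the driver is a file named "chromedriver" with the execute bit set.
    """
    if not path:
        # Mirror the CLI's documented default: look in the current directory.
        path = os.path.join(os.getcwd(), "chromedriver")
    # Python does not expand "~" in ordinary strings, so do it explicitly.
    candidate = os.path.abspath(os.path.expanduser(path))
    # Accept only a regular file that the current user may execute.
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    driver = resolve_chromedriver("~/chromedriver")
    if driver is None:
        print("chromedriver not found or not executable")
    else:
        print("using", driver)
```

For example, `resolve_chromedriver("~/chromedriver")` returns the absolute path when that file is executable and `None` otherwise, so a bad path can be reported before any download begins.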

### Using CLI

Example: `python jgi.py --chromedriver_path=/Users/Me/Applications/chromedriver scrape_domain myjgidir Bacteria --database=jgi`

```python
"""
WARNING: CLI HAS NOT BEEN TESTED YET.

Retrieve enzyme data from JGI genomes and metagenomes.

Usage:
    jgi.py [--chromedriver_path=<cd_path>|--homepage_url=<hp_url>] scrape_domain PATH DOMAIN [--database=<db>|--assembly_types=<at>...]
    jgi.py [--chromedriver_path=<cd_path>|--homepage_url=<hp_url>] scrape_urls PATH DOMAIN ORGANISM_URLS [--assembly_types=<at>...]

Arguments:
    PATH            Directory where JGI data will be downloaded
    DOMAIN          Valid JGI domain to scrape data from (one of: 'Eukaryota','Bacteria','Archaea','*Microbiome','Plasmids','Viruses','GFragment','cell','sps','Metatranscriptome')
    ORGANISM_URLS   (meta)genome URLs to download data from
    scrape_domain   Download an entire JGI domain and run the pipeline to format the data
    scrape_urls     Download data from one or more (meta)genomes by URL

Options:
    --chromedriver_path=<cd_path>  Path to the chromedriver executable (defaults to the current directory if left blank) [default: None]
    --homepage_url=<hp_url>        URL of JGI's homepage [default: "https://img.jgi.doe.gov/cgi-bin/m/main.cgi"]
    --database=<db>                Use only JGI-annotated organisms or all organisms [default: "all"]
    --assembly_types=<at>...       Only used for metagenomic domains; ignored for others [default: unassembled assembled both]
"""
```
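
Given the warning that the CLI is untested, it may be safer to assemble the command from Python and check the arguments before launching it. The following is a minimal sketch under that assumption: `scrape_domain_command` is a hypothetical helper, not part of `ecg`. It validates `DOMAIN` against the list documented above and places the global `--chromedriver_path` option before the subcommand, following the usage pattern.

```python
import shlex
from typing import List, Optional

# Valid domains as listed in the DOMAIN argument description above.
VALID_DOMAINS = {
    "Eukaryota", "Bacteria", "Archaea", "*Microbiome", "Plasmids",
    "Viruses", "GFragment", "cell", "sps", "Metatranscriptome",
}


def scrape_domain_command(path: str, domain: str,
                          chromedriver_path: Optional[str] = None,
                          database: Optional[str] = None) -> List[str]:
    """Build an argv list for `jgi.py scrape_domain` (hypothetical helper)."""
    if domain not in VALID_DOMAINS:
        # Catch typos such as "Eukarayota" before any scraping starts.
        raise ValueError(f"unknown JGI domain: {domain!r}")
    cmd = ["python", "jgi.py"]
    # Global options come before the subcommand, as in the usage pattern.
    if chromedriver_path is not None:
        cmd.append(f"--chromedriver_path={chromedriver_path}")
    cmd += ["scrape_domain", path, domain]
    # Subcommand options come after the positional arguments.
    if database is not None:
        cmd.append(f"--database={database}")
    return cmd


if __name__ == "__main__":
    cmd = scrape_domain_command(
        "myjgidir", "Bacteria",
        chromedriver_path="/Users/Me/Applications/chromedriver",
        database="jgi",
    )
    print(shlex.join(cmd))
```

With the arguments from the example above, `shlex.join` reproduces that exact command line, and the list itself can be passed to `subprocess.run(cmd, check=True)`. Using `shlex.join` rather than string concatenation also quotes values such as `*Microbiome`, which a shell would otherwise treat as a glob pattern.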

### Output format

The default file structure produced by `jgi.Jgi().scrape_domain("myjgidir", "Eukaryota")` looks like this:

```
myjgidir
|-Eukaryota
| |-combined_taxon_ids
| |-missing_enzymes.json
| |-taxon_ids
| |-2789789765.json
| |-2789789766.json
| ...
```
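
Once a domain has been scraped, the numbered JSON files can be loaded for further analysis. The following is a minimal sketch under stated assumptions: `load_taxon_files` is a hypothetical helper, not part of `ecg`; it assumes the numbered `.json` files (such as `2789789765.json`) are per-taxon outputs stored in `taxon_ids/`, and falls back to the domain directory because the indentation of the tree above leaves the nesting ambiguous. It assumes nothing about each file's contents beyond being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_taxon_files(root: str, domain: str) -> Dict[str, Any]:
    """Map each taxon ID (file stem) to its parsed JSON content.

    Hypothetical helper (not part of ecg). Assumes per-taxon files are named
    "<digits>.json" and live in <root>/<domain>/taxon_ids/, or directly in
    <root>/<domain>/ if that subdirectory does not exist.
    """
    domain_dir = Path(root) / domain
    taxon_dir = domain_dir / "taxon_ids"
    search_dir = taxon_dir if taxon_dir.is_dir() else domain_dir
    data: Dict[str, Any] = {}
    # glob() yields nothing if search_dir does not exist.
    for json_path in sorted(search_dir.glob("*.json")):
        if not json_path.stem.isdigit():
            continue  # skip non-taxon files such as missing_enzymes.json
        with json_path.open() as fh:
            data[json_path.stem] = json.load(fh)
    return data


if __name__ == "__main__":
    taxa = load_taxon_files("myjgidir", "Eukaryota")
    print(f"loaded {len(taxa)} taxon files")
```

Keying the result by file stem keeps the taxon ID attached to its data without assuming that the ID is also stored inside each file.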