|
2 | 2 |
|
3 | 3 | All data collected for this package has been collated from stable, reproducible sources using the scripts contained here. The figure below briefly describes the process, which is designed to be run serially because new identifiers are generated as data are added.
|
4 | 4 |
|
| 5 | +## build_all.py script |
| 6 | + |
| 7 | +This script initializes all Docker containers, builds all datasets, validates them, and uploads them to Figshare and PyPI. |
| 8 | + |
| 9 | +It requires the following authorization tokens to be set in the local environment, depending on the use case: |
| 10 | +- `SYNAPSE_AUTH_TOKEN`: Required for the beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token. |
| 11 | +- `PYPI_TOKEN`: Required to upload to PyPI. |
| 12 | +- `FIGSHARE_TOKEN`: Required to upload to Figshare. |
| 13 | + |
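The token requirements above can be checked before a long build starts. The following is a minimal sketch and not part of `build_all.py`: the helper name `missing_tokens` and its parameters are hypothetical, and it assumes the Synapse token is needed whenever beataml or mpnst is among the selected datasets.

```python
import os

# Datasets that need Synapse access, per the token list above.
SYNAPSE_DATASETS = {"beataml", "mpnst"}


def missing_tokens(datasets, pypi=False, figshare=False, env=None):
    """Return names of required tokens that are not set (hypothetical helper)."""
    env = os.environ if env is None else env
    required = []
    if SYNAPSE_DATASETS & set(datasets):
        required.append("SYNAPSE_AUTH_TOKEN")
    if pypi:
        required.append("PYPI_TOKEN")
    if figshare:
        required.append("FIGSHARE_TOKEN")
    # An empty string is treated the same as an unset variable.
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    datasets = "broad_sanger,hcmi,beataml,mpnst,cptac".split(",")
    missing = missing_tokens(datasets, pypi=True, figshare=True)
    if missing:
        print("Missing tokens:", ", ".join(missing))
```

Running a check like this first avoids discovering a missing token only at the upload step, after the datasets have already been built.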
| 14 | +Available arguments: |
| 15 | + |
| 16 | +- `--docker`: Initializes and builds all Docker containers. |
| 17 | +- `--samples`: Processes and builds the sample data files. |
| 18 | +- `--omics`: Processes and builds the omics data files. |
| 19 | +- `--drugs`: Processes and builds the drug data files. |
| 20 | +- `--exp`: Processes and builds the experiment data files. |
| 21 | +- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp). |
| 22 | +- `--validate`: Validates the generated datasets using the schema check scripts. |
| 23 | +- `--figshare`: Uploads the datasets to Figshare. |
| 24 | +- `--pypi`: Uploads the package to PyPI. |
| 25 | +- `--high_mem`: Uses high-memory mode for concurrent data processing. |
| 26 | +- `--dataset`: Specifies the datasets to process (default='broad_sanger,hcmi,beataml,mpnst,cptac'). |
| 27 | +- `--version`: Specifies the version number for the package and data upload title. Required when uploading to Figshare or PyPI. |
| 28 | + |
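To illustrate how the flags listed above relate to one another, here is a hedged `argparse` sketch. It is not the actual parser in `build_all.py`: the function names `build_parser` and `parse_args`, the `store_true` handling of the boolean flags, and the comma splitting of `--dataset` are all assumptions. It only models the documented rule that `--version` is required for uploads.

```python
import argparse

# Default taken from the --dataset description above.
DEFAULT_DATASETS = "broad_sanger,hcmi,beataml,mpnst,cptac"


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented build flags")
    for flag in ("docker", "samples", "omics", "drugs", "exp", "all",
                 "validate", "figshare", "pypi", "high_mem"):
        parser.add_argument(f"--{flag}", action="store_true")
    parser.add_argument("--dataset", default=DEFAULT_DATASETS)
    parser.add_argument("--version", default=None)
    return parser


def parse_args(argv):
    """Parse argv and enforce that uploads carry a version number."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if (args.figshare or args.pypi) and not args.version:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--version is required when using --figshare or --pypi")
    # Assumption: --dataset is a comma-separated list, as the default suggests.
    args.datasets = args.dataset.split(",")
    return args
```

Under these assumptions, `parse_args(["--samples", "--dataset", "beataml,mpnst"])` selects two datasets, while `parse_args(["--figshare"])` exits with an error because no version was given.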
| 29 | +Example usage: |
| 30 | +```bash |
| 31 | +python build/build_all.py --all --high_mem --validate --pypi --figshare --version 0.1.29 |
| 32 | +``` |
| 33 | + |
5 | 34 | ### Directory structure
|
6 | 35 |
|
7 | 36 | We have created a separate directory with scripts that collect data from distinct sources as described below.
|
|