Commit 1eda917

Merge pull request #175 from PNNL-CompBio/docker-build-multi
Parallelization of build_all.py
2 parents 66e929c + bc28a7a commit 1eda917

36 files changed, +1279 -174 lines changed

.github/workflows/main.yml

+1 -3
@@ -3,9 +3,7 @@ name: CI
 on:
   push:
     branches:
-      - builder_branch_JJ
-      - docs_update_4_5_24
-      - doc_update_4_23_24
+      - docker-build-multi
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
 

build/README.md

+29
@@ -2,6 +2,35 @@
 
 All data collected for this package has been collated from stable/reproducible sources using the scripts contained here. The figure below shows a brief description of the process, which is designed to be run serially, as new identifiers are generated as data are added.
 
+## build_all.py script
+
+This script initializes all docker containers, builds all datasets, validates them, and uploads them to figshare and pypi.
+
+It requires the following authorization tokens to be set in the local environment depending on the use case:
+`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token.
+`PYPI_TOKEN`: This token is required to upload to PyPI.
+`FIGSHARE_TOKEN`: This token is required to upload to Figshare.
+
+Available arguments:
+
+- `--docker`: Initializes and builds all docker containers.
+- `--samples`: Processes and builds the sample data files.
+- `--omics`: Processes and builds the omics data files.
+- `--drugs`: Processes and builds the drug data files.
+- `--exp`: Processes and builds the experiment data files.
+- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp).
+- `--validate`: Validates the generated datasets using the schema check scripts.
+- `--figshare`: Uploads the datasets to Figshare.
+- `--pypi`: Uploads the package to PyPI.
+- `--high_mem`: Utilizes high memory mode for concurrent data processing.
+- `--dataset`: Specifies the datasets to process (default='broad_sanger,hcmi,beataml,mpnst,cptac').
+- `--version`: Specifies the version number for the package and data upload title. This is required to upload to figshare and PyPI
+
+Example usage:
+```bash
+python build/build_all.py --all --high_mem --validate --pypi --figshare --version 0.1.29
+```
+
 ### Directory structure
 
 We have created a separate directory with scripts that collect data from distinct sources as described below.
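The README text added in this hunk lists the flags and tokens but does not show how they fit together. The following is a minimal sketch of one way to wire them up with `argparse`; it is not the contents of `build_all.py`. The helper names `parse_args` and `required_tokens`, the choice of `store_true` switches, and the exact rule for when each token is needed are assumptions made for illustration.

```python
import argparse

# Build steps named in the README; --all is assumed to switch all of them on.
STEPS = ("docker", "samples", "omics", "drugs", "exp")


def parse_args(argv):
    """Parse the flags described in the README (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of the build_all.py flags")
    for flag in STEPS + ("all", "validate", "figshare", "pypi", "high_mem"):
        # Assumption: every switch is a simple on/off flag.
        parser.add_argument(f"--{flag}", action="store_true")
    parser.add_argument("--dataset", default="broad_sanger,hcmi,beataml,mpnst,cptac")
    parser.add_argument("--version")
    args = parser.parse_args(argv)

    if args.all:
        for step in STEPS:
            setattr(args, step, True)
    # The README states that a version is required for both upload targets.
    if (args.figshare or args.pypi) and not args.version:
        parser.error("--version is required with --figshare or --pypi")
    return args


def required_tokens(args):
    """List the environment variables this run would need (hypothetical helper)."""
    tokens = []
    datasets = set(args.dataset.split(","))
    if datasets & {"beataml", "mpnst"}:
        tokens.append("SYNAPSE_AUTH_TOKEN")
    if args.pypi:
        tokens.append("PYPI_TOKEN")
    if args.figshare:
        tokens.append("FIGSHARE_TOKEN")
    return tokens


args = parse_args(
    ["--all", "--high_mem", "--validate", "--pypi", "--figshare", "--version", "0.1.29"]
)
needed = required_tokens(args)
```

Checking `needed` against `os.environ` before starting any container would let a run fail early instead of partway through a long build.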

build/beatAML/GetBeatAML.py

+4 -1
@@ -259,6 +259,9 @@ def merge_drug_info(d_df,drug_map):
     #print(drug_map)
     #print(d_df.columns)
     #print(d_df)
+    print(d_df['isoSMILES'].dtype, drug_map['isoSMILES'].dtype)
+    d_df['isoSMILES'] = d_df['isoSMILES'].astype(str)
+    drug_map['isoSMILES'] = drug_map['isoSMILES'].astype(str)
     result_df = d_df.merge(drug_map[['isoSMILES', 'improve_drug_id']], on='isoSMILES', how='left')
     return result_df
 
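The commit does not say why the `isoSMILES` columns are cast to `str` before the merge. One plausible reason, offered here as an assumption, is a dtype mismatch: pandas infers `float64` for a column that holds only missing values, and it refuses to merge a string key against a numeric key. The toy frames below are invented for the example and are not BeatAML data.

```python
import pandas as pd

# Toy stand-ins for d_df and drug_map (invented values).
d_df = pd.DataFrame({"isoSMILES": ["CCO", "CCN"], "auc": [0.4, 0.7]})
# A column holding only missing values is inferred as float64, not as strings.
drug_map = pd.DataFrame({"isoSMILES": [float("nan")], "improve_drug_id": ["SMI_1"]})

try:
    d_df.merge(drug_map[["isoSMILES", "improve_drug_id"]], on="isoSMILES", how="left")
    mismatch_error = None
except ValueError as err:
    # pandas rejects a merge between a string key and a float64 key.
    mismatch_error = err

# Same casts as in the hunk: both key columns become strings before merging.
d_df["isoSMILES"] = d_df["isoSMILES"].astype(str)
drug_map["isoSMILES"] = drug_map["isoSMILES"].astype(str)
result_df = d_df.merge(
    drug_map[["isoSMILES", "improve_drug_id"]], on="isoSMILES", how="left"
)
```

After the cast the left merge keeps both rows of `d_df`. Neither key matches here, so `improve_drug_id` is left missing for both.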

@@ -607,7 +610,7 @@ def generate_drug_list(drug_map_path,drug_path):
 if args.samples:
     if args.prevSamples is None or args.prevSamples=='':
         print("Cannot run sample file generation without previous samples")
-        edit()
+        exit()
     else:
         print("Only running Samples File Generation")
         prev_samples_path = args.prevSamples
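This hunk replaces the undefined name `edit()` with the built-in `exit()`, which stops the script but reports a success status by default. A possible variant, shown only as a sketch and not something the commit does, is `sys.exit` with a message, so that a calling build script can detect the failure. The helper name `require_prev_samples` is hypothetical.

```python
import sys


def require_prev_samples(prev_samples):
    """Return the previous-samples path, or stop the script if it is missing."""
    if prev_samples is None or prev_samples == "":
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Cannot run sample file generation without previous samples")
    return prev_samples
```

A caller would then write `prev_samples_path = require_prev_samples(args.prevSamples)` in place of the `if`/`else` branch.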
