PNNL-CompBio
diff --git a/‎.github/workflows/main.yml
Lines changed: 1 addition & 3 deletions b/‎.github/workflows/main.yml
diff --git a/‎build/README.md
Lines changed: 29 additions & 0 deletions b/‎build/README.md
diff --git a/‎build/beatAML/GetBeatAML.py
Lines changed: 4 additions & 1 deletion b/‎build/beatAML/GetBeatAML.py
@@ -3,9 +3,7 @@ name: CI
 on:
   push:
     branches:
-      - builder_branch_JJ
-      - docs_update_4_5_24
-      - doc_update_4_23_24
+      - docker-build-multi
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
 
 
@@ -2,6 +2,35 @@
 
 All data collected for this package has been collated from stable/reproducible sources using the scripts contained here. The figure below shows a brief description of the process, which is designed to be run serially, as new identifiers are generated as data are added.
 
+## build_all.py script
+
+This script initializes all Docker containers, builds all datasets, validates them, and uploads them to Figshare and PyPI.
+
+It requires the following authorization tokens to be set in the local environment, depending on the use case:
+- `SYNAPSE_AUTH_TOKEN`: Required for the beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token.
+- `PYPI_TOKEN`: Required to upload to PyPI.
+- `FIGSHARE_TOKEN`: Required to upload to Figshare.
+
+Available arguments:
+
+- `--docker`: Initializes and builds all docker containers.
+- `--samples`: Processes and builds the sample data files.
+- `--omics`: Processes and builds the omics data files.
+- `--drugs`: Processes and builds the drug data files.
+- `--exp`: Processes and builds the experiment data files.
+- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp).
+- `--validate`: Validates the generated datasets using the schema check scripts.
+- `--figshare`: Uploads the datasets to Figshare.
+- `--pypi`: Uploads the package to PyPI.
+- `--high_mem`: Utilizes high memory mode for concurrent data processing.
+- `--dataset`: Specifies the datasets to process (default='broad_sanger,hcmi,beataml,mpnst,cptac').
+- `--version`: Specifies the version number for the package and data upload title. This is required to upload to Figshare and PyPI.
+
+Example usage:
+```bash
+python build/build_all.py --all --high_mem --validate --pypi --figshare --version 0.1.29
+```
+
 ### Directory structure
 
 We have created a separate directory with scripts that collect data from distinct sources as described below.
 
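Note on the README token requirements: the mapping between tokens and use cases could be checked up front, before any container is built. The following is a minimal sketch only. The helper name `missing_tokens` and its parameters are hypothetical and are not taken from `build_all.py`; only the three environment variable names and their use cases come from the README text.

```python
import os


def missing_tokens(datasets, pypi=False, figshare=False, env=None):
    """Return the names of required tokens that are unset or empty.

    Hypothetical helper: the rules below restate the README, they are
    not copied from build_all.py.
    """
    env = os.environ if env is None else env
    required = []
    # Synapse access is only needed for the beataml and mpnst datasets.
    if {"beataml", "mpnst"} & set(datasets):
        required.append("SYNAPSE_AUTH_TOKEN")
    # Upload tokens are only needed when the matching upload is requested.
    if pypi:
        required.append("PYPI_TOKEN")
    if figshare:
        required.append("FIGSHARE_TOKEN")
    return [name for name in required if not env.get(name)]


# Example with an explicit environment mapping instead of os.environ.
print(missing_tokens(["hcmi", "beataml"], figshare=True,
                     env={"FIGSHARE_TOKEN": "abc"}))
# prints ['SYNAPSE_AUTH_TOKEN']
```

Passing `env` explicitly keeps the check testable without touching the real environment.
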
@@ -259,6 +259,9 @@ def merge_drug_info(d_df,drug_map):
     #print(drug_map)
     #print(d_df.columns)
     #print(d_df)
+    print(d_df['isoSMILES'].dtype, drug_map['isoSMILES'].dtype)
+    d_df['isoSMILES'] = d_df['isoSMILES'].astype(str)
+    drug_map['isoSMILES'] = drug_map['isoSMILES'].astype(str)
     result_df = d_df.merge(drug_map[['isoSMILES', 'improve_drug_id']], on='isoSMILES', how='left')
     return result_df
 
@@ -607,7 +610,7 @@ def generate_drug_list(drug_map_path,drug_path):
     if args.samples:
         if args.prevSamples is None or args.prevSamples=='':
             print("Cannot run sample file generation without previous samples")
-            edit()
+            exit()
         else:
             print("Only running Samples File Generation")
             prev_samples_path = args.prevSamples
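
Note on the `exit()` fix: a bare `exit()` terminates with status 0, so a calling script or CI job cannot tell that sample generation was skipped because of an error. A possible follow-up, sketched here under assumptions, is to exit with a non-zero status through `sys.exit`. The function name `require_prev_samples` is hypothetical and does not appear in `GetBeatAML.py`; only the condition and the message are taken from the hunk above.

```python
import sys


def require_prev_samples(prev_samples):
    """Return the previous-samples path, or exit non-zero when it is missing.

    Hypothetical helper illustrating the check from the hunk above.
    """
    if prev_samples is None or prev_samples == "":
        # Report on stderr and signal failure to the caller via the exit status.
        print("Cannot run sample file generation without previous samples",
              file=sys.stderr)
        sys.exit(1)
    return prev_samples
```
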