
Commit e1dcd45

Merge branch 'main' into nci60-add
2 parents: cb5e19a + 13f5748

File tree: 5 files changed (+463, −10 lines)


README.md

Lines changed: 30 additions & 1 deletion
@@ -26,8 +26,37 @@ please see the [schema description](schema/README.md).
 
 ## Building the data package
 
-The data package is currently assembled via continuous automation,
+We have created a build script that executes each step of the build process and assembles a `local` folder with all the requisite files.
+
+The build requires both Python and Docker to be installed.
+
+To build the Docker images and run them all, run the following (note that this will take a while!):
+```
+python build/build_all.py --all
+```
+
+To build only the Docker images:
+```
+python build/build_all.py --docker
+```
+
+Then, to build the reference files (after the Docker images have been built):
+```
+python build/build_all.py --samples
+python build/build_all.py --drugs
+```
+
+Once the sample files have been created, we can collect the omics measurements:
+```
+python build/build_all.py --omics
+```
+
+Once the drug and sample files have been created, we can refit the dose-response curves:
+```
+python build/build_all.py --exp
+```
+
+Note: this only generates the data; it does not build the Python package.
 
 ## Data Source Reference List
 
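The staged flags above (`--all`, `--docker`, `--samples`, `--drugs`, `--omics`, `--exp`) can be sketched as a small argparse front end. The flag names come from the README; the stage list and dispatch logic below are illustrative assumptions, not the actual `build_all.py` implementation.

```python
# Illustrative sketch of flag handling for a staged build script.
# Flag names mirror the README commands; the dispatch logic is an
# assumption, not the real build_all.py.
import argparse

STAGES = ['docker', 'samples', 'drugs', 'omics', 'exp']

def select_stages(argv):
    """Map command-line flags to the ordered list of build stages to run."""
    parser = argparse.ArgumentParser(description='data package build sketch')
    parser.add_argument('--all', action='store_true',
                        help='run every stage in dependency order')
    for stage in STAGES:
        parser.add_argument(f'--{stage}', action='store_true')
    args = parser.parse_args(argv)
    if args.all:
        return list(STAGES)
    # preserve dependency order regardless of flag order on the command line
    return [s for s in STAGES if getattr(args, s)]
```

Running stages in a fixed order matters here because, per the README, the omics and experiment steps depend on the sample and drug files existing first.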

build/beatAML/GetBeatAML.py

Lines changed: 1 addition & 8 deletions
@@ -684,22 +684,16 @@ def generate_drug_list(drug_map_path,drug_path):
 updated_raw_drug_file = "beatAML_drug_raw.tsv"
 generate_raw_drug_file(original_drug_file,sample_mapping_file, updated_raw_drug_file,supplimentary_file)
 d_df = pd.read_csv(updated_raw_drug_file,sep='\t')
-
 d_res = d_df.rename(columns={"CELL":"other_id","AUC":"fit_auc",'DRUG':'chem_name'})
-
-# imp_samps = pd.read_csv(improve_map_file)
 d_res = d_res.merge(imp_samp_map, on='other_id')
-#print(d_res)
-#print(imp_drug_map)
 d_res = d_res.merge(imp_drug_map,on='chem_name')
 d_res = d_res.rename(columns = {'improve_drug_id':'Drug'}) ## stupid but we have to change aks later
 d_res.to_csv(updated_raw_drug_file,sep='\t')
 
 print("Starting Curve Fitting Algorithm")
-##WHERE DO I GET THE CURVE DATA?
 # Run Curve fitting algorithm from scripts directory.
 # Note the file path to fit_curve.py may need to be changed.
-command = ['python', 'fit_curve.py' ,'--input', 'beatAML_drug_raw.tsv', '--output', 'beatAML_drug_processed.tsv']
+command = ['python3', 'fit_curve.py' ,'--input', 'beatAML_drug_raw.tsv', '--output', 'beatAML_drug_processed.tsv', '--beataml']
 result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
 if result.returncode == 0:
     print("Curve Fitting executed successfully!")

@@ -708,7 +702,6 @@ def generate_drug_list(drug_map_path,drug_path):
     print("Out:", result.stdout)
     print("Error:", result.stderr)
 print("Starting Experiment Data")
-#exp_res = map_exp_to_improve(d_res,improve_map_file)
 drug_path = "beatAML_drug_processed.tsv.0"
 exp_res = map_exp_to_improve(drug_path)
 exp_res.to_csv("/tmp/beataml_experiments.tsv", index=False, sep='\t')
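The `subprocess.run` pattern in this hunk (explicit interpreter, captured output, return-code check) can be exercised on its own. In this sketch a trivial inline command stands in for `fit_curve.py`:

```python
# Sketch of the subprocess invocation pattern used above: run a child
# process with captured output and branch on the return code. The inline
# command is a stand-in for fit_curve.py.
import subprocess
import sys

def run_step(command):
    """Run a command, capture stdout/stderr as text, and report the result."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    if result.returncode == 0:
        print("Step executed successfully!")
    else:
        print("Step failed with code", result.returncode)
        print("Error:", result.stderr)
    return result

# sys.executable plays the role of the hard-coded 'python3' above,
# which avoids depending on what 'python3' resolves to on PATH
res = run_step([sys.executable, '-c', "print('curves fit')"])
```

Capturing `stderr` separately, as the script does, is what makes the `print("Error:", result.stderr)` diagnostics possible when curve fitting fails.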

build/build_all.py

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ def main():
 
 datasets = args.datasets.split(',')
 
+
 ### Any new sample creation must happened here.
 ### Each sample file requires the previous one to be created
 ### current order is : DepMap, Sanger, CPTAC, HCMI, BeatAML, MPNST
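The comments in this hunk note that sample files must be built in a fixed order (DepMap, Sanger, CPTAC, HCMI, BeatAML, MPNST) because each depends on the previous one. A minimal sketch of enforcing that ordering on the comma-separated `--datasets` string; `order_datasets` is a hypothetical helper, not part of `build_all.py`:

```python
# Hypothetical helper enforcing the build order noted in the comments above:
# requested datasets are re-sorted into dependency order before building,
# regardless of how the user listed them.
BUILD_ORDER = ['depmap', 'sanger', 'cptac', 'hcmi', 'beataml', 'mpnst']

def order_datasets(datasets_arg):
    """Split a comma-separated dataset string and sort it into build order."""
    requested = {d.strip().lower() for d in datasets_arg.split(',')}
    return [d for d in BUILD_ORDER if d in requested]
```

Filtering against a canonical ordered list is a simple way to honor a dependency chain without a full topological sort.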

build/utils/fit_curve.py

Lines changed: 1 addition & 1 deletion
@@ -163,7 +163,7 @@ def process_df_part(df, fname, beataml=False, sep='\t', start=0, count=None):
 count = count or (4484081 - start)
 groups = islice(groups, start, start+count)
 cores = multiprocessing.cpu_count()
-poolsize = round(cores/2)
+poolsize = round(cores-1)
 print('we have '+str(cores)+' cores and '+str(poolsize)+' threads')
 with multiprocessing.Pool(processes=poolsize) as pool:
     results = pool.map(process_single_drug, groups)
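The one-line change above grows the worker pool from half the cores to all but one (`round(cores-1)` is just `cores-1` for integer inputs). The two sizing rules can be compared side by side; `pool_size` is an illustrative helper, not part of `fit_curve.py`, and only the two formulas come from the diff:

```python
# Compare the old and new pool-sizing rules from the fit_curve.py change.
# pool_size is an illustrative helper; the formulas mirror the diff.
import multiprocessing

def pool_size(cores, leave_one_free=True):
    """Return the worker-pool size for a given core count."""
    if leave_one_free:
        return max(1, cores - 1)      # new rule: round(cores-1)
    return max(1, round(cores / 2))   # old rule: round(cores/2)

cores = multiprocessing.cpu_count()
print('we have', cores, 'cores and', pool_size(cores), 'workers')
```

Leaving one core free keeps the machine responsive while nearly doubling throughput for this CPU-bound curve fitting relative to the old half-the-cores rule.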
