
Commit de26c50

Merge branch 'main' into 211-missing-pyprojecttoml
2 parents c29c50d + e2a91ed commit de26c50

29 files changed, +5134 -85 lines changed

.github/workflows/build.yml

Lines changed: 24 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -177,6 +177,30 @@ jobs:
177177
push: true
178178
platforms: linux/amd64
179179

180+
build-pancpdo:
181+
runs-on: ubuntu-latest
182+
steps:
183+
- name: Checkout
184+
uses: actions/checkout@v3
185+
- name: Set up QEMU
186+
uses: docker/setup-qemu-action@v3
187+
- name: Set up Docker Buildx
188+
uses: docker/setup-buildx-action@v3
189+
- name: Login to DockerHub
190+
uses: docker/login-action@v3
191+
with:
192+
username: ${{ secrets.DOCKERHUB_USERNAME }}
193+
password: ${{ secrets.DOCKERHUB_PASSWORD }}
194+
- name: Build and push pancpdo
195+
uses: docker/build-push-action@v3
196+
with:
197+
file: ./build/docker/Dockerfile.pancpdo
198+
tags: |
199+
sgosline/pancpdo:latest
200+
sgosline/pancpdo:${{ github.ref_name }}
201+
push: true
202+
platforms: linux/amd64
203+
180204
build-upload:
181205
runs-on: ubuntu-latest
182206
steps:

.github/workflows/main.yml

Lines changed: 2 additions & 1 deletion
@@ -4,6 +4,7 @@ on:
44
push:
55
tags:
66
- '*' # Triggers the workflow only on version tags
7+
workflow_dispatch: # Allows manual triggering of the workflow
78

89
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
910
permissions:
@@ -44,4 +45,4 @@ jobs:
4445
steps:
4546
- name: Deploy to GitHub Pages
4647
id: deployment
47-
uses: actions/deploy-pages@v4
48+
uses: actions/deploy-pages@v4

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -18,5 +18,5 @@ tests/__pycache__
1818
dist
1919
build/lib
2020
build/local
21-
22-
coderdata/_version.py
21+
coderdata/_version.py
22+
local/

build/README.md

Lines changed: 6 additions & 8 deletions
@@ -10,11 +10,10 @@ are added.
1010

1111
## build_all.py script
1212

13-
This script initializes all docker containers, builds all datasets, validates them, and uploads them to figshare and pypi.
13+
This script initializes all docker containers, builds all datasets, validates them, and uploads them to figshare.
1414

1515
It requires the following authorization tokens to be set in the local environment depending on the use case:
1616
`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token.
17-
`PYPI_TOKEN`: This token is required to upload to PyPI.
1817
`FIGSHARE_TOKEN`: This token is required to upload to Figshare.
1918
`GITHUB_TOKEN`: This token is required to upload to GitHub.
2019

@@ -25,21 +24,20 @@ It requires the following authorization tokens to be set in the local environmen
2524
- `--omics`: Processes and builds the omics data files.
2625
- `--drugs`: Processes and builds the drug data files.
2726
- `--exp`: Processes and builds the experiment data files.
28-
- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp). This does not run the validate, figshare, or pypi commands.
27+
- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp). This does not run the validate or figshare commands.
2928
- `--validate`: Validates the generated datasets using the schema check scripts. This is automatically included if data upload occurs.
3029
- `--figshare`: Uploads the datasets to Figshare. FIGSHARE_TOKEN must be set in local environment.
31-
- `--pypi`: Uploads the package to PyPI. PYPI_TOKEN must be set in local environment.
3230
- `--high_mem`: Utilizes high memory mode for concurrent data processing. This has been successfully tested using 32 or more vCPUs.
3331
- `--dataset`: Specifies the datasets to process (default='broad_sanger,hcmi,beataml,mpnst,cptac').
34-
- `--version`: Specifies the version number for the PyPI package and Figshare upload title (e.g., "0.1.29"). This is required for figshare and PyPI upload steps. This must be a higher version than previously published versions.
32+
- `--version`: Specifies the version number for the Figshare upload title (e.g., "0.1.29"). This must be a higher version than previously published versions.
3533
- `--github-username`: GitHub username matching the GITHUB_TOKEN. Required to push the new Tag to the GitHub Repository.
3634
- `--github-email`: GitHub email matching the GITHUB_TOKEN. Required to push the new Tag to the GitHub Repository.
3735

3836
**Example usage**:
39-
- Build all datasets and upload to Figshare and PyPI and GitHub.
40-
Required tokens for the following command: `SYNAPSE_AUTH_TOKEN`, `PYPI_TOKEN`, `FIGSHARE_TOKEN`, `GITHUB_TOKEN`.
37+
- Build all datasets and upload to Figshare and GitHub.
38+
Required tokens for the following command: `SYNAPSE_AUTH_TOKEN`, `FIGSHARE_TOKEN`, `GITHUB_TOKEN`.
4139
```bash
42-
python build/build_all.py --all --high_mem --validate --pypi --figshare --version 0.1.41 --github-username jjacobson95 --github-email [email protected]
40+
python build/build_all.py --all --high_mem --validate --figshare --version 0.1.41 --github-username jjacobson95 --github-email [email protected]
4341
```
4442

4543
- Build only the experiment files.
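
The token requirements left after this README change can be summarized in a short sketch. It is illustrative only and not code from the repository: the helper `missing_tokens` and its parameters are hypothetical, and it matches dataset names exactly, which may differ from how `build_all.py` inspects its comma-separated `--dataset` string.

```python
import os


def missing_tokens(figshare=False, building=False, datasets=(),
                   github_push=False, env=None):
    """Return the names of needed environment variables that are unset.

    Hypothetical helper: the flags mirror the README text, not the
    actual build_all.py implementation.
    """
    env = os.environ if env is None else env
    needed = []
    if figshare:
        # --figshare uploads need FIGSHARE_TOKEN.
        needed.append("FIGSHARE_TOKEN")
    if building and any(d in ("beataml", "mpnst") for d in datasets):
        # Synapse access is only needed when building these datasets.
        needed.append("SYNAPSE_AUTH_TOKEN")
    if github_push:
        # Pushing the new tag to GitHub needs GITHUB_TOKEN.
        needed.append("GITHUB_TOKEN")
    # PYPI_TOKEN no longer appears: the PyPI upload step was removed.
    return [name for name in needed if not env.get(name)]
```

Calling `missing_tokens(figshare=True, github_push=True)` before a release run would list whichever of the two tokens is not exported in the current shell.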

build/build_all.py

Lines changed: 13 additions & 28 deletions
@@ -14,34 +14,33 @@
1414

1515
def main():
1616
parser=argparse.ArgumentParser(
17-
description="This script initializes all docker containers, builds datasets, validates them, and uploads to Figshare and PyPI.",
17+
description="This script initializes all docker containers, builds datasets, validates them, and uploads to Figshare.",
1818
epilog="""Examples of usage:
1919
20-
Build all datasets in a high memory environment, validate them, and upload to Figshare and PyPI:
21-
python build/build_all.py --all --high_mem --validate --pypi --figshare --version 0.1.29
20+
Build all datasets in a high memory environment, validate them, and upload to Figshare:
21+
python build/build_all.py --all --high_mem --validate --figshare --version 0.1.29
2222
2323
Build only experiment files. This assumes preceding steps (docker images, samples, omics, and drugs) have already been completed:
2424
python build/build_all.py --exp
2525
2626
Validate all local files without building or uploading. These files must be located in ./local. Includes compression/decompression steps.
2727
python build/build_all.py --validate
2828
29-
Upload the latest data to Figshare and PyPI (ensure tokens are set in the local environment):
30-
python build/build_all.py --figshare --pypi --version 0.1.30
29+
Upload the latest data to Figshare (ensure tokens are set in the local environment):
30+
python build/build_all.py --figshare --version 0.1.30
3131
"""
3232
)
3333
parser.add_argument('--docker',dest='docker',default=False,action='store_true', help="Build all docker images.")
3434
parser.add_argument('--samples',dest='samples',default=False,action='store_true', help="Build all sample files.")
3535
parser.add_argument('--omics',dest='omics',default=False,action='store_true', help="Build all omics files.")
3636
parser.add_argument('--drugs',dest='drugs',default=False,action='store_true', help="Build all drug files")
3737
parser.add_argument('--exp',dest='exp',default=False,action='store_true', help="Build all experiment file.")
38-
parser.add_argument('--validate', action='store_true', help="Run schema checker on all local files. Note this will be run, whether specified or not, if figshare or pypi arguments are included.")
38+
parser.add_argument('--validate', action='store_true', help="Run schema checker on all local files. Note this will be run, whether specified or not, if figshare arguments are included.")
3939
parser.add_argument('--figshare', action='store_true', help="Upload all local data to Figshare. FIGSHARE_TOKEN must be set in local environment.")
40-
parser.add_argument('--pypi', action='store_true', help="Update PYPI Package with latest Figshare data. PYPI_TOKEN must be set in local environment.")
41-
parser.add_argument('--all',dest='all',default=False,action='store_true', help="Run all data build commands. This includes docker, samples, omics, drugs, exp arguments. This does not run the validate, figshare, or pypi commands.")
40+
parser.add_argument('--all',dest='all',default=False,action='store_true', help="Run all data build commands. This includes docker, samples, omics, drugs, exp arguments. This does not run the validate or figshare commands")
4241
parser.add_argument('--high_mem',dest='high_mem',default=False,action='store_true',help = "If you have 32 or more CPUs, this option is recommended. It will run many code portions in parallel. If you don't have enough memory, this will cause a run failure.")
4342
parser.add_argument('--dataset',dest='datasets',default='broad_sanger,hcmi,beataml,cptac,mpnst,mpnstpdx',help='Datasets to process. Defaults to all available.')
44-
parser.add_argument('--version', type=str, required=False, help='Version number for the PyPI package and Figshare upload title (e.g., "0.1.29"). This is required for Figshare and PyPI upload. This must be a higher version than previously published versions.')
43+
parser.add_argument('--version', type=str, required=False, help='Version number for the Figshare upload title (e.g., "0.1.29"). This is required for Figshare upload. This must be a higher version than previously published versions.')
4544
parser.add_argument('--github-username', type=str, required=False, help='GitHub username for the repository.')
4645
parser.add_argument('--github-email', type=str, required=False, help='GitHub email for the repository.')
4746

@@ -266,8 +265,6 @@ def run_docker_upload_cmd(cmd_arr, all_files_dir, name, version):
266265
docker_run = ['docker', 'run', '--rm', '-v', f"{env['PWD']}/local/{all_files_dir}:/tmp", '-e', f"VERSION={version}"]
267266

268267
# Add Appropriate Environment Variables
269-
if 'PYPI_TOKEN' in env and name == 'PyPI':
270-
docker_run.extend(['-e', f"PYPI_TOKEN={env['PYPI_TOKEN']}", 'upload'])
271268
if 'FIGSHARE_TOKEN' in env and name == 'Figshare':
272269
docker_run.extend(['-e', f"FIGSHARE_TOKEN={env['FIGSHARE_TOKEN']}", 'upload'])
273270
if name == "validate":
@@ -308,16 +305,13 @@ def compress_file(file_path):
308305
#####
309306

310307
figshare_token = os.getenv('FIGSHARE_TOKEN')
311-
pypi_token = os.getenv('PYPI_TOKEN')
312308
synapse_auth_token = os.getenv('SYNAPSE_AUTH_TOKEN')
313309
github_token = os.getenv('GITHUB_TOKEN')
314310

315311

316312
# Error handling for required tokens
317313
if args.figshare and not figshare_token:
318314
raise ValueError("FIGSHARE_TOKEN environment variable is not set.")
319-
if args.pypi and not pypi_token:
320-
raise ValueError("PYPI_TOKEN environment variable is not set.")
321315
if ('beataml' in args.datasets or 'mpnst' in args.datasets) and not synapse_auth_token:
322316
if args.docker or args.samples or args.omics or args.drugs or args.exp or args.all: # Token only required if building data, not upload or validate.
323317
raise ValueError("SYNAPSE_AUTH_TOKEN is required for accessing MPNST and beatAML datasets.")
@@ -394,7 +388,7 @@ def compress_file(file_path):
394388
### Begin Upload and/or validation
395389
#####
396390

397-
if args.pypi or args.figshare or args.validate:
391+
if args.figshare or args.validate:
398392
# FigShare File Prefixes:
399393
prefixes = ['beataml', 'hcmi', 'cptac', 'mpnst', 'genes', 'drugs']
400394
broad_sanger_datasets = ["ccle","ctrpv2","fimm","gdscv1","gdscv2","gcsi","prism","nci60"]
@@ -405,23 +399,18 @@ def compress_file(file_path):
405399

406400

407401
figshare_token = os.getenv('FIGSHARE_TOKEN')
408-
pypi_token = os.getenv('PYPI_TOKEN')
409402

410403
all_files_dir = 'local/all_files_dir'
411404
if not os.path.exists(all_files_dir):
412405
os.makedirs(all_files_dir)
413-
414-
# Ensure pypi tokens are available
415-
if args.pypi and not pypi_token:
416-
raise ValueError("Required tokens (PYPI) are not set in environment variables.")
417406

418407
# Ensure figshare tokens are available
419408
if args.figshare and not figshare_token:
420409
raise ValueError("Required tokens (FIGSHARE) are not set in environment variables.")
421410

422411
# Ensure version is specified
423-
if (args.figshare or args.pypi) and not args.version:
424-
raise ValueError("Version must be specified when pushing to pypi or figshare")
412+
if args.figshare and not args.version:
413+
raise ValueError("Version must be specified when pushing to figshare")
425414

426415
# Move relevant files to a designated directory
427416
for file in glob(os.path.join("local", '*.*')):
@@ -433,7 +422,7 @@ def compress_file(file_path):
433422
decompress_file(file)
434423

435424
# Run schema checker - This will always run if uploading data.
436-
schema_check_command = ['python3', 'check_schema.py', '--datasets'] + datasets
425+
schema_check_command = ['python3', 'scripts/check_schema.py', '--datasets'] + datasets
437426
run_docker_upload_cmd(schema_check_command, 'all_files_dir', 'validate', args.version)
438427

439428
print("Validation complete. Proceeding with file compression/decompression adjustments")
@@ -453,13 +442,9 @@ def compress_file(file_path):
453442
figshare_command = ['python3', 'scripts/push_to_figshare.py', '--directory', "/tmp", '--title', f"CODERData{args.version}", '--token', os.getenv('FIGSHARE_TOKEN'), '--project_id', '189342', '--publish']
454443
run_docker_upload_cmd(figshare_command, 'all_files_dir', 'Figshare', args.version)
455444

456-
# Upload to PyPI using Docker
457-
if args.pypi and args.version and pypi_token:
458-
pypi_command = ['python3', 'scripts/push_to_pypi.py', '-y', '/tmp/figshare_latest.yml', '-d', 'coderdata/download/downloader.py', "-v", args.version]
459-
run_docker_upload_cmd(pypi_command, 'all_files_dir', 'PyPI', args.version)
460445

461446
# Push changes to GitHub using Docker
462-
if args.version and args.figshare and args.pypi and pypi_token and figshare_token and github_token and args.github_username and args.github_email:
447+
if args.version and args.figshare and figshare_token and github_token and args.github_username and args.github_email:
463448
git_command = [
464449
'bash', '-c', (
465450
f'git config --global user.name "{args.github_username}" '
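
With the PyPI branch removed, the conditions guarding the validation and upload block reduce to the Figshare and validate flags. The sketch below restates that gating in isolation. It assumes a trimmed-down parser with only the three relevant arguments, and the function names `build_parser` and `should_run_upload_block` are hypothetical, not part of `build_all.py`.

```python
import argparse


def build_parser():
    # Trimmed-down parser holding only the arguments this sketch needs.
    parser = argparse.ArgumentParser()
    parser.add_argument('--validate', action='store_true')
    parser.add_argument('--figshare', action='store_true')
    parser.add_argument('--version', type=str, required=False)
    return parser


def should_run_upload_block(args, figshare_token):
    """Return True if the validation/upload block should run.

    Raises ValueError when a Figshare upload is requested without a
    token or without a version, mirroring the checks in the diff.
    """
    if not (args.figshare or args.validate):
        return False
    if args.figshare and not figshare_token:
        raise ValueError("Required tokens (FIGSHARE) are not set in environment variables.")
    if args.figshare and not args.version:
        raise ValueError("Version must be specified when pushing to figshare")
    return True
```

Under these assumptions, `--validate` alone needs neither a token nor a version, while `--figshare` needs both.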

build/build_dataset.py

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,7 @@ def process_docker(dataset,validate):
4242
'beataml': ['beataml'],
4343
'mpnst': ['mpnst'],
4444
'mpnstpdx': ['mpnstpdx'],
45+
'pancpdo': ['pancpdo'],
4546
'cptac': ['cptac'],
4647
'genes': ['genes'],
4748
'upload': ['upload']
@@ -123,6 +124,7 @@ def process_omics(executor, dataset, should_continue):
123124
'broad_sanger': ['copy_number', 'mutations', 'proteomics', 'transcriptomics'],
124125
'cptac': ['copy_number', 'mutations', 'proteomics', 'transcriptomics'],
125126
'hcmi': ['mutations', 'transcriptomics'],
127+
'pancpdo': ['transcriptomics'],
126128
'mpnstpdx':['copy_number', 'mutations', 'proteomics', 'transcriptomics']
127129
}
128130
