The Autopath project explores how to surface biological pathways for two diseases, making it easier for biologists to gain insight from raw data. Our two case studies are age-related macular degeneration (AMD), a degenerative process affecting the macula of the retina that occurs in older adults and results in impairment or loss of central vision, and systemic lupus erythematosus (SLE), a chronic autoimmune disease in which the immune system mistakenly attacks healthy tissues and organs, leading to inflammation and damage.
How can we leverage existing data to understand and treat disease?
- There's too much data, sometimes stored in different locations and on different cloud infrastructure, and it's too hard to find and combine the right data for analysis.
- Analysis tools require too much computational biology wrangling and are not user-friendly for biologists.
- Create a reproducible workflow to transform single-cell expression data from different sources into a visualization of gene pathway enrichment.
- Access data where it lives, possibly in different “places”, and combine it into the most robust dataset.
- Analyze collaboratively in a Terra workspace using reproducible tools where it’s most efficient, possibly on two different clouds (AWS and GCP).

Age-related macular degeneration (AMD) affects 12% of Americans over 40 [1] and is a leading cause of blindness. Better knowledge of the biomolecular causes of AMD would improve treatment and American healthspans. GTEx, an NIH CFDE project, has unique, rich, and systematic data on gene expression in human eyes [2, 3]. This data is publicly accessible on AnVIL. Exporting it to the Terra biomedical cloud analysis platform, conducting reproducible differential expression analysis on it, and exploring the results in biological pathway diagrams will improve understanding of this complex disease's biochemical causes and effects, and stimulate the efficient development of better diagnostics and medical treatments for AMD.
[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC9634594/
[2] https://explore.anvilproject.org/datasets/e5aee011-bdb3-4caa-954c-a46678656270
[3] https://pmc.ncbi.nlm.nih.gov/articles/PMC6441365/
INCLUDE DATA DESCRIPTION
- Access from Single Cell Portal at https://singlecell.broadinstitute.org/single_cell/study/SCP2805/hrca-snrna-seq-of-the-human-retina-all-cells#study-summary.
- INCLUDE overview of how to access and where to put the data
- Access GTEx public data in the AnVIL Data Explorer at https://explore.anvilproject.org/datasets/e5aee011-bdb3-4caa-954c-a46678656270. Note that you must be signed in with your Terra user ID to export data from the Data Explorer.
- For step-by-step instructions on how to access and download data from the Data Explorer to a Terra workspace, see the support doc on how to export AnVIL data for analysis at https://support.terra.bio/hc/en-us/articles/34607573660827-Part-3-Export-AnVIL-data-to-Terra-for-analysis.
The ultimate goal is to include a quality assurance step that correlates the silhouette score of the clustering with the F-score.
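As a rough illustration of that quality assurance idea, the sketch below computes a mean silhouette width per cluster, an F-beta score per cluster, and the Pearson correlation between the two. It is a standard-library-only sketch, not the project's implementation. The function names, the toy coordinates, the (TP, FP, FN) counts, and the choice of beta = 0.5 are all assumptions made for illustration.

```python
import math
from statistics import mean


def silhouette_by_cluster(points, labels):
    """Mean silhouette width per cluster, using Euclidean distance."""
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = {label: [] for label in members}
    for i, label in enumerate(labels):
        own = [j for j in members[label] if j != i]
        if not own:
            widths[label].append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = mean(math.dist(points[i], points[j]) for j in own)
        # b: mean distance to the points of the nearest other cluster
        b = min(
            mean(math.dist(points[i], points[j]) for j in idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        widths[label].append((b - a) / denom if denom else 0.0)
    return {label: mean(vals) for label, vals in widths.items()}


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from confusion counts; beta < 1 weights precision more."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy data: 2-D embedding coordinates and cluster labels.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (9, 0), (7, 3), (4, 1)]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
# Hypothetical (TP, FP, FN) counts for each cluster's marker-based classification.
counts = {"A": (9, 1, 1), "B": (8, 2, 2), "C": (5, 4, 5)}

sil = silhouette_by_cluster(points, labels)
order = sorted(sil)
r = pearson([sil[c] for c in order], [f_beta(*counts[c]) for c in order])
print({c: round(sil[c], 3) for c in order}, round(r, 3))
```

In a real workflow the per-cluster F-scores would come from the marker gene analysis rather than from hand-entered counts, and a library implementation of the silhouette score would likely replace the quadratic-time loop above.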
Advanced Analytics Module
- EC Orchestration
- Nextflow Execution
- Cost management
- Workspace collaboration
Dockerize the NSForest 4.1 Python package
- Dockerfile
- GitHub Actions CI and image publication
- Docker: Make sure Docker is installed and actively running in the background. Verify Docker installation and status using:
docker --version
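Note that `docker --version` only confirms that the Docker CLI is installed. `docker info` also contacts the daemon and exits with an error if it is not reachable. If you want to script both checks, a minimal Python sketch could look like the following. The function name `docker_status` and its three return values are hypothetical and are not part of NSForest or Docker.

```python
import shutil
import subprocess


def docker_status(timeout=10):
    """Return 'missing', 'installed' (CLI found, daemon unreachable), or 'running'."""
    # Step 1: is the docker CLI on PATH at all?
    if shutil.which("docker") is None:
        return "missing"
    # Step 2: `docker info` talks to the daemon and exits non-zero if it cannot.
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return "installed"
    return "running" if result.returncode == 0 else "installed"


if __name__ == "__main__":
    print(f"Docker status: {docker_status()}")
```

A result of `installed` suggests starting Docker Desktop or the Docker service before building or running the image.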
Forked NSForest with Docker, GitHub Actions CI, and image publication: https://github.com/adeslatt/NSForest/blob/master/README.md
- Updating to 4.1
- Greater modularization and autodocumentation with Sphinx
INCLUDE brief analysis tool description and link here
INCLUDE brief analysis tool description and link here
INCLUDE brief analysis tool description and link here
INCLUDE brief analysis tool description and link here.
INCLUDE brief analysis tool description and link here.
LINK TO WORKSPACE AND ADD DESCRIPTION HERE
NSForest batch on AWS
INCLUDE TOOL DESCRIPTION AND LINK HERE
- Docker: Make sure Docker is installed and actively running in the background. Verify Docker installation and status using:
docker --version
Enrichr + CFDE Knowledge Center
INCLUDE ANALYSIS/TOOL DESCRIPTION AND LINK HERE
INCLUDE ANALYSIS/TOOL DESCRIPTION AND LINK HERE
Render interactive biological pathways diagram
INCLUDE ANALYSIS/TOOL DESCRIPTION AND LINK HERE
https://github.com/omicscodeathon/Interactive-Analysis-with-Biological-Pathways/
INCLUDE things we wanted to do but couldn’t
- Eric Weitz: The Broad Institute
- Anne Deslattes Mays: Science and Technology Consulting, LLC
- Lei Ma: Harvard FAS Informatics Group
- Olaitan I. Awe: Independent Bioinformatics Consultant
- Peter Fan: Northeastern University Library
- Allie Cliffe: The Broad Institute