DanHUMassMed/RNA-Seq-Nextflow

RNA-Seq Analysis

Introduction

This repository uses Nextflow to provide a reproducible bioinformatics pipeline for RNA sequencing analysis with STAR, RSEM, and/or Salmon, producing gene counts and extensive quality control reports.

The pipeline takes FASTQ files as input, performs initial MD5 checks, quality control (QC) checks, trimming, and alignment, and produces a gene expression matrix, QC reports, and gene set enrichment data.

Pipeline Process

  • 1a. Get FASTQ data from Dropbox and move it to the HPC (get_dropbox_data-<PI_NAME>.nf)
  • 1b. Check the MD5 Checksum values of the transferred data (check_md5.py)
  • 2a. Update Wormbase GeneIDs based on Version Number (e.g., WS289) (utility/wormbase_download.sh, create_star_rsem_index.nf, create_salmon_index.nf)
  • 2b. Get genome/transcript data from Wormbase for alignment (utility/wormbase_download.sh)
  • 3a. Create STAR and RSEM indexes (create_star_rsem_index.nf)
  • 3b. Create Salmon index file (create_salmon_index.nf)
    • NOTE Salmon process is currently used for testing and validation only
  • 4a. Execute Quality Control on FASTQ Data (rnaseq-rsem-<PI_NAME>.nf)
  • 4b. Align FASTQ data to the Worm Genome
  • 4c. Quantify the Gene Expression
  • 4d. Summarize results for further analysis
  • 4e. Aggregate the QC Reports for simplified review
  • 5a. Execute DEBrowser to perform Differential Expression Analysis (utility/start_debrowser.sh)
  • 5b. Trim Data as needed
  • 5c. Produce heatmap visualizations
  • 6a. Execute Wormcat Batch (wormcat_batch.nf)
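
Step 1b verifies that the transferred FASTQ files match their published MD5 checksums. The following is a minimal sketch of that kind of check, not the contents of check_md5.py. It assumes an md5sum-style manifest (one `<hex digest>  <file name>` pair per line), and the function names `md5_of_file`, `parse_manifest`, and `verify_directory` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large FASTQ files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse md5sum-style lines ('<hex digest>  <file name>').

    The manifest format is an assumption; adapt this to whatever
    checksum listing accompanies the sequencing data.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'.
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def verify_directory(data_dir, manifest_text):
    """Map each file named in the manifest to 'OK', 'MISMATCH', or 'MISSING'."""
    data_dir = Path(data_dir)
    report = {}
    for name, checksum in parse_manifest(manifest_text).items():
        target = data_dir / name
        if not target.is_file():
            report[name] = "MISSING"
        elif md5_of_file(target) == checksum:
            report[name] = "OK"
        else:
            report[name] = "MISMATCH"
    return report
```

Any entry other than `OK` in the returned report suggests the transfer should be repeated for that file before running the later steps.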

Note: when running a pipeline, use nextflow run PIPELINE.nf -bg -N <EMAIL_ADDRESS>, which runs the pipeline in the background and sends an email when the process terminates (success or failure).
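
If the pipelines are launched from a script rather than typed by hand, the invocation from the note above can be assembled as an argument list. This is only a sketch: the helper name `build_nextflow_command`, the script name, and the email address are hypothetical placeholders, and only the `-bg` and `-N` options mentioned in the note are used.

```python
import shlex


def build_nextflow_command(pipeline_script, email):
    """Build the argument list for a background Nextflow run that
    emails a notification when the run terminates."""
    return ["nextflow", "run", pipeline_script, "-bg", "-N", email]


# Hypothetical script name and address, for illustration only.
command = build_nextflow_command("rnaseq-rsem-example.nf", "user@example.org")

# Print a shell-safe rendering for review before launching.
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting problems with unusual file names.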

Pipeline Outputs

  • MD5 Checksum Report
  • FastQC Reports
  • MultiQC Report
  • Isoform Quantification
  • Gene Quantification
  • DESeq2 heatmap visualizations
  • Wormcat annotations and visualization of gene set enrichment data

Process flow diagrams