Steps in Bulk RNA-seq analysis

1. Experimental Design

  • Define the biological question (e.g., comparing gene expression between conditions).
  • Plan replicates to ensure statistical power (at least 3 biological replicates per group is recommended).
  • Select an appropriate sequencing depth (typically 20-50 million reads per sample).

2. Sample Preparation and Sequencing

  • Extract high-quality RNA from your samples.
  • Assess RNA quality (e.g., using an Agilent Bioanalyzer to measure the RNA Integrity Number (RIN)).
  • Prepare cDNA libraries for sequencing.
  • Perform sequencing on a platform (e.g., Illumina) to generate raw reads.

3. Quality Control of Raw Reads

  • Inspect raw sequencing data using tools like:
    • FastQC: Provides an overview of quality metrics (base quality, GC content, adapter contamination).
  • Trim low-quality bases and remove adapters using tools like:
    • Trimmomatic or Cutadapt.
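
As a rough illustration of what quality trimming does, the sketch below decodes a Phred+33 quality string and removes low-quality bases from the 3' end of a read. The function names, the toy read, and the Q20 threshold are assumptions for illustration only; Trimmomatic and Cutadapt use more elaborate algorithms.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Drop bases from the 3' end until a base with quality >= min_q is reached."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# Toy read (hypothetical): 'I' encodes Q40, '#' encodes Q2.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
print(trimmed_seq, trimmed_qual)
```

Here the last two bases fall below the threshold, so only the first six are kept.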

4. Alignment to Reference Genome

  • Map the cleaned reads to a reference genome (or transcriptome) using aligners like:
    • STAR: Fast and widely used for RNA-seq.
    • HISAT2: Efficient for spliced alignments.
  • Output: Aligned reads in a BAM file.

??? What is a transcriptome? How does it differ from a genome?

5. Post-Alignment Quality Control

  • Evaluate alignment results:
    • Use samtools flagstat to check the percentage of mapped reads.
    • Use RSeQC to assess read distribution across genomic features. WHY???
  • Check for biases (e.g., 3' bias due to degraded RNA). WHY???
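
To make the mapped-read percentage concrete, here is a minimal sketch that computes it from the FLAG field of SAM records. It counts primary records only, which is a simplification and not the exact logic of samtools flagstat; the function name and toy records are hypothetical. A low percentage may point to contamination, a mismatched reference, or poor read quality, which is one reason to check it.

```python
def mapped_fraction(sam_lines):
    """Fraction of primary SAM records whose 'unmapped' flag bit (0x4) is not set."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment record
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:  # skip secondary and supplementary records
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Toy SAM records (hypothetical): two mapped reads, one unmapped, one secondary.
records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(f"{mapped_fraction(records):.1%} of primary reads mapped")
```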

6. Quantification of Gene Expression

  • Count the number of reads mapped to each gene using tools like:
    • HTSeq or featureCounts: Count reads based on gene annotation files (e.g., GTF/GFF).
  • Output: A count matrix, where rows are genes and columns are samples.
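
The counting step can be sketched as an interval-overlap problem. The example below assigns each read to a gene only when it overlaps exactly one gene and skips ambiguous reads. This is a simplified illustration with hypothetical gene names and coordinates, not the actual algorithm of HTSeq or featureCounts, which also handle exons, strandedness, and multi-mapping.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene; genes maps name -> (chrom, start, end), reads are (chrom, start, end)."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:  # reads overlapping zero or several genes are not counted
            counts[hits[0]] += 1
    return dict(counts)


# Toy annotation and alignments (hypothetical coordinates).
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),  # geneA only
    ("chr1", 460, 490),  # overlaps both genes, ambiguous
    ("chr1", 600, 650),  # geneB only
    ("chr2", 100, 150),  # no gene
]
print(count_reads(genes, reads))
```

Running this per sample and joining the results by gene name gives the genes-by-samples count matrix described above.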

7. Normalisation

  • Normalise the count data to account for differences in sequencing depth and gene length. HOW???
  • Common normalisation methods:
    • TPM (Transcripts Per Million): For comparing gene expression within a sample.
    • RPKM/FPKM: Length-normalised, but less commonly used now.
    • DESeq2 or edgeR normalisation: Scales raw counts by a per-sample factor (median-of-ratios size factors in DESeq2, TMM in edgeR) for differential expression analysis.
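
As one possible answer to the "HOW", the sketch below computes TPM for a single sample and median-of-ratios size factors across samples. It is a plain-Python illustration of the arithmetic under simplifying assumptions (toy counts, fixed gene lengths, hypothetical function names), not the DESeq2 or edgeR implementation.

```python
import math
from statistics import median


def tpm(counts, lengths_bp):
    """TPM for one sample: divide counts by gene length in kb, then scale to sum to 1e6."""
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def size_factors(matrix):
    """Median-of-ratios size factors; matrix[g][s] is the count of gene g in sample s."""
    n_samples = len(matrix[0])
    # Only genes with a non-zero count in every sample have a usable geometric mean.
    usable = [row for row in matrix if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
toy_matrix = [[10, 20], [100, 200], [50, 100]]
print(tpm([10, 20], [1000, 2000]))
print(size_factors(toy_matrix))
```

Dividing each sample's raw counts by its size factor puts the samples on a common scale. Note that size factors correct for sequencing depth only, while TPM also corrects for gene length.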

8. Differential Gene Expression Analysis

  • Identify genes with significant expression differences between experimental groups.
  • Common tools:
    • DESeq2 (R-based): Handles raw counts directly.
    • edgeR (R-based): Suitable for small sample sizes.
  • Output:
    • List of differentially expressed genes (DEGs) with log2 fold changes and p-values (ideally adjusted for multiple testing).
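
To show what the two output columns mean, here is a minimal sketch of a log2 fold change and a Benjamini-Hochberg adjustment of p-values. DESeq2 and edgeR fit negative binomial models and do considerably more than this; the function names, pseudocount, and p-values below are assumptions for illustration.

```python
import math


def log2_fold_change(mean_treated, mean_control, pseudocount=1.0):
    """log2 ratio of mean normalised counts; the pseudocount avoids division by zero."""
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value stays monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy values (hypothetical).
print(log2_fold_change(7, 1))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes are usually called differentially expressed when the adjusted p-value is below a chosen cutoff, often combined with a minimum absolute log2 fold change.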

9. Functional Annotation and Pathway Analysis

  • Interpret the biological relevance of DEGs by performing:
    • Gene Ontology (GO) Enrichment Analysis: Identify enriched biological processes, molecular functions, or cellular components.
    • Pathway Analysis: Map DEGs to pathways using databases like KEGG or Reactome, or rank-based methods like GSEA.
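
Over-representation analysis is commonly framed as a hypergeometric test: given a background of genes, how surprising is the overlap between the DEG list and a GO term? The sketch below computes that one-sided p-value with the standard library. The function name and the numbers are hypothetical, and real tools also correct for testing many terms.

```python
from math import comb


def enrichment_pvalue(n_background, n_in_term, n_degs, n_overlap):
    """P(overlap >= n_overlap) when n_degs genes are drawn from n_background genes,
    of which n_in_term are annotated to the term (hypergeometric upper tail)."""
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20000 background genes, 200 in the term,
# 500 DEGs, 15 of which are annotated to the term.
print(enrichment_pvalue(20000, 200, 500, 15))
```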

10. Data Visualisation

  • Quality Control:
    • PCA plot: Visualise sample clustering.
    • Heatmaps: Show clustering of samples/genes.
  • Differential Expression:
    • MA plot: Log fold change vs. mean expression.
    • Volcano plot: Statistical significance (-log10 p-value) vs. log fold change.
  • Pathway Analysis:
    • Enrichment bar plots or network diagrams.
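
As a small illustration of the volcano plot, the sketch below turns a results table into plot coordinates (log2 fold change on x, -log10 adjusted p-value on y) and labels each gene. The cutoffs, gene names, and function name are assumptions; the resulting points could be passed to any plotting library.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (gene, x, y, label) tuples; results holds (gene, log2FC, adjusted p-value)."""
    points = []
    for gene, lfc, padj in results:
        y = -math.log10(padj)  # assumes padj > 0
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant
        points.append((gene, lfc, y, label))
    return points


# Toy results (hypothetical genes and values).
toy_results = [("g1", 2.0, 0.001), ("g2", -1.5, 0.01), ("g3", 0.2, 0.5)]
for point in volcano_points(toy_results):
    print(point)
```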

11. Validation

  • Validate key findings using an independent method:
    • qRT-PCR: Validate differential expression for a subset of genes.
  • Cross-reference with existing datasets or prior research.
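
For qRT-PCR validation, relative expression is often estimated with the 2^-ΔΔCt method, which can then be compared with the RNA-seq fold change. The sketch below assumes roughly 100% amplification efficiency and a single reference gene; the Ct values and function name are hypothetical.

```python
def ddct_fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression of a target gene (treated vs. control) by the 2^-ddCt method."""
    delta_treated = ct_target_treated - ct_ref_treated  # normalise to the reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


# Toy Ct values (hypothetical): the target crosses threshold 2 cycles earlier when treated.
print(ddct_fold_change(22.0, 18.0, 24.0, 18.0))
```

A fold change above 1 indicates higher expression in the treated group, so its direction should agree with the sign of the log2 fold change from the sequencing data.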

12. Reporting

  • Compile findings into a report or publication:
    • Document methods, quality control steps, and statistical analyses.
    • Share data and scripts for reproducibility (e.g., GitHub or public repositories).