Steps in Bulk RNA-seq analysis

1. Experimental Design

Define the biological question (e.g., comparing gene expression between conditions).
Plan replicates to ensure statistical power (at least 3 biological replicates per group is recommended).
Select an appropriate sequencing depth (typically 20-50 million reads per sample).

2. Sample Preparation and Sequencing

Extract high-quality RNA from your samples.
Assess RNA quality (e.g., using an Agilent Bioanalyzer to obtain an RNA Integrity Number (RIN)).
Prepare cDNA libraries for sequencing.
Perform sequencing on a platform (e.g., Illumina) to generate raw reads.

3. Quality Control of Raw Reads

Inspect raw sequencing data using tools like:
- FastQC: Provides an overview of quality metrics (base quality, GC content, adapter contamination).
Trim low-quality bases and remove adapters using tools like:
- Trimmomatic or Cutadapt.
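
To make the trimming idea concrete, here is a minimal sketch of 3' quality trimming on a single read. It assumes Phred+33 encoded quality strings and a hypothetical cutoff of Q20; the function names are illustrative only. Trimmomatic and Cutadapt use more refined strategies (e.g., sliding windows and adapter matching), so this is not their algorithm.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20):
    """Drop trailing bases whose quality is below min_quality (assumed cutoff)."""
    scores = phred_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


seq = "ACGTACGTAC"
qual = "IIIIIIII#!"  # 'I' = Q40, '#' = Q2, '!' = Q0
trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
print(trimmed_seq)  # ACGTACGT
```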

4. Alignment to Reference Genome

Map the cleaned reads to a reference genome (or transcriptome) using aligners like:
- STAR: Fast and widely used for RNA-seq.
- HISAT2: Efficient for spliced alignments.
Output: Aligned reads in a BAM file.
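
Spliced alignments show up in the BAM/SAM CIGAR string as `N` operations (a skipped region on the reference, i.e. an intron). As a rough illustration, the sketch below pulls intron coordinates out of one alignment record. The function name is hypothetical and the example read is made up; real pipelines would read these fields with a library such as pysam or with samtools.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, one per N operation."""
    junctions = []
    ref_pos = pos  # 1-based leftmost reference position (SAM POS field)
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
    return junctions


# A 100 bp read split across two exons with a 200 bp intron in between
print(splice_junctions(100, "50M200N50M"))  # [(150, 349)]
```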

??? What is a transcriptome? How does it differ from a genome?

5. Post-Alignment Quality Control

Evaluate alignment results:
- Use samtools flagstat to check the percentage of mapped reads.
- Use RSeQC to assess read distribution across genomic features. WHY???
Check for biases (e.g., 3' bias due to degraded RNA). WHY???
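
The mapped-read percentage comes from the FLAG field of each alignment record. Here is a simplified sketch of that calculation, counting primary records only. The flag bit values come from the SAM specification; the function name and the example flags are made up, and samtools flagstat reports many more categories than this.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapped_fraction(flags):
    """Fraction of primary alignment records whose 'unmapped' bit is not set."""
    primary = [f for f in flags if not f & (SECONDARY | SUPPLEMENTARY)]
    if not primary:
        return 0.0
    mapped = sum(1 for f in primary if not f & UNMAPPED)
    return mapped / len(primary)


# Example FLAG values: 0, 16, 99, 147 are mapped; 4, 77, 141 are unmapped;
# 256 is a secondary alignment and is ignored here.
flags = [0, 16, 4, 256, 99, 147, 77, 141]
print(f"{mapped_fraction(flags):.1%} of primary records mapped")
```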

6. Quantification of Gene Expression

Count the number of reads mapped to each gene using tools like:
- HTSeq or featureCounts: Count reads based on gene annotation files (e.g., GTF/GFF).
Output: A count matrix, where rows are genes and columns are samples.
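
A toy version of the counting step, to show where the numbers in the count matrix come from. The gene coordinates and reads below are made up, and the overlap rule (count a read only if it overlaps exactly one gene) is a simplifying assumption. HTSeq and featureCounts additionally handle exons, strandedness, and multi-mapping reads.

```python
# Hypothetical annotation: gene_id -> (chromosome, start, end), 1-based inclusive
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def assign_gene(chrom, start, end, genes):
    """Return the single gene overlapping the read, or None if zero or several overlap."""
    hits = [
        gene_id
        for gene_id, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and end >= g_start
    ]
    return hits[0] if len(hits) == 1 else None


def count_reads(reads, genes):
    """Count reads per gene for one sample (one column of the count matrix)."""
    counts = {gene_id: 0 for gene_id in genes}
    for chrom, start, end in reads:
        gene_id = assign_gene(chrom, start, end, genes)
        if gene_id is not None:
            counts[gene_id] += 1
    return counts


reads = [
    ("chr1", 120, 170),  # inside geneA
    ("chr1", 450, 520),  # overlaps the end of geneA
    ("chr1", 600, 650),  # intergenic, not counted
    ("chr1", 900, 950),  # inside geneB
    ("chr2", 100, 150),  # different chromosome, not counted
]
print(count_reads(reads, genes))  # {'geneA': 2, 'geneB': 1}
```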

7. Normalisation

Normalise the count data to account for differences in sequencing depth and gene length. HOW???
Common normalisation methods:
- TPM (Transcripts Per Million): For comparing gene expression within a sample.
- RPKM/FPKM: Length-normalised, but less commonly used now.
- DESeq2 (median-of-ratios size factors) or edgeR (TMM) normalisation: Scales raw counts between samples for differential expression analysis.
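
One way to see how normalisation works is to compute it by hand on a tiny example. The sketch below shows TPM for a single sample and a median-of-ratios size factor calculation, which is the idea behind DESeq2's size factors. The counts and gene lengths are made up and the function names are illustrative; in practice the R packages do this for you.

```python
import math
from statistics import median


def tpm(counts, lengths_bp):
    """TPM for one sample: divide by gene length (kb) first, then scale to one million."""
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def size_factors(count_matrix):
    """Median-of-ratios size factors; count_matrix[g][s] is the count of gene g in sample s."""
    n_samples = len(count_matrix[0])
    # Keep only genes with non-zero counts in every sample so the log is defined.
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


print(tpm([10, 100, 50], [1000, 2000, 500]))  # [62500.0, 312500.0, 625000.0]

# Sample 2 was sequenced about twice as deeply as sample 1
matrix = [[10, 20], [100, 200], [50, 100], [0, 5]]
factors = size_factors(matrix)
print(factors)  # dividing each column by its factor puts samples on a common scale
```

Note that the gene with the highest raw count (100) does not have the highest TPM, because it is also the longest gene.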

8. Differential Gene Expression Analysis

Identify genes with significant expression differences between experimental groups.
Common tools:
- DESeq2 (R-based): Handles raw counts directly.
- edgeR (R-based): Suitable for small sample sizes.
Output:
- List of differentially expressed genes (DEGs) with log2 fold changes and p-values (usually adjusted for multiple testing).
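
To show what the two output columns mean, here is a small sketch that computes a log2 fold change from two group means and applies the Benjamini-Hochberg correction to a list of p-values. The pseudocount of 1 and the example numbers are assumptions for illustration. DESeq2 and edgeR estimate fold changes and p-values from negative binomial models, which this sketch does not attempt.

```python
import math


def log2_fold_change(mean_treated, mean_control, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (assumed) avoids division by zero."""
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(log2_fold_change(7, 1))  # 2.0, i.e. a four-fold increase after the pseudocount
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```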

9. Functional Annotation and Pathway Analysis

Interpret the biological relevance of DEGs by performing:
- Gene Ontology (GO) Enrichment Analysis: Identify enriched biological processes, molecular functions, or cellular components.
- Pathway Analysis: Map DEGs to pathways using databases like KEGG or Reactome, or test ranked gene lists with methods like GSEA.
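
Over-representation analysis for a GO term or pathway often comes down to a hypergeometric test: given how many genes are annotated to the term, is the overlap with the DEG list larger than expected by chance? A minimal sketch follows, with made-up numbers and a hypothetical function name. Real tools also correct for multiple testing across terms, and GSEA uses a different, rank-based statistic.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_degs, n_overlap):
    """P(overlap >= n_overlap) under a hypergeometric null (one-sided)."""
    total = comb(n_background, n_degs)
    upper = min(n_in_term, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 20,000 background genes, 200 annotated to the term, 500 DEGs, 15 DEGs in the term.
# By chance we would expect about 500 * 200 / 20000 = 5 in the overlap.
p = enrichment_p_value(20000, 200, 500, 15)
print(f"p = {p:.3g}")
```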

10. Data Visualisation

Quality Control:
- PCA plot: Visualise sample clustering.
- Heatmaps: Show clustering of samples/genes.
Differential Expression:
- MA plot: Log fold change vs. mean expression.
- Volcano plot: Statistical significance (-log10 p-value) vs. log2 fold change.
Pathway Analysis:
- Enrichment bar plots or network diagrams.
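
As an illustration of what a volcano plot actually draws, the sketch below turns one gene's results into plot coordinates and a colour label. The cutoffs (|log2 fold change| >= 1 and adjusted p < 0.05) are common choices but are assumptions here, as is the function name. The plotting itself would be done with a library such as ggplot2 or matplotlib.

```python
import math


def volcano_point(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (x, y, label) for one gene: x = log2 fold change, y = -log10(adjusted p)."""
    y = -math.log10(adj_p)
    if adj_p < p_cutoff and log2_fc >= fc_cutoff:
        label = "up"
    elif adj_p < p_cutoff and log2_fc <= -fc_cutoff:
        label = "down"
    else:
        label = "ns"  # not significant, or fold change too small
    return log2_fc, y, label


# Made-up results: (gene, log2 fold change, adjusted p-value)
results = [("geneA", 2.0, 0.001), ("geneB", -1.5, 0.01), ("geneC", 0.3, 0.2), ("geneD", 3.0, 0.5)]
for gene, fc, adj_p in results:
    print(gene, volcano_point(fc, adj_p))
```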

11. Validation

Validate key findings using an independent method:
- qRT-PCR: Validate differential expression for a subset of genes.
Cross-reference with existing datasets or prior research.
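
qRT-PCR results are commonly expressed as a fold change using the 2^-ΔΔCt calculation, which can then be compared in direction and rough size with the RNA-seq log2 fold change. A small sketch follows, assuming roughly 100% amplification efficiency and a stable reference (housekeeping) gene. The Ct values are made up and the function name is illustrative.

```python
import math


def relative_expression(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt calculation (assumes ~100% amplification efficiency)."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


# Target gene reaches threshold 3 cycles earlier in treated samples, reference gene unchanged
fold = relative_expression(22, 18, 25, 18)
print(fold)             # 8.0
print(math.log2(fold))  # 3.0, comparable to an RNA-seq log2 fold change
```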

12. Reporting

Compile findings into a report or publication:
- Document methods, quality control steps, and statistical analyses.
- Share data and scripts for reproducibility (e.g., GitHub or public repositories).
