16S rDNA V3-V4 amplicon sequencing analysis

This GitHub repository includes codes and scripts that demonstrate the use of dada2 and phyloseq (and associated tools and R packages) to analyze 16S rDNA amplicon sequencing data. An working example is included in the example folder.

Disclaimer: Do not use any of the provided codes and scripts in production without fully understanding of the contents. I am not responsible for errors or omissions or for any consequences from use of the contents and make no warranty with respect to the currency, completeness, or accuracy of the contents from this GitHub repository.

Install required R packages

install.packages(c("ggplot2", "data.table", "R.utils", "plyr", "dplyr", "phangorn", "ape", "reshape2", 
	"gridExtra", "ggbeeswarm", "ggrepel", "vegan", "GUniFrac", "tibble"))

For R version >= 3.5 (Bioconductor version >= 3.8):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("dada2", "phyloseq", "DECIPHER", "DESeq2", "ShortRead", "Biostrings", 
	"biomformat", "ALDEx2"))

For older versions of R:

source("https://bioconductor.org/biocLite.R")
biocLite(c("dada2", "phyloseq", "DECIPHER", "DESeq2", "ShortRead", "Biostrings", "biomformat", "ALDEx2"))

Workflow

Update: Check out the new script cutadapt-and-dada2-per-run-processing.R, which combines running of cutadapt and DADA2 per-run-processing in one script

1. PCR primer trimming using cutadapt

run_trimming.pl
Cutadapt can run on multiple CPU cores in parallel but is only available on Python 3

Usage: perl run_trimming.pl project_folder fastq_folder forward_primer_sequence reverse_primer_sequence

Example:
- project_folder: PRJEB27564
- fastq_folder: raw
- forward_primer_sequence: 5'-CCTACGGGNGGCWGCAG-3'
- reverse_primer_sequence: 5'-GACTACHVGGGTATCTAATCC-3'

Usage: perl run_trimming.pl PRJEB27564 raw CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC

2a. DADA2 Part 1

dada2-per-run-processing.R - cutadapt performed before-hand
cutadapt-and-dada2-per-run-processing.R - Execute cutadapt in R by using system2 function
dada2 filtering and trimming filterAndTrim
Amplicon sequence variant (ASV) inference

2b. DADA2 Part 2

dada2-create-phyloseq-obj.R
Taxonomy and species assignment
Sample data preparation
Phylogenetic tree construction
1. RAxML:
Download the source file standard-RAxML-x.x.x.tar.gz

Use tar zxvf standard-RAxML-x.x.x.tar.gz to extract file

Use make -f Makefile.SSE3.PTHREADS.gcc to compile specific version
1. raxml-ng:
Download the pre-compiled binary raxml-ng_vx.x.x_linux_x86_64.zip (change x.x.x to downloaded version)

To extract files to a directory:
```
mkdir raxml-ng_vx.x.x
unzip raxml-ng_vx.x.x_linux_x86_64.zip -d raxml-ng_vx.x.x
```
Phyloseq object construction

3. Explore microbiome profiles using phyloseq

phyloseq-analysis.R

4. Identify and plot differential taxa features using LEfSe and GraPhlAn

lefse-analysis.R
LEfSe and GraPhlAn requires Python 2.7
One can create a Python 2 conda environment to work with LEfSe and GraPhlAn

# Create environment
conda create --name p27 python=2.7

# Activate environment
source activate p27

# Deactivate environment
conda deactivate

LEfSe: Download and unzip nsegata-lefse-9adc3a62460e.zip
- Additional packages required: rpy2, numpy, matplotlib, argparse
- Additional R libraries required: survival, mvtnorm, modeltools, coin, MASS
export2graphlan: Install using conda install -c bioconda export2graphlan
- Additional packages required: pandas, scipy
GraPhlAn Install using conda install -c bioconda graphlan
- Additional packages required: biopython, matplotlib

5. Infer functional capacity using picrust2 and statistical analysis using ALDEx

picrust2-aldex-analysis.R
picrust2 requires Python 3.5/3.6
Use conda to create conda environment with Python 3.6 and picrust2 2.3.0_b installed

# Create environment
conda create -n picrust2 -c bioconda -c conda-forge picrust2=2.3.0_b python=3.6

# Activate environment
conda activate picrust2

# Deactivate environment
conda deactivate

Download Silva and NCBI taxonomy DB

Silva taxonomic training data formatted for DADA2 (Silva version 138) https://zenodo.org/record/3731176#.XvdOiihKhaQ

silva_nr_v138_train_set.fa.gz
silva_species_assignment_v138.fa.gz

Create from NCBI 16SMicrobial blast DB

Install BLAST

conda install -c bioconda blast

or

sudo apt-get install ncbi-blast+

Download NCBI's 16S rRNA BLAST DB

wget ftp://ftp.ncbi.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz
tar zxf 16S_ribosomal_RNA.tar.gz

Convert 16SMicrobial BLAST DB into FASTA format

blastdbcmd -db 16S_ribosomal_RNA -entry all -out 16SMicrobial.fa
gzip 16SMicrobial.fa

Download taxa_summary.R

http://evomics.org/phyloseq/taxa_summary-r/

Use gunzip taxa_summary.R.gz to extract file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

16S rDNA V3-V4 amplicon sequencing analysis

Install required R packages

Workflow

1. PCR primer trimming using cutadapt

2a. DADA2 Part 1

2b. DADA2 Part 2

3. Explore microbiome profiles using phyloseq

4. Identify and plot differential taxa features using LEfSe and GraPhlAn

5. Infer functional capacity using picrust2 and statistical analysis using ALDEx

Download Silva and NCBI taxonomy DB

Download taxa_summary.R

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 136 Commits
doc		doc
example		example
scripts		scripts
LICENSE		LICENSE
README.md		README.md
_config.yml		_config.yml
cutadapt-and-dada2-per-run-processing.R		cutadapt-and-dada2-per-run-processing.R
dada2-create-phyloseq-obj.R		dada2-create-phyloseq-obj.R
dada2-per-run-processing.R		dada2-per-run-processing.R
lefse-analysis.R		lefse-analysis.R
phyloseq-analysis.R		phyloseq-analysis.R
picrust2-aldex-analysis.R		picrust2-aldex-analysis.R
run_trimming.pl		run_trimming.pl

License

GeoWc/16S-rDNA-V3-V4

Folders and files

Latest commit

History

Repository files navigation

16S rDNA V3-V4 amplicon sequencing analysis

Install required R packages

Workflow

1. PCR primer trimming using cutadapt

2a. DADA2 Part 1

2b. DADA2 Part 2

3. Explore microbiome profiles using phyloseq

4. Identify and plot differential taxa features using LEfSe and GraPhlAn

5. Infer functional capacity using picrust2 and statistical analysis using ALDEx

Download Silva and NCBI taxonomy DB

Download taxa_summary.R

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages