UPDATE: This pipeline is deprecated. Please use PCMP_ITS_pipeline (https://github.com/PennChopMicrobiomeProgram/PCMP_ITS_pipeline) for fungal analysis.
This is a Snakemake pipeline for analyzing fungal internal transcribed spacer (ITS) sequences using PIPITS (https://github.com/hsgweon/pipits) and BROCC (https://github.com/kylebittinger/brocc)
To install, we assume you have already installed Miniconda3 (https://docs.conda.io/en/latest/miniconda.html)
- Clone the repository:
git clone https://github.com/PennChopMicrobiomeProgram/ITS_PIPITS_BROCC.git
- Create a conda environment and install the required packages:
cd ITS_PIPITS_BROCC
conda create -n ITS_PIPITS_BROCC --channel bioconda --channel conda-forge --channel defaults python=3.6
conda install --name ITS_PIPITS_BROCC --file requirements.txt
/anaconda/envs/venv_name/bin/pip install brocc #brocc needs to be installed through your environment's pip
- The following software also needs to be installed:
  - dnabc (https://github.com/PennChopMicrobiomeProgram/dnabc)
  - ITS_primer_trim (https://github.com/PennChopMicrobiomeProgram/Primer_trim)
- To install (dnabc as example):
git clone https://github.com/PennChopMicrobiomeProgram/dnabc
cd dnabc
conda activate ITS_PIPITS_BROCC
pip install -e ./
To run the pipeline, we need
- Multiplexed R1/R2 read pairs
- Create a project directory, e.g. /scr1/users/tuv/ITS_Run1
- Copy the files from this repository into that directory
- Edit config.yml so that it suits your project. In particular,
  - all: project_dir: path to the project directory, e.g. "/scr1/users/tuv/ITS_Run1"
  - all: ncbi_db: path to a local NCBI nt database, e.g. "/path/to/nt"
  - all: mux_dir: the directory containing multiplexed Illumina sequencing reads, which does not have to be in the project directory, e.g. "/path/to/mux_files"
  - all: demux_dir: the directory to contain the demultiplexed R1/R2 read pairs
  - all: threads: number of threads to use
  - all: ITS_subregion: leave blank, or set to ITS1 or ITS2 for ITS subregion extraction
  - all: mapping_file: mapping file of samples with barcode information for demultiplexing
  - demux: mismatch: number of allowable base pair mismatches on the barcode sequence for demultiplexing
  - demux: revcomp: if TRUE, reverse complement the barcode sequence before demultiplexing
  - trim: f_primer: sequence of the forward primer used for ITS PCR
  - trim: r_primer: sequence of the reverse primer used for ITS PCR
  - trim: mismatch: number of allowable base pair mismatches on the ITS PCR primers for trimming
  - trim: min_length: minimum length of primer to trim from reads
- To run the pipeline, activate the environment by entering conda activate ITS_PIPITS_BROCC, cd into the project directory and execute:
snakemake \
--configfile path/to/config.yml \
--keep-going \
--latency-wait 90 \
--notemp
- When submitting jobs using qsub, you may run qsub run_snakemake.bash config.yml
- You can use the skeleton.Rmd to create a basic bioinformatic report from the results
create_local_taxonomy_db.py may be used to install a local taxonomy db for faster processing
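The config.yml keys described above can be sanity-checked before launching a run. The following is a minimal sketch and not part of the pipeline: it assumes the file has already been parsed into a nested dictionary (section, then key), a layout inferred from the "section: key" notation used above. The helper name check_config and most values in example_config (threads, demux_dir, mapping_file, the primer sequences, and the numeric settings) are hypothetical placeholders.

```python
# Hypothetical pre-flight check; the pipeline itself does not ship this helper.
REQUIRED_KEYS = {
    "all": ["project_dir", "ncbi_db", "mux_dir", "demux_dir",
            "threads", "ITS_subregion", "mapping_file"],
    "demux": ["mismatch", "revcomp"],
    "trim": ["f_primer", "r_primer", "mismatch", "min_length"],
}

# Blank (None or empty string), ITS1 or ITS2, as described in the key list.
ALLOWED_SUBREGIONS = (None, "", "ITS1", "ITS2")


def check_config(config):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in values:
                problems.append(f"missing key: {section}: {key}")
    all_section = config.get("all")
    if isinstance(all_section, dict):
        subregion = all_section.get("ITS_subregion")
        if subregion not in ALLOWED_SUBREGIONS:
            problems.append(
                f"all: ITS_subregion must be blank, ITS1 or ITS2, got {subregion!r}"
            )
    return problems


# Placeholder values for illustration only.
example_config = {
    "all": {
        "project_dir": "/scr1/users/tuv/ITS_Run1",
        "ncbi_db": "/path/to/nt",
        "mux_dir": "/path/to/mux_files",
        "demux_dir": "/path/to/demux_files",
        "threads": 4,
        "ITS_subregion": "ITS1",
        "mapping_file": "/path/to/mapping_file.tsv",
    },
    "demux": {"mismatch": 0, "revcomp": True},
    "trim": {
        "f_primer": "GGAAGTAAAAGTCG",  # made-up sequence, not a real primer
        "r_primer": "GCTGCGTTCTTC",    # made-up sequence, not a real primer
        "mismatch": 0,
        "min_length": 10,
    },
}

if __name__ == "__main__":
    for problem in check_config(example_config):
        print(problem)
```

With a real config file, the dictionary would typically come from a YAML parser rather than being written by hand.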
Input: Multiplexed Illumina sequencing files
Output: manifest.csv, total_read_counts.tsv, demultiplexed fastq files
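To illustrate how the demux: mismatch and demux: revcomp settings interact during demultiplexing, here is a simplified sketch of barcode assignment. It is not the dnabc implementation: the function names, the Hamming-distance matching, and the rule of discarding ambiguous matches are assumptions made for illustration, and the barcodes in the usage note are made up.

```python
# Complement table covering IUPAC nucleotide codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed, barcodes, mismatch=0, revcomp=False):
    """Return the sample whose barcode is within `mismatch` of `observed`.

    `barcodes` maps sample name -> barcode as listed in the mapping file.
    Returns None when nothing matches or when the best match is tied
    between samples (assumed behaviour for this sketch).
    """
    best_sample, best_dist, tied = None, None, False
    for sample, barcode in barcodes.items():
        expected = reverse_complement(barcode) if revcomp else barcode.upper()
        if len(expected) != len(observed):
            continue
        dist = hamming(observed.upper(), expected)
        if dist > mismatch:
            continue
        if best_dist is None or dist < best_dist:
            best_sample, best_dist, tied = sample, dist, False
        elif dist == best_dist:
            tied = True
    return None if tied else best_sample
```

For example, with barcodes = {"S1": "AAAACCCC", "S2": "TTTTCCCC"}, the index read "AAAACCCG" is unassigned at mismatch=0 but goes to S1 at mismatch=1, and "GGGGTTTT" is assigned to S1 only when revcomp=True.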
Removes ITS forward and reverse primer sequences from reads
Filter out reads shorter than 50 bp after trimming
Remove reads that still contain ITS forward and reverse primer sequences. These reads are low quality due to the formation of primer dimers, and usually have the reverse complement of the primer sequence at the beginning of the read.
Only use reads with a forward and reverse sequence
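The filtering steps above can be sketched as follows. This is a simplified illustration and not the ITS_primer_trim implementation: it assumes exact substring matching (so the trim: mismatch setting is ignored), treats a read as contaminated if it contains either primer or its reverse complement, and represents reads as dictionaries mapping read ID to sequence. The function names and the min_read_length parameter are hypothetical; the 50 bp default comes from the step description.

```python
# Complement table covering IUPAC nucleotide codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def keep_read(seq, f_primer, r_primer, min_read_length=50):
    """Return True if a primer-trimmed read passes both filters.

    A read is dropped when it is shorter than `min_read_length`, or when
    it still contains a primer or the reverse complement of a primer.
    Matching is exact, so degenerate (IUPAC) primer bases will not match
    concrete read bases in this sketch.
    """
    if len(seq) < min_read_length:
        return False
    seq = seq.upper()
    for primer in (f_primer.upper(), r_primer.upper()):
        if primer in seq or reverse_complement(primer) in seq:
            return False
    return True


def keep_pairs(r1_reads, r2_reads, f_primer, r_primer, min_read_length=50):
    """Return sorted IDs of reads that pass the filters in both R1 and R2.

    IDs present in only one of the two inputs are dropped, so every
    returned read has a forward and a reverse sequence.
    """
    kept = []
    for read_id in sorted(r1_reads.keys() & r2_reads.keys()):
        r1_ok = keep_read(r1_reads[read_id], f_primer, r_primer, min_read_length)
        r2_ok = keep_read(r2_reads[read_id], f_primer, r_primer, min_read_length)
        if r1_ok and r2_ok:
            kept.append(read_id)
    return kept
```

Requiring both mates to pass is one way to satisfy the "forward and reverse sequence" rule; the pipeline may apply the pairing check at a different point.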
Run the PIPITS pipeline to process reads and form OTU clusters, with or without extracting the ITS subregion
Determine the taxonomic assignments of the OTUs through a consensus-based BLAST result
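As a rough illustration of what a consensus over BLAST hits means, the sketch below walks down the taxonomic ranks and keeps each name only while a sufficient fraction of all hits agree on it. This is a simplified majority vote and not BROCC's actual algorithm: the function name, the input format (one tuple of names per hit, ordered by rank), and the 0.6 threshold are assumptions chosen for the example.

```python
from collections import Counter

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def consensus_taxonomy(hit_lineages, min_fraction=0.6):
    """Return the deepest lineage prefix supported by enough BLAST hits.

    `hit_lineages` is a list of tuples of taxon names ordered as RANKS.
    At each rank, the most common name among the hits that agree with the
    consensus so far is accepted if it covers at least `min_fraction` of
    all hits; otherwise the assignment stops at the previous rank.
    """
    if not 0.5 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0.5, 1] to avoid ties")
    total = len(hit_lineages)
    candidates = list(hit_lineages)
    consensus = []
    for depth in range(len(RANKS)):
        names = Counter(lin[depth] for lin in candidates if len(lin) > depth)
        if not names:
            break
        name, votes = names.most_common(1)[0]
        if votes / total < min_fraction:
            break
        consensus.append(name)
        candidates = [lin for lin in candidates
                      if len(lin) > depth and lin[depth] == name]
    return tuple(consensus)
```

For instance, if half of an OTU's hits are Saccharomyces cerevisiae and half are Saccharomyces paradoxus, this sketch assigns the OTU at genus level (Saccharomyces) rather than picking a species.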