yabtap - Yet another bulk transcriptome assembly pipeline

This Snakemake pipeline manages transcriptome assemblies across many sequencing runs. It takes a config file defining the samples, the libraries that belong to those samples, and any associated metadata. Using the metadata, it assembles the transcriptomes with Trinity (and eventually also rnaSPAdes), translates the output, automatically formats and renames the sequences, and makes nucleotide BLAST, protein BLAST, and DIAMOND databases. The pipeline also quantifies transcripts in each library using kallisto.

Benefits to using yabtap

The benefits to assembling transcriptomes using yabtap, as opposed to assembling them individually on a case-by-case basis:

- Hands-on time is minimal. Fill out new entries in the config file, and all processing steps are completed automatically.
- It is reproducible. Trimming and assembly parameters are consistent across samples. This is useful for large-scale comparative studies such as phylogenomics.
- It is computationally efficient. All steps are handled by Snakemake, which runs steps only when necessary and parallelizes them, so assemblies finish faster than in a for-loop-based pipeline.
- It can be restarted easily. If a run crashes, simply execute the snakemake command again and the pipeline will pick up where it left off.
- It has useful additional features. yabtap provides transcript counts, makes databases, and automatically downloads SRA entries and converts them to fastq files.
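
The "only run when necessary" and "pick up where it left off" behavior can be illustrated with a minimal Python sketch of a make-style freshness check. This is a simplification for illustration only, not Snakemake's actual implementation (Snakemake considers more than file timestamps), and the file names `reads.fastq` and `assembly.fasta` are hypothetical.

```python
import os
import tempfile


def needs_rerun(inputs, output):
    """Return True if `output` is missing or older than any of `inputs`."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


with tempfile.TemporaryDirectory() as tmp:
    reads = os.path.join(tmp, "reads.fastq")        # hypothetical input
    assembly = os.path.join(tmp, "assembly.fasta")  # hypothetical output

    open(reads, "w").close()
    os.utime(reads, (1000, 1000))
    # The output does not exist yet, so the step must run.
    missing_output = needs_rerun([reads], assembly)

    open(assembly, "w").close()
    os.utime(assembly, (2000, 2000))
    # The output is newer than its input, so the step is skipped.
    up_to_date = needs_rerun([reads], assembly)

    os.utime(reads, (3000, 3000))
    # The input changed after the output was written, so the step reruns.
    stale_output = needs_rerun([reads], assembly)

print(missing_output, up_to_date, stale_output)
```

A step whose outputs already exist and are current is skipped, which is why rerunning the same command after a crash only repeats the unfinished work.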

Requirements

The software required for this pipeline:

- python3 with biopython
- snakemake
- Docker
- kallisto
- bioawk
- TransDecoder
- SRA Toolkit
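
One way to confirm that the requirements above are installed is a small check like the following sketch. The executable names are assumptions (for example, `TransDecoder.LongOrfs` for TransDecoder and `fasterq-dump` for the SRA toolkit); the pipeline itself may call different programs from these packages.

```python
import importlib.util
import shutil

# Requirement name -> executable expected on PATH (assumed names).
REQUIRED_TOOLS = {
    "snakemake": "snakemake",
    "Docker": "docker",
    "kallisto": "kallisto",
    "bioawk": "bioawk",
    "TransDecoder": "TransDecoder.LongOrfs",
    "SRA toolkit": "fasterq-dump",
}


def find_missing(tools, which=shutil.which):
    """Return the sorted requirement names whose executable is not on PATH."""
    return sorted(name for name, exe in tools.items() if which(exe) is None)


def has_biopython():
    """Return True if the Biopython package (module `Bio`) can be imported."""
    return importlib.util.find_spec("Bio") is not None


missing = find_missing(REQUIRED_TOOLS)
if not has_biopython():
    missing.append("biopython")
print("Missing requirements:", ", ".join(missing) if missing else "none")
```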

Parameters

Trimming
- Adapters: The adapters used in the trimming protocol are a collection of all of the adapters found in the Trimmomatic software, plus additional adapters for the Illumina smallRNA library prep kit. This pipeline trims adapters aggressively: it accepts false positives (trimming sequence that is not actually adapter) rather than letting reads with adapters pass into the normalization and de novo assembly steps.
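
The trade-off described above can be illustrated with a minimal Python sketch of 3' adapter trimming using exact matching. This is not Trimmomatic's algorithm or the pipeline's actual settings; the function `trim_adapter`, the `min_overlap` parameter, and the example reads are hypothetical. A small `min_overlap` makes trimming aggressive: short chance matches at the read end are removed too.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove adapter sequence from the 3' end of `read`.

    A full copy of the adapter anywhere in the read is cut, along with
    everything after it. Otherwise, a read suffix matching an adapter
    prefix of at least `min_overlap` bases is removed.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Look for a partial adapter at the read end, longest overlap first.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter prefix

# Full adapter inside the read: everything from the adapter onward is cut.
full = trim_adapter("ACGTACGTAGATCGGAAGAGCTT", ADAPTER)
# Partial adapter (4 bases) at the read end.
partial = trim_adapter("ACGTACGTAGAT", ADAPTER)
# Only 3 bases match, possibly by chance: trimmed anyway (false positive).
chance = trim_adapter("ACGTACGTTAGA", ADAPTER)
# No match: the read is returned unchanged.
clean = trim_adapter("ACGTACGTCCCC", ADAPTER)

print(full, partial, chance, clean)
```

Raising `min_overlap` would reduce false positives at the cost of leaving short adapter fragments in some reads.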

Output

This pipeline makes files in the run directory, but also links files to a specified db directory. The structure of the db directory is:

|-counts
|---kallisto
|-----Bear_sp1_H7834-182_toepad_TRI
|-----Bear_sp1_H7384-182_hair_TRI
|---kallisto_merged
|-db
|-gff
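
As a rough illustration of the layout above, the following Python sketch creates the same directory tree and links a file from a run directory into it. It assumes the links are symbolic links and that each sample gets its own directory under counts/kallisto; the helper names, the temporary paths, and the file `example.gff` are hypothetical and are not taken from the pipeline's code.

```python
import tempfile
from pathlib import Path

# Sample names taken from the example tree above.
SAMPLES = ["Bear_sp1_H7834-182_toepad_TRI", "Bear_sp1_H7384-182_hair_TRI"]


def make_db_layout(db_root, samples):
    """Create the counts/, db/, and gff/ directories under `db_root`."""
    db_root = Path(db_root)
    for sample in samples:
        (db_root / "counts" / "kallisto" / sample).mkdir(parents=True, exist_ok=True)
    for sub in ("counts/kallisto_merged", "db", "gff"):
        (db_root / sub).mkdir(parents=True, exist_ok=True)
    return db_root


def link_into_db(source, dest_dir):
    """Symlink `source` into `dest_dir`, replacing any existing entry."""
    source = Path(source).resolve()
    link = Path(dest_dir) / source.name
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(source)
    return link


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run"
    run_dir.mkdir()
    gff = run_dir / "example.gff"  # hypothetical pipeline output
    gff.write_text("##gff-version 3\n")

    db_root = make_db_layout(Path(tmp) / "db_root", SAMPLES)
    link = link_into_db(gff, db_root / "gff")

    created = sorted(
        p.relative_to(db_root).as_posix() for p in db_root.rglob("*") if p.is_dir()
    )
    link_ok = link.is_symlink() and link.read_text() == gff.read_text()

print("\n".join(created))
```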
