computes all variants including indels from Illumina sequences compared to a reference sequence

medvir/variant-calling


Variant Calling (including indels)

Given Illumina sequencing data (*.fastq.gz) and a reference sequence (ref.fasta), this Snakemake workflow calls all variants, including indels, relative to the reference.

Workflow

  • Samples 200'000 random reads (to speed up processing)
  • Maps reads to a reference sequence using bwa-mem
  • Adds indel qualities to the BAM file
  • Calls variants (including indels) using LoFreq
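
The four steps above can be sketched as plain shell commands. This is only an illustration, not the workflow's actual rules: the README names bwa-mem and LoFreq, while seqtk for subsampling, samtools for sorting, the seed, and all output file names are assumptions made here. The function only prints the commands so they can be inspected before anything is run.

```shell
# pipeline_commands SAMPLE REF: print one command per workflow step.
# Assumptions (not from the workflow itself): seqtk and samtools as helper
# tools, the seed 42, and every output file name.
pipeline_commands() {
    sample="$1"
    ref="$2"
    cat <<EOF
seqtk sample -s 42 raw_data/${sample}.fastq.gz 200000 > ${sample}.sub.fastq
bwa index ${ref}
bwa mem ${ref} ${sample}.sub.fastq | samtools sort -o ${sample}.bam -
lofreq indelqual --dindel -f ${ref} -o ${sample}.indelqual.bam ${sample}.bam
lofreq call --call-indels -f ${ref} -o ${sample}.vcf ${sample}.indelqual.bam
EOF
}
```

Calling `pipeline_commands my_sample raw_data/ref.fasta` (with a hypothetical sample name) lists the commands; piping that output to `sh` would execute them, provided the tools are installed.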

[Figure: rulegraph of the workflow]

Usage

Setup

Step 1: Install conda

Follow the steps on the Bioconda website to install Miniconda and set up Bioconda.

Step 2: Install snakemake

We suggest installing snakemake in a separate conda environment:

conda create -n snakemake snakemake=5.5.3
conda activate snakemake

Step 3: Clone workflow

git clone https://github.com/medvir/variant-calling.git
cd variant-calling/

Step 4: Get your data

Create a raw_data/ folder and copy all .fastq.gz sequencing files and the reference sequence ref.fasta into it.
These files are the input for the Snakemake workflow.
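
Before starting a run, it can help to confirm that the input folder looks as described. The helper below is a minimal sketch; the function name `check_inputs` is hypothetical and not part of the workflow. It only checks that ref.fasta and at least one .fastq.gz file are present.

```shell
# check_inputs [DIR]: succeed only if DIR (default: raw_data) contains
# ref.fasta and at least one *.fastq.gz file. Hypothetical helper.
check_inputs() {
    dir="${1:-raw_data}"
    if [ ! -f "$dir/ref.fasta" ]; then
        echo "missing $dir/ref.fasta" >&2
        return 1
    fi
    # An unmatched glob stays as a literal pattern, so test the first match.
    set -- "$dir"/*.fastq.gz
    if [ ! -e "$1" ]; then
        echo "no .fastq.gz files in $dir" >&2
        return 1
    fi
    echo "found $# read file(s) and ref.fasta in $dir"
}
```

Run `check_inputs` from the variant-calling/ directory; a non-zero exit status means something is missing.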

Step 5: Run Snakemake

Try running a dry-run first:
snakemake --use-conda -n

If the dry run reports no issues (all output is green), run the workflow:
snakemake --use-conda

To run multiple jobs in parallel, set the number of jobs with the -j flag.
Note that this can lead to an error in some cases; the cause is not yet known.
snakemake --use-conda -j 4
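
The dry run and the real run can be chained so that the workflow only starts when the dry run succeeds. This is a small convenience sketch; the wrapper name `run_workflow` is hypothetical, and it uses only the snakemake flags shown above.

```shell
# run_workflow [JOBS]: do a dry run first and start the real run only if
# the dry run exits successfully. JOBS defaults to 1. Hypothetical wrapper.
run_workflow() {
    njobs="${1:-1}"
    if ! snakemake --use-conda -n; then
        echo "dry run failed, not starting the workflow" >&2
        return 1
    fi
    snakemake --use-conda -j "$njobs"
}
```

For example, `run_workflow 4` corresponds to the parallel invocation above, preceded by a dry run.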

Notes

  • Read clippings are not counted
  • LoFreq sets a minimum coverage of 10 by default
  • Duplicate reads are not removed (this could be done with seqkit rmdup -s)
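
To illustrate what removing duplicates by sequence means, here is a minimal awk-based sketch. It is not how seqkit is implemented and is not part of the workflow; it assumes plain 4-line FASTQ records on standard input and keeps the first record seen for each distinct sequence. The function name `dedup_fastq` is hypothetical.

```shell
# dedup_fastq: read 4-line FASTQ records on stdin and print only the first
# record for each distinct sequence. Assumes no line-wrapped sequences.
dedup_fastq() {
    awk 'NR % 4 == 1 { header = $0 }
         NR % 4 == 2 { seq = $0; keep = !(seq in seen); seen[seq] = 1 }
         NR % 4 == 3 { plus = $0 }
         NR % 4 == 0 { if (keep) print header "\n" seq "\n" plus "\n" $0 }'
}
```

A compressed file could be processed with `gzip -dc reads.fastq.gz | dedup_fastq | gzip > reads.dedup.fastq.gz`, where both file names are placeholders.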
