Miscellaneous materials (tutorials, exercises, test data) for developing essential 'survival' skills in bioinformatics.
git clone..
contact: [email protected]
- Introduction
- Reproducible science
- The command line
1. Basics
2. Writing/running scripts
- Docker
- Using HPC
1. Connecting
2. Job submission
- SLURM
- PBS
- Manipulating FASTQ data
- FASTQ basics
- FASTQ file manipulation using command line skills
- FASTQ trimming
1. FASTX-Toolkit
2. Trimmomatic
- Error correction
1. Illumina data
2. PacBio data
- Paired-end read merging
1. FLASh
2. PEAR
- Demultiplexing
- Read mapping
- BWA
- Bowtie
- RADseq data
- Stacks
- pyRAD
- SNP calling
- FreeBayes
- SNP annotation
- SnpEff
- Genome assembly
- Illumina data
1. Velvet
2. SPAdes
3. Celera
- PacBio
1. Canu
2. FALCON
3. MIRA
- Hybrid
1. MIRA
2. SPAdes
3. Celera
- Meta-assembly, gap filling and polishing
1. PBJelly
2. quickmerge
- Assembly evaluation
1. Basic stats
2. Completeness
3. Contamination
- RNAseq data
- De novo
- Reference genome based
- Metabarcoding
- A basic BLAST search
- MEGAN
- Structural genome annotation
- Functional genome annotation
The UNIX command line provides simple, efficient and powerful tools for text file manipulation. Much of the NGS data you will be processing is nothing more than plain text. Proficiency with a few of these basic tools will get you a long way (you may not need anything else), so I have prepared a number of simple exercises to help you develop your command line skills, specifically in the context of FASTQ data manipulation. They should also give you a feel for the kind of data you will be working with. Get started here.
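To give a flavour of what this looks like in practice, here is a minimal sketch that is not taken from the exercises themselves. It assumes the common four-lines-per-record FASTQ layout (header, sequence, `+`, qualities) with no wrapped sequence lines. The file name, the two reads and the variable names are made up for illustration, and `mktemp -d` is assumed to be available on your system.

```shell
# Build a tiny two-read FASTQ file to experiment with (made-up reads).
tmpdir=$(mktemp -d)
fq="$tmpdir/example.fastq"
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'GGGTTTAA' '+' 'IIIIIIII' > "$fq"

# Each record spans exactly four lines, so reads = lines / 4.
n_reads=$(awk 'END { print NR / 4 }' "$fq")
echo "reads: $n_reads"

# The second line of every record is the sequence; print each length.
read_lengths=$(awk 'NR % 4 == 2 { print length($0) }' "$fq")
echo "$read_lengths"

# Convert to FASTA: keep the header (with @ swapped for >) and the sequence.
fasta=$(awk 'NR % 4 == 1 { print ">" substr($0, 2) }
             NR % 4 == 2 { print }' "$fq")
echo "$fasta"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

The same `NR % 4` trick works for most per-record operations on FASTQ files, as long as the four-line assumption holds for your data.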