
Input data

Freya Arthen edited this page Mar 3, 2022 · 6 revisions
  • genomic FASTA file
    • nucleotide sequence of your assembly divided in contigs/scaffolds
  • GFF3 file
  • proteins FASTA file (optional)
    • protein sequences for all protein coding genes in your data set
    • if not provided, they will be extracted within the taXaminer pipeline using the tool gffread
  • coverage information (optional, but recommended)
    • taXaminer can run with or without coverage information
    • if you wish to include it, you need one of the following:
      1. per base coverage (PBC) file: tab-separated file with 3 columns (scaffold name, base position and coverage at that position)
      2. mapping file (BAM format): sorted and indexed
      3. raw read FASTA files: forward and reverse, or unpaired, raw read files
    • multiple coverage data sets can be used
  • config file
    • YAML format
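
The per base coverage (PBC) layout listed above (scaffold name, base position, coverage, separated by tabs) can be checked before a run with a short script. The following is a minimal sketch in Python and is not part of taXaminer; the function name `mean_coverage_per_scaffold` and the example data are hypothetical, and the sketch assumes the file has no header line.

```python
import csv
import io
from collections import defaultdict


def mean_coverage_per_scaffold(handle):
    """Return {scaffold: mean coverage} from a 3-column, tab-separated PBC stream.

    Hypothetical helper for illustration; assumes no header line.
    """
    totals = defaultdict(int)   # summed coverage per scaffold
    counts = defaultdict(int)   # number of listed positions per scaffold
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        scaffold, _position, coverage = row  # raises ValueError if not 3 columns
        totals[scaffold] += int(coverage)
        counts[scaffold] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Made-up example data in the PBC layout (not real coverage values)
example = "scaffold_1\t1\t10\nscaffold_1\t2\t14\nscaffold_2\t1\t3\n"
result = mean_coverage_per_scaffold(io.StringIO(example))
print(result)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` example. The mean is taken over the positions listed in the file, so a PBC file that omits zero-coverage positions would give a higher value than the true per-scaffold average.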