
CustomOrthologs

ebersber edited this page Nov 4, 2018 · 7 revisions

Formatting of Custom Ortholog Groups

If you wish to use pre-computed orthologs for your seed protein, set orthologs_prediction in the parameter group Basic Preprocessing to NO. You then have to provide protTrace with the pre-computed orthologs yourself:

  • Copy the orthologs file into the cache directory and rename it to ogSeqs_$seedProteinName.fa. Replace $seedProteinName with the protein name as it appears in the header of the input FASTA file for the protTrace run. For example, if your seed protein input file is as shown below, then $seedProteinName is ‘Human_OR4F5’.
>Human_OR4F5
MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSL
SSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGN
ACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGV
LTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFL
AVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKF
  • Make sure that each sequence in the orthologs file occupies a single line and is not interrupted by line breaks. See HERE for an awk command that fulfills this task.
  • Change the FASTA headers inside the orthologs file so that they contain only the species name in the OMA format. A simple example file is shown below. The file should be placed under /path/to/protTrace/cache/ogSeqs_Human_OR4F5.fa
>HUMAN
MVTEFIFLGLSDSQELQTFLFMLFFVFYGG...
>NEMVE
MNDSSSIACSSHKLEVGILLAVNCVSAIAT...
>CAEEL
MFTSTLAPMVLALLENDTSIIATTQSSMSP...
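
The naming and single-line requirements above can be sketched in Python. This is only an illustrative helper, not part of protTrace: the function names seed_protein_name, ortholog_cache_path and unwrap_fasta are hypothetical, and the cache directory is passed in by the caller rather than looked up from any protTrace setting.

```python
from pathlib import Path


def seed_protein_name(seed_fasta_text):
    """Return the name in the first FASTA header (the text after '>')."""
    for line in seed_fasta_text.splitlines():
        if line.startswith(">"):
            return line[1:].strip()
    raise ValueError("no FASTA header found in seed file")


def ortholog_cache_path(cache_dir, seed_name):
    """Build the expected file name ogSeqs_<seed name>.fa inside cache_dir."""
    return Path(cache_dir) / f"ogSeqs_{seed_name}.fa"


def unwrap_fasta(fasta_text):
    """Join wrapped sequence lines so every record is a header plus one line.

    Blank lines are skipped; sequence lines that appear before the first
    header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(header + "\n" + "".join(chunks))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(header + "\n" + "".join(chunks))
    return "".join(record + "\n" for record in records)
```

A typical use would be to read the seed file, compute the target path with ortholog_cache_path, and write the output of unwrap_fasta for the orthologs file to that path. The headers still have to be reduced to OMA species names separately.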

Here, HUMAN, NEMVE and CAEEL are the OMA species abbreviations for Homo sapiens, Nematostella vectensis and Caenorhabditis elegans, respectively. If your orthologous group comprises in-paralogs, prepend a number followed by an underscore to the identifiers, as in the example below. This file must likewise be placed under /path/to/protTrace/cache/ogSeqs_Human_OR4F5.fa

>HUMAN
MVTEFIFLGLSDSQELQTFLFMLFFVFYGG...
>1_NEMVE
MNDSSSIACSSHKLEVGILLAVNCVSAIAT...
>2_NEMVE
MQSTFNGTHDNTCFFLRLDTRAVHEVYASF...
>3_NEMVE
MKKSCLFYTYDSANFTEKSSLIVLIILNSV...
>1_CAEEL
MFTSTLAPMVLALLENDTSIIATTQSSMSP...
>2_CAEEL
MDSNVKYFMYEIFIPSIIILCCVAAFLNFM...
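
The numbering scheme in this example can be generated with a short Python sketch. It assumes, based only on the example above, that a species with a single sequence keeps its plain identifier while species with several sequences are numbered from 1 in input order. The function name number_inparalogs is hypothetical and not part of protTrace.

```python
from collections import Counter


def number_inparalogs(records):
    """Format (species_code, sequence) pairs as FASTA text.

    Species that occur more than once get headers 1_CODE, 2_CODE, ...;
    species that occur once keep the plain header CODE (an assumption
    taken from the example file, not a documented protTrace rule).
    """
    totals = Counter(species for species, _ in records)
    seen = Counter()
    lines = []
    for species, sequence in records:
        if totals[species] > 1:
            seen[species] += 1
            header = f">{seen[species]}_{species}"
        else:
            header = f">{species}"
        lines.append(header)
        lines.append(sequence)
    return "".join(line + "\n" for line in lines)
```

Called with a list such as [("HUMAN", seq1), ("NEMVE", seq2), ("NEMVE", seq3)], where the sequences are placeholders for full single-line protein sequences, it yields the headers >HUMAN, >1_NEMVE and >2_NEMVE in that order.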