Genome2OR

Annotate Olfactory receptor CDS from genome

Contents Index

Abstract
Quick start
Download script
Usage
Example

Abstract

Genome2OR is a gene annotation tool based on HMMER, MAFFT and CD-HIT.
HMMER searches biological sequence databases for homologous sequences, using either single sequences or multiple sequence alignments as queries. HMMER implements a technology called "profile hidden Markov models" (profile HMMs).

MAFFT is a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, including L-INS-i (accurate; for alignment of <∼200 sequences) and FFT-NS-2 (fast; for alignment of <∼30,000 sequences).

CD-HIT clusters protein or nucleotide sequences by similarity. It is very fast and can handle extremely large databases. CD-HIT significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the data structure and correcting the bias within a dataset.

Quick start

Execute nhmmer.py
```
cd YourDir/Genome2OR/scripts
python nhmmer.py ../template/Mammalia.hmm genome.fasta nhmmer_out.tblout -v
```
Mammalia.hmm is the HMM profile for Mammalia. HMM profiles for other taxa are available in the YourDir/Genome2OR/template directory. genome.fasta is the genome (DNA) to be annotated.

Execute FindOR.py

```
cd YourDir/Genome2OR/scripts
python FindOR.py nhmmer_out.tblout genome.fasta -o ../output -p ORannotation -v
```

Execute IdentityFunc.py
```
cd YourDir/Genome2OR/scripts
python IdentityFunc.py ../output/ORannotation_ORs_pro.fa -o ../output -p Identity -v
```
After running the three steps above (nhmmer.py, FindOR.py, IdentityFunc.py), the file Identity_func_ORs.fasta in the ../output directory contains the final set of annotated olfactory receptors.
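
The three steps can also be chained from Python. The following is a minimal sketch, not part of Genome2OR: `build_pipeline` and `run_pipeline` are hypothetical helper names, the file names are taken from the commands above, and `-p` is used for the FindOR.py prefix as listed in its usage text.

```python
import subprocess


def build_pipeline(genome, profile="../template/Mammalia.hmm",
                   outdir="../output", find_prefix="ORannotation",
                   func_prefix="Identity"):
    """Return the three Quick start commands as argument lists."""
    tblout = "nhmmer_out.tblout"
    # Protein file name follows the Quick start example (<prefix>_ORs_pro.fa).
    hit_pro = f"{outdir}/{find_prefix}_ORs_pro.fa"
    return [
        ["python", "nhmmer.py", profile, genome, tblout, "-v"],
        ["python", "FindOR.py", tblout, genome,
         "-o", outdir, "-p", find_prefix, "-v"],
        ["python", "IdentityFunc.py", hit_pro,
         "-o", outdir, "-p", func_prefix, "-v"],
    ]


def run_pipeline(genome, scripts_dir="."):
    """Run the steps in order; check=True stops at the first failing step."""
    for cmd in build_pipeline(genome):
        subprocess.run(cmd, cwd=scripts_dir, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, to review them first.
    for cmd in build_pipeline("genome.fasta"):
        print(" ".join(cmd))
```

Call `run_pipeline("genome.fasta", scripts_dir="YourDir/Genome2OR/scripts")` once the printed commands look right.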

Execute batch.py

```
cd YourDir/Genome2OR/scripts
python batch.py profile.hmm inputdir nhmmeroutdir outputdir -v
```

Use batch.py to annotate several genomes in one run.

Download script

Download the scripts from GitHub
```
cd YourDir
git clone https://github.com/ToHanwei/Genome2OR.git
```

Enter the directory
```
cd ./Genome2OR/scripts 
```

Usage

Main modules

* nhmmer.py: Wrapper that simplifies running the nhmmer program.
* FindOR.py: Extracts olfactory receptor CDSs from a genome.
* IdentityFunc.py: Identifies functional OR genes.
* Iteration.py: Annotates one species iteratively.
* batch.py: Annotates genomes in batch.

nhmmer.py Usage

python nhmmer.py -h[--help, None]

usage: run_nhmmer [-h] [-e] [-c] [-v] [-V] profile genome output

Autorun nhmmer

positional arguments:
    profile              String, profile nhmmer need(hmm, [un]alignment file)
    genome               String, genomic data file path.
    output               String, save nhmmer output file path.

optional arguments:
    -h, --help           show this help message and exit
    -e , --EvalueLimit   Float, Sequence similarity threshold. (default:1e-10)
    -c , --cpus          number of parallel CPU workers to use. (default='2/3 of
                         all cores')
    -v, --verbose        Print verbose information.
    -V, --version        Show version message and exit.

http://zhaolab.shanghaitech.edu.cn/
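
The help text gives the default for `-c` only as "2/3 of all cores". A minimal sketch of how such a default could be computed is shown below, for choosing an explicit `-c` value on a shared machine. The function name `default_cpus` and the rounding rule (round down, never below one) are assumptions, not taken from nhmmer.py.

```python
import os


def default_cpus(total=None):
    """Two thirds of the available cores, rounded down, but at least one."""
    if total is None:
        total = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, (total * 2) // 3)


if __name__ == "__main__":
    # Example: pass the value explicitly instead of relying on the default.
    print(f"python nhmmer.py profile.hmm genome.fasta out.tblout -c {default_cpus()}")
```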

FindOR.py Usage

python FindOR.py -h[--help, None]

usage: FindOR [-h] [-o] [-p] [-e] [-l] [-v] [-V] input genome

Olfactory receptor annotation

positional arguments:
    input                 String, nhmmer output file path.
    genome                String, Genomic data file path.

optional arguments:
    -h, --help            show this help message and exit
    -o , --outputdir      String, Result save directory.(default:../output)
    -p , --prefix         String, output file prefix.(default:ORannotation)
    -e , --EvalueLimit    Float, Sequence similarity threshold. (default:1e-60)
    -l , --SeqLengthLimit 
                          Int, An artificially set OR's sequence length
                          threshold.(default:868)
    -v, --verbose         Print verbose information.
    -V, --version         Show version message and exit.

http://zhaolab.shanghaitech.edu.cn/
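
To inspect an nhmmer result before passing it to FindOR.py, the hits can be read and filtered by E-value. The sketch below assumes the file is in the standard `nhmmer --tblout` layout (whitespace-separated columns, `#` comment lines, E-value in the 13th column); `Hit`, `parse_tblout` and `filter_hits` are illustrative names, and this is not how FindOR.py itself is implemented.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str    # sequence (e.g. chromosome or scaffold) that was hit
    start: int     # alignment start on the target ("alifrom")
    end: int       # alignment end on the target ("ali to")
    strand: str    # "+" or "-"
    evalue: float


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse nhmmer --tblout lines, skipping comments and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        # 0-based columns: 0 target, 6 alifrom, 7 ali to, 11 strand, 12 E-value
        hits.append(Hit(f[0], int(f[6]), int(f[7]), f[11], float(f[12])))
    return hits


def filter_hits(hits: List[Hit], evalue_limit: float = 1e-60) -> List[Hit]:
    """Keep hits at least as significant as the limit (FindOR.py default)."""
    return [h for h in hits if h.evalue <= evalue_limit]
```

For example, `filter_hits(parse_tblout(open("nhmmer_out.tblout")))` gives a quick count of the hits that would pass the default `-e` threshold.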

IdentityFunc.py Usage

python IdentityFunc.py -h[--help, None]

usage: IdentityFunc.py [-h] [-o] [-p] [-c] [-k] [-v] [-V]
                       hitPROfile hitDNAfile

Idntity Function OR

positional arguments:
    hitPROfile         IdentityFunc.py script output file(protein sequence
                       file). hit sequence from genome
    hitDNAfile         IdentityFunc.py script output file(DNA sequence file).
                       hit sequence from genome

optional arguments:
    -h, --help         show this help message and exit
    -o , --outputdir   String, Result save directory.(default:../output)
    -p , --prefix      String, output file prefix.(default:Identity)
    -c , --cpus        number of parallel CPU workers to use. (default='2/3 of
                       all cores')
    -k , --keepfile    Bool, whether to keep intermediate file.(default:True)
    -v, --verbose      Print verbose information.
    -V, --version      Show version message and exit.

http://zhaolab.shanghaitech.edu.cn/


#### <span id='iterate'>Iteration.py Usage</span>

python Iteration.py -h[--help, None]

usage: Iteration.py [-h] [-i] [-e] [-l] [-c] [-p] [-v] [-V] profile outputdir genome

Iteration annotated a genome

positional arguments:
    profile               String, profile nhmmer need(hmm, [un]alignment file).
                          Notice: A group of genomes share a profile.
    outputdir             String, output directory.
    genome                String, genomic data file.

optional arguments:
    -h, --help            show this help message and exit
    -i , --iteration      Int, Number of iterations.default=2
    -e , --EvalueLimit    Float, Sequence similarity threshold. (default:1e-20)
    -l , --SeqLengthLimit 
                          Int, An artificially set OR's sequence length
                          threshold.(default:868)
    -c , --cpus           number of parallel (default='2/3 of all cores')
    -p , --prefix         String, output file prefix.(default:Identity)
    -v, --verbose         Print verbose information.
    -V, --version         Show version message and exit.

http://zhaolab.shanghaitech.edu.cn/


#### <span id='batch'>batch.py Usage</span>

python batch.py -h[--help, None]

usage: BatchProcess.py [-h] [-e] [-l] [-c] [-v] [-V] profile inputdir nhmmerout outputdir

Batch annotated genome

positional arguments:
    profile               String, profile nhmmer need(hmm, [un]alignment file).
                          Notice: A group of genomes share a profile.
    inputdir              String, genomic directory.
    nhmmerout             String, run nhmmer program result.
    outputdir             String, processing results directory, run FindOR.py
                          result.

optional arguments:
    -h, --help            show this help message and exit
    -e , --EvalueLimit    Float, Sequence similarity threshold. (default:1e-60)
    -l , --SeqLengthLimit 
                          Int, An artificially set OR's sequence length
                          threshold.(default:868)
    -c , --cpus           number of parallel CPU workers to use. (default='2/3 of
                          all cores')
    -v, --verbose         Print verbose information.
    -V, --version         Show version message and exit.

http://zhaolab.shanghaitech.edu.cn/
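
Before starting a long batch run it can help to list which genome files in `inputdir` would be processed. The following sketch is hypothetical and does not reproduce batch.py: the accepted file extensions, the `<genome name>.tblout` naming and the function name `plan_batch` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed FASTA extensions; adjust to match the files in your input directory.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def plan_batch(inputdir, nhmmerout, outputdir):
    """Pair every genome file with an nhmmer output path and a result prefix."""
    plan = []
    for genome in sorted(Path(inputdir).iterdir()):
        if genome.suffix.lower() not in FASTA_SUFFIXES:
            continue  # skip files that do not look like FASTA
        tblout = Path(nhmmerout) / f"{genome.stem}.tblout"
        plan.append((genome, tblout, Path(outputdir), genome.stem))
    return plan


if __name__ == "__main__":
    for genome, tblout, outdir, prefix in plan_batch(".", "nhmmerout", "output"):
        print(f"{genome} -> {tblout} -> {outdir} (prefix {prefix})")
```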


### Other modules
* Genome2OR/data
    * statistic_nterm.py
    * statistic_pattern_match.py
    * statistic_conserved_site.py
* Genome2OR/scripts/src
    * Functions: Functions module.
    * ParseArgs: Command line parsing module.
    * CodeMessages: Error message module.
    * config: Configuration module.
* Genome2OR/template
    * '*.hmm' file, HMM profiles
    * 'template.fasta' is the template OR sequence file. You can replace it with your own template file, but the first sequence (OR5AN1) cannot be changed. Note that too many sequences in 'template.fasta' will slow the program down; typically use no more than five sequences.
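
A custom 'template.fasta' can be checked against the two constraints above before a run. This is a minimal sketch under stated assumptions: the first record is recognised by the string `OR5AN1` appearing in its header line, and `check_template` is an illustrative helper, not part of Genome2OR.

```python
def read_fasta_headers(lines):
    """Return the header text of every FASTA record (without the '>')."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def check_template(lines, required_first="OR5AN1", max_seqs=5):
    """Return a list of problems found; an empty list means the file looks fine."""
    headers = read_fasta_headers(lines)
    problems = []
    if not headers:
        problems.append("no sequences found")
    elif required_first not in headers[0]:
        problems.append(f"first sequence is not {required_first}")
    if len(headers) > max_seqs:
        problems.append(
            f"{len(headers)} sequences; more than {max_seqs} may slow the run"
        )
    return problems


if __name__ == "__main__":
    with open("../template/template.fasta") as handle:
        for problem in check_template(handle):
            print("WARNING:", problem)
```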

## Example


*[Welcome to our laboratory website](http://zhaolab.shanghaitech.edu.cn/)*
