The assemble subcommand
Guanliang MENG edited this page Jun 22, 2023 · 1 revision
You can use this subcommand to assemble mitogenomes (not-yet-annotated!) from your fastq files.
$ mitoz assemble -h
usage: mitoz assemble [-h] [--workdir <STR>] --outprefix <STR> [--thread_number <INT>] --fq1 <file>
[--fq2 <file>] [--insert_size <INT>] [--fastq_read_length <INT>]
[--assembler {mitoassemble,spades,megahit}] [--tmp_dir <STR>] [--kmers <INT> [<INT> ...]]
[--kmers_megahit <INT> [<INT> ...]] [--kmers_spades <INT> [<INT> ...]] [--memory <INT>]
[--resume_assembly] [--profiles_dir <STR>] [--slow_search] [--filter_by_taxa]
--requiring_taxa <STR> [--requiring_relax {0,1,2,3,4,5,6}] [--min_abundance <float>]
[--abundance_pattern <STR>] [--genetic_code <INT>]
[--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}]
Mitochondrial genome assembly from input fastq files.
optional arguments:
-h, --help show this help message and exit
Common arguments:
--workdir <STR> workdir [./]
--outprefix <STR> output prefix
--thread_number <INT>
thread number. Caution: For spades, --thread_number 32 can take 150 GB RAM! Setting this
to 8 to 16 is typically good. [8]
Input fastq files:
--fq1 <file> fastq 1 file. Set only this option but not --fastq2 means SE data. [required]
--fq2 <file> fastq 2 file (optional for mitoassemble and megahit, required for spades)
--insert_size <INT> insert size of input fastq files [250]
--fastq_read_length <INT>
read length of fastq reads, used by mitoAssemble. [150]
Assembly arguments:
--assembler {mitoassemble,spades,megahit}
Assembler to be used. [megahit]
--tmp_dir <STR> Set temp directory for megahit if necessary (See
https://github.com/linzhi2013/MitoZ/issues/176)
--kmers <INT> [<INT> ...]
kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for
mitoassemble [71]
--kmers_megahit <INT> [<INT> ...]
kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for
megahit [21 29 39 59 79 99 119 141]
--kmers_spades <INT> [<INT> ...]
kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for spades
['auto']
--memory <INT> memory size limit for spades/megahit, no enough memory will make the two programs halt
or exit [50]
--resume_assembly to resume previous assembly running [False]
Search mitochondrial sequences arguments:
--profiles_dir <STR> Directory cotaining 'CDS_HMM/', 'MT_database/' and 'rRNA_CM/'.
[/home/gmeng/.conda/envs/mybase/envs/mitozEnv.test3.6/lib/python3.8/site-
packages/mitoz/profiles]
--slow_search By default, we firstly use tiara to perform quick sequence classification (100 times
faster than usual!), however, it is valid only when your mitochondrial sequences are >=
3000 bp. If you have missing genes, set '--slow_search' to use the tradicitiona search
mode. [False]
--filter_by_taxa filter out non-requiring_taxa sequences by mito-PCGs annotation to do taxa
assignment.[True]
--requiring_taxa <STR>
filtering out non-requiring taxa sequences which may be contamination [required]
--requiring_relax {0,1,2,3,4,5,6}
The relaxing threshold for filtering non-target-requiring_taxa. The larger digital means
more relaxing. [0]
--min_abundance <float>
the minimum abundance of sequence required. Set this to any value <= 0 if you do NOT
want to filter sequences by abundance [10]
--abundance_pattern <STR>
the regular expression pattern to capture the abundance information in the header of
sequence ['abun\=([0-9]+\.*[0-9]*)']
--genetic_code <INT> which genetic code table to use? 'auto' means determined by '--clade' option. [auto]
--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}
which clade does your species belong to? [Arthropoda]
Single-end (SE) data: set --fq1 only.
Paired-end (PE) data: set both --fq1 and --fq2.
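As a minimal sketch of the SE/PE rule above, the snippet below builds the command line for either case and prints it instead of running it. The function name `build_assemble_cmd`, the `demo` prefix and the read file names are hypothetical placeholders; only flags listed in the help output are used, and the taxa and clade values are arbitrary choices for illustration.

```shell
# Hypothetical helper: print (not run) a `mitoz assemble` command.
# Usage: build_assemble_cmd <outprefix> <fq1> [fq2]
build_assemble_cmd() {
    outprefix=$1
    fq1=$2
    fq2=${3:-}
    cmd="mitoz assemble --outprefix $outprefix --fq1 $fq1"
    # Add --fq2 only when a second file is given (paired-end data).
    if [ -n "$fq2" ]; then
        cmd="$cmd --fq2 $fq2"
    fi
    # --requiring_taxa is required; Arthropoda is an arbitrary illustration.
    cmd="$cmd --requiring_taxa Arthropoda --clade Arthropoda"
    printf '%s\n' "$cmd"
}

se_cmd=$(build_assemble_cmd demo reads.fq.gz)
pe_cmd=$(build_assemble_cmd demo reads.1.fq.gz reads.2.fq.gz)
echo "SE: $se_cmd"
echo "PE: $pe_cmd"
```

Because the help text states that --fq2 is required for spades, the SE form applies only to mitoassemble and megahit.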
Example:
mitoz assemble \
--outprefix test \
--fq1 test.1.fq.gz \
--fq2 test.2.fq.gz \
--assembler megahit \
--requiring_taxa Arthropoda \
--genetic_code 5 \
--clade Arthropoda
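The --min_abundance and --abundance_pattern options in the help output describe filtering sequences by an abundance value captured from each sequence header. The sketch below illustrates that idea outside of MitoZ. The headers are made up, the `sed` expression is a hand-written translation of the default pattern rather than MitoZ's own code, and the assumption that a sequence is kept when its abundance is at least the threshold is mine; the function names are hypothetical.

```shell
# Hypothetical FASTA headers; the real header layout depends on the assembler.
header_high='>seq1 abun=35.5 length=16000'
header_low='>seq2 abun=3.2 length=900'

# Hand-written sed (BRE) translation of the default --abundance_pattern.
get_abundance() {
    printf '%s\n' "$1" | sed -n 's/.*abun=\([0-9][0-9]*\.*[0-9]*\).*/\1/p'
}

# Assumption: a sequence passes when its abundance is at least the threshold.
passes_min_abundance() {
    awk -v a="$1" -v m="$2" 'BEGIN { exit !((a + 0) >= (m + 0)) }'
}

min_abundance=10   # default of --min_abundance in the help output
abun_high=$(get_abundance "$header_high")
abun_low=$(get_abundance "$header_low")

for pair in "seq1:$abun_high" "seq2:$abun_low"; do
    name=${pair%%:*}
    abun=${pair#*:}
    if passes_min_abundance "$abun" "$min_abundance"; then
        echo "$name abun=$abun kept"
    else
        echo "$name abun=$abun filtered"
    fi
done
```

According to the help text, setting --min_abundance to any value <= 0 turns this filtering off.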