This is a clean version of CORASON integrated with the FUNGIT repository. It is tailored for fungal genomes. It wont drop large jobs on thousands of PKSs and NRPSs and in general for long sequences
chmod +x bin/nw_distance
sudo apt install ncbi-blast+
sudo apt install iqtree
sudo apt install mafft
sudo apt install augustus
Make sure you have uninstalled old docker versions: https://docs.docker.com/engine/install/ubuntu/#uninstall-docker-engine Then install docker: https://docs.docker.com/engine/install/ubuntu/ To make docker run without sudo: https://docs.docker.com/engine/install/linux-postinstall/
Then install the image: mkdir ~/bin # not required if you already have that curl -q https://dl.secondarymetabolites.org/releases/latest/docker-run_antismash-full > ~/bin/run_antismash chmod a+x ~/bin/run_antismash
run_antismash . . --version
sudo apt install cpanminus
cpanm SVG
perl -MCPAN -e shell
install SVG
tar -xvf EXAMPLE_DATASET.tar.gz
mv EXAMPLE_DATASET/GENOMES bin/
mv EXAMPLE_DATASET/GENOMES.IDs bin/
mv EXAMPLE_DATASET/G3P1.query .
rm -r EXAMPLE_DATASET
perl fungison.pl -d full -x FORMATDB -q 1 -r 1 -e 0.000001 -s 400 -f 5 -q G3P1.query
USAGE: perl fungison.pl <OPTIONS>
OPTIONS:
-q FILE.query |QUERY FILE, [a file with .query extension]
-r 1234 |REFERENCE GENOME ID FROM GENOMES.IDs, WHEN NOT USING -d full MAKE SURE THE ENTRY IS LISTED IN -d [a number]
-e 0.0000001 |E-VALUE CUTOFF, [a number]
-s 200 |BIT-SCORE CUTOFF [a number]
-f 10 |NUMBER OF FLAKING GENES INCLUDED IN THE ANALYSIS, [a number]
-d full OR -db 1,2,3 |IDs OF THE GENOMES INCLUDED IN THE ANALYSIS, [full= entire database OR selected genomes separated by ',' ]
-x n or -F FORMATDB |FORMAT THE DATABASE SELECTED WITH THE -d OPTION, ['no' is the recommeded option or 'FORMATDB']
The database will be created with full entries
All arguments were provided
Running CORASON2 with query G3P1.query and reference gene context from entry ID:1, Agaricus_bisporus_AB58GCA_008271545.1
The e-value cutoff is 0.000001 and the bitscore cut-off is 400
You are searching in the full database with 3 entries, the search will be done for 5 genes flanking the query hits
##########################################################################
CORASON: CORE ANALYSIS OF SYNTENIC ORTHOLOGOUS NATURAL PRODUCT BGCs
this is version 2.1 uses iqtree, no trimming, phylo sort with nw_distances
Smartmatch silenced, order files exist if trees fail, aligning with mafft
Adapted to run with wrapper fungison.pl
latest version modified by Pablo Cruz-Morales september 2022
##########################################################################
Your working directory is /home/ynp/Desktop/FUNGISON/bin
Program lists have been created
An organism identifier has been added to each sequence
Formatting the database...
the database has been formatted
Searching for homologs of the query...
Sequences found
Searching for clusters related to the query...
Clusters found
There are 5 organisms with similar clusters
Aligning the sequences...
Creating a tree of query homologs (single marker)...
Calculating the BGC core...
The core has been calculated
core line number 4 Core!
core line number ¡4!
Best cluster 1_5356
Aligning...
Sequences were aligned
Creating matrix...
renaming...
Formating matrix for BGC tree..
constructing the BGC tree using IQTREE with 1000 bootstraps replicates...
Drawing the genome contexts with the order of the BGC tree...
BGC_TREE.orderDrawing the BGCs with files 1_5356.input,2_524.input,2_525.input,3_6762.input,3_6763.input :
rm: cannot remove 'Cluster*.BLAST': No such file or directory
rm: cannot remove 'QUERY_HITS.aln.uniqueseq.phy': No such file or directory
BGCs found in the following genome IDs:
1,2,3,
There are 5 organisms with similar clusters
There is a core composed by 4 orhtolog(s) in this BGC
The core is annotated in the reference organism as follows:
fig|666666.797168.peg.5354 unknown function
fig|666666.797168.peg.5358 unknown function
fig|666666.797168.peg.5359 Amino acid permease
fig|666666.797168.peg.5360 unknown function
Done
Have a great day
cd G3P1_results/
ls
G3P1.BLAST G3P1.core.aln G3P1.core.contree G3P1.gene_context.svg G3P1.hits.aln G3P1.hits.contree G3P1.report
Cheking the G3P1.core.contree (you can open it in a web browser) made with 4synthenic orthologs found in 5 genomes
here i opened the file with figtree (http://tree.bio.ed.ac.uk/software/figtree/)
sh download_fungal_genomes.sh 'Aspergillus nidulans'
augustus --species=aspergillus_nidulans mygenome.fna --gff3=on --stopCodonExcludedFromCDS=off > mygenome.gff
Can use 'nohup ./augustus_on_all_genomes.sh > "nohup.out" $' To run augustus on all genomes in genomes_raw
run_antismash genome.fna output_folder --genefinding-gff3 /input/genome.gff --taxon fungi --fullhmmer --cc-mibig --cb-knownclusters
Can use 'nohup ./antismash_on_all_genomes.sh > "nohup.out" $' To run antismash on all genomes in genomes_raw. The output will be in antismash_output
screen -L -Logfile logs/screen_log.txt -dmS parallel_processing "./parallel_process.sh"
A test showed that antismash for the same genome took: 5140 seconds using 1 core(s) 4116 seconds using 2 core(s) 3673 seconds using 4 core(s)
find . -type f -name '*.gbk' and to copy them find . -type f -name '*genomic.gbk' -exec cp {} genomes ;
Why does it say /input/genome.gff ?? - There's no folder named "Input" antismash/antismash#176 (comment)
https://dl.secondarymetabolites.org/releases/latest/antismash-7.1.0.tar.gz