This repository contains code used for HROM construction.
- 1.Trimmomatic_PE.sh Trims adapter sequences and filters out low-quality bases from Illumina sequencing reads
- 2.Bowtie2.sh Maps sequencing reads to the human reference genome and discards those aligning reads as human-derived contaminants
- 1.MEGAHIT.sh Constructs contigs/scaffolds from quality-controlled reads using MEGAHIT
- 1.metaSPAdes.sh Constructs contigs/scaffolds from quality-controlled reads using metaSPAdes
- change-contig-name-megahit.py Changes contig header of MEGAHIT for downstream analysis
- 1.Contig_align.sh Builds an index and aligns the reads to assembled contigs
- 2.metaBAT2.sh Initiates metaBAT2 binning pipeline
- 3.MaxBin2.sh Initiates MaxBin2 binning pipeline
- 4.CONCOCT.sh Initiates CONCOCT binning pipeline
- merge_cutup_clustering.py Combines subcontig-level clustering results back into the original contig-level clustering for CONCOCT
- 5.MetaWRAP.sh Run MetaWRAP bin refinement pipeline
- define_quality.py Summarize genome quality report of MetaWRAP
- 6.GUNC.sh Run GUNC chimerism detection pipeline
- GTDB-tk2_classify.sh Classify genomes with GTDB-Tk2
- barrnap_tRNAscan-SE.sh Initiates barrnap and tRNAscan-SE2
- checkM-CPR.sh Genome quality estimation for Patescibacteria
- checkM-taxonomy_wf.sh Genome quality estimation using checkM and universal bacterial marker gene set
- summarize_checkm-result.py Summarize checkM Result
- summarize-barrnap.py Summarize barrnap Result
- summarize-tRNAscan.py Summarize tRNAscan Result
- NUCMER.sh Initiates nucmer for pairwise coverage & ANI calculation
- cal_nucmer.py Calculates pairwise alignment coverage & ANI from nucmer output
- dRep-MASH.sh Initiates MASH clustering using dRep compare module
- *hierarchical_clustering.R Species-level hierarchical clustering using alignment coverage & ANI
- summarize_cluster.py summarize species-level clustering and set representative genome
- Prokka.sh Initiates Prokka for Protein sequence estimation
- MMseqs2-linclust.sh Clustering all HROM proteins at a specified percentage identity using MMseqs2
- eggNOG-mapper.sh Funtional annotation of protein catalog using eggNOG mapper
- Anvio-kofam-metabolism.sh Anvi'o pipeline to estimate metabolism module completeness
- Anvi-metabolic-independence.py Estimate metabolic independence score from metabolism module completeness result from Anvi'o
- 33_Module.list List of predefined 33 metabolism module for estimation of metabolic independence score
- DefenseFinder.sh Run DefenseFinder for viral defense system estimation
- MacSyFinder.sh Run MacSyFinder for Type IV pilus system gene annotation
- Panaroo.sh Run Panaroo for species-level pangenome construction
- QIIME2-UMAP.sh UMAP ordination of Patescibacteria genomes based on Jaccard distance derived from protein presence/absence profiles
- RGI.sh Run RGI for Antibiotic resistance gene annotation
- Sporulation-tblastn.sh Run tblastn to 65 marker genes related to Sporulation score
- Sporulation-summarize.py Summarize sporulation gene hits from tblastn result
- gutSMASH.sh Run gutSMASH for metabolic gene cluster annotation
- pyrodigal.sh Run pyrodigal for small peptide estimation
- filter-pyrodigal.py Retain small peptides that fall within the specified length threshold
- hmmAntiFam.sh Run hmmsearch using Antifam profiles on small peptides to remove spurious proteins
- summarize-Antifam.py Summarize Antifam result for subsequent removal
- Macrel.sh Run Macrel for AMP estimation from non-spurious small peptides
- summarize-Clustering-MMseqs2.py Summarize clustering of genes using MMseqs2 linclust
- apply-filter-Clustering-MMseqs2.py Apply initial filter on markers based on coreness & uniqness
- generate_150nt_from_genes.py Generate short reads from filtered marker genes
- align-fragment.sh Align fragemented short reads of marker genes to genomes using Bowtie2
- summarise-coreness-from-aln.py Summarize Bowtie2 result and estimate corness
- CAMI.sh Run CAMISIM for simulation dataset generation
- PROFILING.sh Run Kraken2 and Bracken with the Representative and Concatenated databases, and MetaPhlAn4 with the HROM-marker database on simulation dataset
- bray_curtis_and_F1_True.py Estimate Bray-curtis similarity, Precision, Recall, F1-score from simulation dataset
- bracken_result.sh Summarize bracken profiling result for genome size normalization
- normalize-genomesize.py Genome size normalization of profiling result from Kraken2 & Bracken
- summarize-mph4.py Summarize MetaPhlAn4 & HROM marker gene profiling result for evaluation