Home
Andreas Sjödin edited this page Nov 8, 2023 · 3 revisions
RepGenR stands for Representative-genome Repositories. It is a toolkit for microbial genomics researchers who need to create de-replicated genome sets. These sets are curated from user-selected taxa within the GTDB (Genome Taxonomy Database), which makes large-scale genomic analyses more manageable and meaningful.
RepGenR streamlines the process of managing genomic data through several functional modules, each designed to handle specific tasks in the data curation and analysis workflow:
- Taxa Selection: Choose your taxa of interest directly from GTDB to focus your genomic research.
- Genome Downloading and Formatting: Automatically download the selected genome sequences and rename them using a standardized format: `<family>_<genus>_<species>_<accession_number>.fasta`. This ensures clarity and consistency in your dataset.
- Genome De-replication: Simplify your dataset by de-replicating the downloaded genomes based on their Average Nucleotide Identity (ANI), retaining only representative sequences for efficiency.
- Phylogenetic Analysis: Compute the phylogenetic relationships among your chosen genomes, whether you're working with a de-replicated set or the complete sequence pool.
- Phylogenetic Tree Output: Export the phylogenetic tree as a parent-child relations file, which is readily compatible with downstream tools like FlexTaxD for further taxonomic and evolutionary studies.
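
To make two of the steps above more concrete, here is a minimal Python sketch of the file-naming convention and of flattening a tree into parent-child relations. It is an illustration only, not RepGenR's actual code: the function names (`standardized_name`, `parent_child_edges`, `edges_to_tsv`), the nested-dict tree representation, the `root` label, the tab-separated `parent`/`child` layout, and the placeholder accession are all assumptions made for this example. Check the FlexTaxD documentation for the exact file layout it expects.

```python
from __future__ import annotations


def standardized_name(family: str, genus: str, species: str, accession: str) -> str:
    """Build a name of the form <family>_<genus>_<species>_<accession_number>.fasta."""
    # Surrounding whitespace is stripped; the fields are otherwise used as given.
    fields = [part.strip() for part in (family, genus, species, accession)]
    return "_".join(fields) + ".fasta"


def parent_child_edges(tree: dict, parent: str = "root") -> list[tuple[str, str]]:
    """Flatten a nested-dict tree into (parent, child) pairs, depth first.

    Assumed representation: each key is a node name and its value is a dict
    of that node's children (an empty dict marks a leaf).
    """
    edges: list[tuple[str, str]] = []
    for node, children in tree.items():
        edges.append((parent, node))
        edges.extend(parent_child_edges(children, node))
    return edges


def edges_to_tsv(edges: list[tuple[str, str]]) -> str:
    """Render the edges as tab-separated text with a header row (assumed layout)."""
    lines = ["parent\tchild"]
    lines.extend(f"{parent}\t{child}" for parent, child in edges)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder accession, used only to show the naming pattern.
    print(standardized_name("Francisellaceae", "Francisella", "tularensis", "GCF_000000000.1"))

    # Toy tree: internal nodes n1 and n2, leaves genomeA, genomeB, genomeC.
    toy_tree = {"n1": {"n2": {"genomeA": {}, "genomeB": {}}, "genomeC": {}}}
    print(edges_to_tsv(parent_child_edges(toy_tree)), end="")
```

Note that accession numbers themselves contain an underscore, so a file name built this way is easier to parse from the left (family, genus, species) than by splitting on every underscore.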