- This repository contains additional files for the manuscript "Conserved novel ORFs in the mitochondrial genome of the ctenophore Beroe forskalii".
- To recreate most of the figures for the manuscript, please install snakemake, cuttlery, and pauvre, then navigate to this directory and run the snakemake pipeline by executing the command snakemake in your terminal.
To install everything, use the following commands:
# If on linux, execute the following command to ensure that everything plots correctly
sudo apt-get install dvipng
# install the requirements
pip install pauvre
pip install cuttlery
pip install snakemake
# Clone the repository and recreate the figures. Takes several hours.
git clone https://github.com/conchoecia/beroe_forskalii_mitogenome
cd beroe_forskalii_mitogenome
snakemake --cores 4
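Before starting a run that takes several hours, it can help to confirm that the required programs are reachable from the shell. The following is a minimal sketch, not part of the repository: it assumes that each pip package installs a command-line executable of the same name, and `check_tools` and `REQUIRED_TOOLS` are hypothetical names chosen for illustration.

```python
import shutil

# Assumption: each of these is expected to be callable from the shell.
# dvipng is only relevant on Linux, per the install notes above.
REQUIRED_TOOLS = ["snakemake", "pauvre", "cuttlery", "dvipng"]


def check_tools(tools):
    """Return a dict mapping each tool name to True if it is found on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools(REQUIRED_TOOLS).items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

Any tool reported as missing can be installed with the commands listed above before running the pipeline.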
- Pb - Pleurobrachia bachei
- Ml - Mnemiopsis leidyi
- Bf - Beroe forskalii
Files in this directory are related to determining the 16S structure of the B. forskalii mitochondrial genome. The files in this directory are:
- mnemiopsis_rrnl_final.sto is a structural Stockholm file. This encodes the M. leidyi 16S rRNA structure from Pett et al 2011.
- mnemi16S.cm is the infernal covariance model built using mnemiopsis_rrnl_final.sto.
- Bf1311_against_mnemi16S.txt is the infernal results file produced by searching the Bf1311 mitochondrial genome with the mnemi16S.cm covariance model.
This directory contains the fasta files of each B. forskalii mitochondrial genome and the ARWEN results. The files in this directory are:
- MG655622.fasta - The Bf201706 mitochondrial genome.
- MG655622_results.txt - The Bf201706 ARWEN results.
- MG655623.fasta - The Bf201606 mitochondrial genome.
- MG655623_results.txt - The Bf201606 ARWEN results.
- MG655624.fasta - The Bf201311 mitochondrial genome.
- MG655624_results.txt - The Bf201311 ARWEN results.
Files in this directory pertain to ATP6 of all ctenophores. This directory contains:
- README.md contains notes about where to locate the P. bachei and M. leidyi ATP6 sequences.
- PB_ML_ATP6_nucl.fasta contains the Pb and Ml ATP6 transcript DNA sequences.
- PB_ML_ATP6_prot.fasta contains the Pb and Ml ATP6 protein sequences.
- ATP6_to_BF.txt contains the tblastn results using the Pb and Ml ATP6 sequences to query the Bf transcriptome.
- BF_ATP6_hits.fasta contains the transcript sequences of the Bf ATP6 blast hits.
- BF_ATP6.fasta contains the most likely Bf ATP6 transcript based on protein sequence similarity to other ctenophore ATP6 sequences.
- DS12*/DS12*_mapdepthavg.txt contains the average map depth when the DS121 and DS122 libraries were mapped against the B. forskalii ATP6 transcript using bwa mem.
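To make the "average map depth" quantity concrete, here is a minimal sketch of how such a value can be computed from a per-base depth table. It assumes tab-separated rows of reference name, position, and depth (the layout that `samtools depth -a` writes); whether the authors produced their numbers this way is not stated here, and `mean_depth` and the toy rows are hypothetical.

```python
def mean_depth(lines):
    """Average the depth column of tab-separated (reference, position, depth) rows."""
    total = 0
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        _ref, _pos, depth = line.split("\t")[:3]
        total += int(depth)
        count += 1
    return total / count if count else 0.0


# Toy rows for illustration only; these are not values from the repository.
example = ["BF_ATP6\t1\t10", "BF_ATP6\t2\t12", "BF_ATP6\t3\t14"]
print(mean_depth(example))
```

Positions with zero coverage only count toward the average if the depth table lists them, which is why the `-a` option is assumed above.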
Text files in this directory contain the NCBI BioSample Accession numbers for all four B. forskalii ctenophore individuals.
This directory contains the file crex_results_summary.pdf, which holds the CREx mitochondrial rearrangement analysis results for the M. leidyi, B. forskalii, and P. bachei mitochondrial genomes.
This directory contains files used in the Fourier Transform analysis to predict which regions of the mitochondrial genome contain protein-coding DNA.
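One common way a Fourier transform is used for this purpose is to measure the period-3 signal that codon structure tends to leave in protein-coding DNA. The sketch below illustrates that general idea only; it is not taken from the files in this directory, and the function names, window size, and step are hypothetical choices.

```python
import cmath


def period3_power(seq):
    """Spectral power at period 3, summed over the four base-indicator signals.

    The result is normalised by the squared sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    power = 0.0
    for base in "ACGT":
        # DFT of the 0/1 indicator signal for this base, evaluated at frequency 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(seq)
            if ch == base
        )
        power += abs(coeff) ** 2
    return power / (n * n)


def sliding_period3(seq, window=351, step=3):
    """Yield (start, power) for each window along the sequence."""
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        yield start, period3_power(seq[start:start + window])
```

Windows whose power stands well above the background of the rest of the genome are candidates for coding sequence; the threshold for "well above" has to be chosen empirically.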
This directory contains a single file, bf_raw_mito.fa, which is the raw mitochondrial genome assembly produced by canu.
This directory contains fasta files used in various analyses, including nucleotide and amino acid sequences, as well as various alignments. The files in this directory are:
- Directory BF201706_prot
  - Directory TM_results - contains HTML result files from TMHMM for COX1, COX2, COX3, CYTB, ND1-6, URF1, and URF2.
  - Directory TM_txtfiles - contains text files with transmembrane domain predictions by TMHMM. There are files for COX1, COX2, COX3, CYTB, ND1-6, URF1, and URF2.
  - file Bf201706_prot.fasta - the protein sequences from MG655622/Bf201706. These were used in generating the transmembrane domain predictions with TMHMM.
- Directory alignments
  - Directory concatenated_after_guidance
    - concatenated_prot.phy is the COX1, COX2, COX3, CYTB, ND1, ND3, and ND5 alignments concatenated together. These are the protein alignments that have had sites removed using Guidance2.
  - Directory concatenated_noguidance
    - concatenated_noguidance.phy is the COX1, COX2, COX3, CYTB, ND1, ND3, and ND5 alignments concatenated together. No columns were removed using Guidance2.
  - Directory ctenos_all_proteins_noguidance
    - all_proteins_ctenos_monoallo_noguidance.phy is the concatenated alignment for COX1, COX2, COX3, CYTB, and ND1-6 for all ctenophores and two outgroups.
  - Directory guidance_alignments - This directory contains files and a script, run_guidance.sh, that produce alignments with columns removed using Guidance2.
  - Directory prot_cteno_aln contains nucleotide alignments for 12S and 16S for all ctenophore mitochondrial genomes, as well as protein alignments for all ctenophores for genes COX1, COX2, COX3, CYTB, and ND1-ND6.
  - file 12S.fasta - 12S alignment from Pb and other ctenophores.
  - file 16S.fasta - 16S alignment from Bf and other ctenophores.
- Directory coding_seqs contains all of the nucleotide sequences for Bf for COX1, COX2, COX3, CYTB, and ND1-6.
- Directory non-beroe contains directories of nucleotide sequences for coding and noncoding regions of the following organisms: Chlamydomonas, Daphnia, Drosophila, Human, and Strongylocentrotus.
- Directory noncoding_seqs contains all of the Bf nucleotide sequences for the noncoding regions COX1 to ND6, COX3 to ND3, ND2 to CYTB, ND5 to URF1, URF1 to URF2, and URF2 to ND2.
- Directory test_seqs contains all of the Bf nucleotide sequences for URF1 and URF2.
- file bf_mitogenomes_alignment.fasta - the whole-mitogenome Bf alignment used to generate the table listing indels.
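The concatenated .phy files described above join per-gene alignments side by side into one matrix. The following is a minimal sketch of that operation under stated assumptions: each gene alignment is held as a dict of taxon name to aligned sequence, taxa missing from a gene are padded with gaps, and the output is relaxed sequential PHYLIP. The function names and toy sequences are hypothetical and do not come from the repository.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) column-wise.

    Taxa missing from a gene are padded with '-' so every row has equal length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def to_phylip(matrix):
    """Format a concatenated matrix as relaxed sequential PHYLIP text."""
    ncols = len(next(iter(matrix.values()))) if matrix else 0
    lines = [f"{len(matrix)} {ncols}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines) + "\n"


# Toy alignments for illustration only; these are not real sequences.
cox1 = {"Bf": "MK-L", "Ml": "MKAL"}
cox2 = {"Bf": "TT", "Pb": "TS"}
print(to_phylip(concatenate_alignments([cox1, cox2])))
```

Padding absent taxa with gaps keeps the matrix rectangular, which phylogenetic software generally requires for a single concatenated input.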
When the snakemake pipeline is run, the figures and associated text files are output to this directory.
Text files in this directory include the final DNA sequences of the mitochondrial genomes of individuals Bf1311, Bf1706, and Bf1606. In addition, we include the scripts map_depth_extract.sh and FastqPairedEndValidator.pl used to isolate genomic reads that map to the mitochondrial sequences.
This directory contains GFF files used in plotting mitochondrial genomes for synteny.
Contains scripts and files to analyze the number and distribution of indels between individuals.
- file Bf_alignment.fasta is a whole-mitogenome alignment for all three individuals of B. forskalii.
- file Bf_alignment.geneious - the same alignment, in geneious format.
- file Bforsk_indels.txt - a table of indels, the sample in which they occur, the position, and the size.
- file print_gaps.py - a python script that produces Bforsk_indels.txt from Bf_alignment.fasta.
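For readers who want a sense of how an indel table can be derived from an aligned fasta file, here is a minimal sketch. It is not the repository's print_gaps.py: it simply reports every run of gap characters per sample, and the 1-based alignment-column convention, the function names, and the toy records are all assumptions made for illustration.

```python
import re


def read_fasta(lines):
    """Parse FASTA text lines into a dict of name -> sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def find_gaps(alignment):
    """Return (sample, start, size) for every run of '-' in each aligned sequence.

    start is a 1-based alignment column (an assumption of this sketch).
    """
    gaps = []
    for sample, seq in alignment.items():
        for match in re.finditer(r"-+", seq):
            gaps.append((sample, match.start() + 1, match.end() - match.start()))
    return gaps


# Toy alignment for illustration only; these are not real sequences.
toy = [">Bf1311", "ACGT--AC", ">Bf1606", "AC-TTTAC", ">Bf1706", "ACGTTTAC"]
for sample, start, size in find_gaps(read_fasta(toy)):
    print(sample, start, size, sep="\t")
```

A gap in one sample can reflect either a deletion in that sample or an insertion in the others, so deciding which requires comparison against the remaining sequences.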
Text and HTML files in this directory are from the I-TASSER protein structure prediction. Additionally, there are structure files that can be opened with protein viewing software.
Files in this directory contain phylogenetic analyses. All subdirectories listed below are in the directory phylogeny/201904_rooted_tree.
- Directory RAxML_ctenos_allgenes_noguidance - RAxML analysis conducted on COX1, COX2, COX3, CYTB, and ND1-6 using only ctenophores with two outgroups. Guidance2 was not used to remove columns from the amino acid matrix.
- Directory RAxML_protcatwag_guidance - RAxML analysis conducted on COX1, COX2, COX3, CYTB, ND1, ND3, and ND5 using ctenophores and many outgroups. Guidance2 was used to remove columns from the amino acid matrix.
- Directory RAxML_protcatwag_noguidance - RAxML analysis conducted on COX1, COX2, COX3, CYTB, ND1, ND3, and ND5 using ctenophores and many outgroups. Guidance2 was not used to remove columns from the amino acid matrix.
- Directory phylobayes_ctenos_allgenes_noguidance - Phylobayes analysis conducted on COX1, COX2, COX3, CYTB, and ND1-6 using only ctenophores with two outgroups. Guidance2 was not used to remove columns from the amino acid matrix.
- Directory phylobayes_guidance - Phylobayes analysis conducted on COX1, COX2, COX3, CYTB, ND1, ND3, and ND5 using ctenophores and many outgroups. Guidance2 was used to remove columns from the amino acid matrix.
- Directory phylobayes_noguidance - Phylobayes analysis conducted on COX1, COX2, COX3, CYTB, ND1, ND3, and ND5 using ctenophores and many outgroups. Guidance2 was not used to remove columns from the amino acid matrix.
Contains HTML files of results from running tRNAscan-SE on the whole mitochondrial genomes for Bf201706, Bf201606, and Bf201311.