Annotation of the genome - Hcv1a1d20200411
Released by conchoecia on 12 Apr 17:40.
The genome file is not included in this release's .tar.gz. Download the genome file here: UCSC_Hcal_v1.fa.gz
This release contains annotation and protein files for the Hcalv1 genome. You will most likely use these files:
Hcv1a1d20200411_release/Hcv1a1d20200411_model_proteins.pep.gz
- Model proteins for each transcript. NB: not all transcripts have a CDS.
Hcv1a1d20200411_release/Hcv1a1d20200411_transcripts.fasta.gz
- Transcript sequences generated directly from the genome. Some may contain prematurely truncated CDSs.
Hcv1a1d20200411_release/Hcv1a1d20200411.gff.gz
- Genome annotation of transcripts.
Hcv1a1d20200411_release/protein_size_table_Hcv1a1d20200411.csv
- A table showing the protein size differences between the within-transcript-phased transcript haplotypes, as well as which haplotype was selected for the model proteins.
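A minimal Python sketch for loading the files above, assuming standard FASTA headers (the record ID is the first word after `>`) and a tab-separated GFF with the feature type in column 3. The helpers `read_fasta_gz` and `count_gff_feature_types` are hypothetical names for illustration, not part of this release.

```python
import gzip
from collections import Counter


def read_fasta_gz(path):
    """Read a gzip-compressed FASTA file into {record_id: sequence}.

    Assumption: the record ID is the first whitespace-delimited word
    of each header line.
    """
    records = {}
    current_id = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                current_id = line[1:].split()[0]
                chunks = []
            else:
                chunks.append(line)  # sequences may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def count_gff_feature_types(path):
    """Count feature types (column 3) in a gzip-compressed GFF file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip comment/directive lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts
```

For example, `read_fasta_gz("Hcv1a1d20200411_release/Hcv1a1d20200411_model_proteins.pep.gz")` returns a dict keyed by protein ID, and calling `count_gff_feature_types` on `Hcv1a1d20200411_release/Hcv1a1d20200411.gff.gz` gives a quick overview of which feature types the annotation contains.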
Hcv1a1d20200411_release/partly_phased/
Hcv1a1d20200411_release/partly_phased/h1_pilon_Hcv1a1d20200411.fasta.gz
- Pseudohaplotype h1 of within-transcript-phased transcripts. Each transcript is derived from a single haplotype, but it is not phased with respect to all other transcripts in the genome.
Hcv1a1d20200411_release/partly_phased/h1_Hcv1a1d20200411.pep.gz
- Putative proteins from the above fasta file.
Hcv1a1d20200411_release/partly_phased/h2_pilon_Hcv1a1d20200411.fasta.gz
- Pseudohaplotype h2 of within-transcript-phased transcripts.
Hcv1a1d20200411_release/partly_phased/h2_Hcv1a1d20200411.pep.gz
- Putative proteins from the above fasta file.
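The protein size table already records h1/h2 length differences; the sketch below is only an independent cross-check on the partly phased protein files. It assumes that the same transcript ID appears in the headers of both `h1_Hcv1a1d20200411.pep.gz` and `h2_Hcv1a1d20200411.pep.gz`, which you should confirm before relying on it. `fasta_lengths_gz` and `compare_haplotype_lengths` are hypothetical helper names.

```python
import gzip


def fasta_lengths_gz(path):
    """Return {record_id: protein_length} for a gzip-compressed FASTA file.

    Record IDs are the first word of each header. A trailing '*' stop
    symbol, if present, is not counted toward the length.
    """
    lengths = {}
    current_id = None
    seq_parts = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current_id is not None:
                    lengths[current_id] = len("".join(seq_parts).rstrip("*"))
                current_id = line[1:].split()[0]
                seq_parts = []
            elif line:
                seq_parts.append(line)
    if current_id is not None:
        lengths[current_id] = len("".join(seq_parts).rstrip("*"))
    return lengths


def compare_haplotype_lengths(h1_path, h2_path):
    """List (record_id, h1_length, h2_length) for IDs present in both files
    whose protein lengths differ.

    Assumption (hypothetical): a shared ID denotes the same transcript in
    h1 and h2.
    """
    h1 = fasta_lengths_gz(h1_path)
    h2 = fasta_lengths_gz(h2_path)
    shared = sorted(set(h1) & set(h2))
    return [(rid, h1[rid], h2[rid]) for rid in shared if h1[rid] != h2[rid]]
```

IDs found in only one of the two files are silently ignored here; if that matters for your analysis, compare `set(h1)` and `set(h2)` directly.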
Hcv1a1d20200411_release/phased/
Hcv1a1d20200411_release/phased/Hcv1a1d20200411_h1_phased_nucl.fasta.gz
- Transcripts from haplotype h1. These match the whole-genome phased VCF file.
Hcv1a1d20200411_release/phased/Hcv1a1d20200411_h1_phased_protein.pep.gz
- Proteins from the above file.
Hcv1a1d20200411_release/phased/Hcv1a1d20200411_h2_phased_nucl.fasta.gz
- Transcripts from haplotype h2. These match the whole-genome phased VCF file.
Hcv1a1d20200411_release/phased/Hcv1a1d20200411_h2_phased_protein.pep.gz
- Proteins from the above file.
Hcv1a1d20200411_release/phased/transcripts_unique_to_h1.Hcv1a1d20200411.list
- Transcripts that could be assigned to haplotype 1 (h1) of the whole-genome phasing. You probably won't need this file.
Hcv1a1d20200411_release/phased/transcripts_unique_to_h2.Hcv1a1d20200411.list
- Same as above, but to h2. You probably won't need this file.
Hcv1a1d20200411_release/phased/transcripts_shared_by_both_should_be_empty.Hcv1a1d20200411.list
- Intermediate check that no transcripts are shared by both haplotypes. Should be empty.
Hcv1a1d20200411_release/phased/second_list_of_transcripts_shared_by_both_should_be_empty.Hcv1a1d20200411.list
- Final check that no transcripts are shared by both haplotypes. Should be empty.
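If you do want to confirm the phasing checks yourself, the following sketch reads the four .list files, assuming each holds one transcript ID per line, and reports whether the h1 and h2 lists are disjoint and whether both "should be empty" lists are in fact empty. `read_id_list` and `check_phasing_lists` are hypothetical helpers, not part of this release.

```python
from pathlib import Path


def read_id_list(path):
    """Read a .list file, assumed to hold one transcript ID per line."""
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def check_phasing_lists(release_dir):
    """Re-run the sanity checks described for the phased/ directory.

    Returns {check_name: bool}, where True means the check passed.
    """
    base = Path(release_dir) / "phased"
    tag = "Hcv1a1d20200411"
    h1 = read_id_list(base / f"transcripts_unique_to_h1.{tag}.list")
    h2 = read_id_list(base / f"transcripts_unique_to_h2.{tag}.list")
    shared_1 = read_id_list(
        base / f"transcripts_shared_by_both_should_be_empty.{tag}.list")
    shared_2 = read_id_list(
        base / f"second_list_of_transcripts_shared_by_both_should_be_empty.{tag}.list")
    return {
        # No transcript should be assigned to both haplotypes.
        "h1_and_h2_disjoint": not (h1 & h2),
        "intermediate_shared_list_empty": not shared_1,
        "final_shared_list_empty": not shared_2,
    }
```

For example, `check_phasing_lists("Hcv1a1d20200411_release")` should report `True` for every check if the files behave as described above.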