
Sequence Query

Hüseyin Tuğrul BÜYÜKIŞIK edited this page Feb 27, 2021 · 4 revisions

Sequences are retrieved with the getSequence method. It takes a single parameter: the zero-based index of the sequence. In a FASTA file with N sequences, index 0 is the first sequence and index N-1 is the last.

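bool debug = true;
FastaGeneIndexer cache("./data/influenza.fna", debug);

// Index 100 is the 101st record, because indexing starts at 0.
std::cout << cache.getDescriptor(100) << std::endl; // descriptor line of the record
std::cout << cache.getSequence(100) << std::endl;   // sequence of the same record

Output:
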
gi|221327|gb|D13578|Influenza A virus (A/Kaizuka/2/65(H2N2)) gene for hemagglutinin, partial cds
AATACAACACTACCTTTTCACAATGTCCACCCACTGACAATAGGTGAATGCCCCAAATATGTAAAATCGGAGAAATTGGTCTTAGCAACAGGACTAAGGAATGTTCCCCAGATTGAATCAAGAGGATTGTTTGGGGCAATAGCTGGCTTTATAGAAGGAGGATGGCAAGGAATGGTTGATGGTTGGTATGGATACCATCACAGCAATGACCAGGGATCAGGGTATGCAGCAGACAAAGAATCCACTCAAAAGGCATTTGATGGAATCACCAACAAGGTAAATTCTGTGATTGAAAAGATGAACACCCAATTTGAAGCTGTTGGGAAAGAATTCAATAATTTAGAGAAAAGACTGGAGAACTTGAACAAAAAGATGGAAGACGGGTTTCTAGATG
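
To make the zero-based indexing concrete, here is a minimal sketch that does not use FastaGeneIndexer at all. It parses FASTA text into an in-memory vector so that position i in the vector plays the role of the index passed to getSequence. The names FastaRecord, parseFasta and recordAt are hypothetical and exist only for this illustration; the library's own storage and parsing are likely very different.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in record; not a FastaGeneIndexer type.
struct FastaRecord {
    std::string descriptor; // header line without the leading '>'
    std::string sequence;   // all sequence lines joined together
};

// Parse FASTA text into records kept in file order,
// so records[0] is the first sequence and records[N-1] is the last.
std::vector<FastaRecord> parseFasta(const std::string& text) {
    std::vector<FastaRecord> records;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            records.push_back({line.substr(1), ""}); // start a new record
        } else if (!records.empty()) {
            records.back().sequence += line;         // append to the current record
        }
    }
    return records;
}

// Zero-based access: valid indices are 0 .. records.size() - 1.
const FastaRecord& recordAt(const std::vector<FastaRecord>& records,
                            std::size_t index) {
    assert(index < records.size() && "index must be in [0, N-1]");
    return records[index];
}
```

For a parsed file with N records, recordAt(records, 0) gives the first record and recordAt(records, records.size() - 1) gives the last, mirroring the index range described above.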

If the initDescriptorIndexMapping method is called first, sequences can also be accessed by their descriptors through getSequenceByDescriptor:

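bool debug = true;
FastaGeneIndexer cache("./data/influenza.fna", debug);

// Build the descriptor-to-index mapping; needed before any lookup by descriptor.
cache.initDescriptorIndexMapping();

std::cout << cache.getDescriptor(1) << std::endl;
// Look the sequence up by its descriptor, then by its index, for comparison.
std::cout << cache.getSequenceByDescriptor(cache.getDescriptor(1)) << std::endl;
std::cout << cache.getSequence(1) << std::endl;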

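Duplicate descriptors are not skipped. When several sequences share the same descriptor, the mapping keeps the index of the last one, so a lookup by that descriptor returns the last duplicate's sequence. Output of the code above: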

gi|59292|gb|X53029|Influenza A virus (A/USSR/90/1977(H1N1)) genes for matrix proteins 1 and 2, genomic RNA
AGCAAAAGCAGGTAGATGTTGAAAGATGAGTCTTCTAACCGAGGTCGAAACGTACGTTCTCTCTATCGTCCCGTCAGGCCCCCTCAAAGCCGAGATCGCACAGAGACTTGAAGATGTCTTTGCTGGGAAGAACACCGATCTTGAGGCTCTCATGGAATGGCTAAAGACAAGACCAATCCTGTCACCTCTGACTAAGGGGATTTTAGGATTTGTGTTCACGCTCACCGTGCCCAGTGAGCGAGGACTGCAGCGTAGACGCTTTGTCCAAAATGCCCTTAATGGGAATGGGGATCCAAATAACATGGACAGAGCAGTTAAACTGTATAGAAAGCTTAAGAGGGAGATAACATTCCATGGGGCCAAAGAAATAGCACTCAGTTATTCTGCTGGTGCACTTGCCAGTTGTATGGGCCTCATATACAACAGGATGGGGGCTGTGACCACCGAAGCGGCATTTGGCCTGATATGCGCAACCTGTGAACAGATTGCTGACTCCCAGCATAGGTCTCATAGGCAAATGGTGACAACAACCAATCCACTAATAAGACATGAGAACAGAATGGTTCTGGCCAGCACTACAGCTAAGGCTATGGAGCAAATGGCTGGATCGAGTGAGCAAGCAGCAGAGGCCATGGAGGTTGCTAGTCAGGCCAGGCAAATGGTGCAGGCAATGAGAGCCATTGGGACTCATCCTAGCTCCAGTGCTGGTCTGAAAAATGATCTTCTTGAAAATTTGCAGGCCTATCAGAAACGAATGGGGGTGCAGATGCAACGATTCAAGTGATCCTCTTGTTGTTGCCGCAAGTATCATTGGGATTTTGCACTTGATATTGTGGATTCTTGATCGTCTTTTTTTCAAATGCATTTATCGTCTCTTTAAACACGGTCTGAAAAGAGGGCCTTCTACGGAAGGAGTACCAGAGTCTATGAGGGAAGAATATCGAAAGGAACAGCAGAATGCTGTGGATGCTGACGATAGTCATTTTGTCAACATAGAGCTAGAGTAAAAAACTACCTTGTTTCTACT
AGCAAAAGCAGGTAGATGTTGAAAGATGAGTCTTCTAACCGAGGTCGAAACGTACGTTCTCTCTATCGTCCCGTCAGGCCCCCTCAAAGCCGAGATCGCACAGAGACTTGAAGATGTCTTTGCTGGGAAGAACACCGATCTTGAGGCTCTCATGGAATGGCTAAAGACAAGACCAATCCTGTCACCTCTGACTAAGGGGATTTTAGGATTTGTGTTCACGCTCACCGTGCCCAGTGAGCGAGGACTGCAGCGTAGACGCTTTGTCCAAAATGCCCTTAATGGGAATGGGGATCCAAATAACATGGACAGAGCAGTTAAACTGTATAGAAAGCTTAAGAGGGAGATAACATTCCATGGGGCCAAAGAAATAGCACTCAGTTATTCTGCTGGTGCACTTGCCAGTTGTATGGGCCTCATATACAACAGGATGGGGGCTGTGACCACCGAAGCGGCATTTGGCCTGATATGCGCAACCTGTGAACAGATTGCTGACTCCCAGCATAGGTCTCATAGGCAAATGGTGACAACAACCAATCCACTAATAAGACATGAGAACAGAATGGTTCTGGCCAGCACTACAGCTAAGGCTATGGAGCAAATGGCTGGATCGAGTGAGCAAGCAGCAGAGGCCATGGAGGTTGCTAGTCAGGCCAGGCAAATGGTGCAGGCAATGAGAGCCATTGGGACTCATCCTAGCTCCAGTGCTGGTCTGAAAAATGATCTTCTTGAAAATTTGCAGGCCTATCAGAAACGAATGGGGGTGCAGATGCAACGATTCAAGTGATCCTCTTGTTGTTGCCGCAAGTATCATTGGGATTTTGCACTTGATATTGTGGATTCTTGATCGTCTTTTTTTCAAATGCATTTATCGTCTCTTTAAACACGGTCTGAAAAGAGGGCCTTCTACGGAAGGAGTACCAGAGTCTATGAGGGAAGAATATCGAAAGGAACAGCAGAATGCTGTGGATGCTGACGATAGTCATTTTGTCAACATAGAGCTAGAGTAAAAAACTACCTTGTTTCTACT
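
The duplicate-descriptor behavior described above can be modeled with a plain hash map. The sketch below is an assumption about how such a mapping could behave, not the library's actual implementation: buildDescriptorIndex and lookupIndex are hypothetical helpers, and std::unordered_map is simply a convenient container for showing why the last duplicate wins.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using DescriptorIndex = std::unordered_map<std::string, std::size_t>;

// Hypothetical helper: map each descriptor to the index of its LAST occurrence.
DescriptorIndex buildDescriptorIndex(const std::vector<std::string>& descriptors) {
    DescriptorIndex index;
    for (std::size_t i = 0; i < descriptors.size(); ++i) {
        // operator[] overwrites an existing entry, so a later duplicate
        // replaces the index stored for an earlier one.
        index[descriptors[i]] = i;
    }
    return index;
}

// Hypothetical helper: return the stored index for a descriptor that must exist.
std::size_t lookupIndex(const DescriptorIndex& index, const std::string& descriptor) {
    auto it = index.find(descriptor);
    assert(it != index.end() && "descriptor not found in mapping");
    return it->second;
}
```

With descriptors {"x", "y", "x"}, the mapping holds two entries and "x" resolves to index 2 rather than 0. If earlier duplicates matter, they remain reachable by index, since only the descriptor lookup collapses them.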