Skip to content

Sub Sequence Query Performance

Hüseyin Tuğrul BÜYÜKIŞIK edited this page Feb 27, 2021 · 1 revision

To test performance of sub-sequence query, a file with only 4 sequences is used (that is size of 4.9MB yersinia_pestis_genomic.fna).

For FX8150 CPU (3.6GHz), minimum query latency is about 1 microsecond on average. 1000-length sub-sequence query bandwidth with multi-threaded access is 300-400 MB/s.

init: 89544565 nanoseconds    
sub-sequence query (sequential-access latency benchmark, single thread): 5603564809 nanoseconds     
(throughput = 1160.19 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 5432723756 nanoseconds     
(throughput = 1167.39 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 81381852 nanoseconds     
(throughput = 1157.55 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 110422177 nanoseconds     
(throughput = 1147.72 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 10972431 nanoseconds     
(throughput = 1141.53 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 16096582 nanoseconds     
(bandwidth = 289.11 MB/s)      (throughput = 3459.40 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 166062 nanoseconds     
(bandwidth = 423.37 MB/s)      (throughput = 2372.31 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 346675 nanoseconds     
(bandwidth = 277.52 MB/s)      (throughput = 3611.20 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 62097 nanoseconds     
(bandwidth = 154.79 MB/s)      (throughput = 6899.67 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 15461473 nanoseconds     
(bandwidth = 300.99 MB/s)      (throughput = 3322.90 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 157730 nanoseconds     
(bandwidth = 445.73 MB/s)      (throughput = 2253.29 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 209482 nanoseconds     
(bandwidth = 459.28 MB/s)      (throughput = 2182.10 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 47639 nanoseconds     
(bandwidth = 201.77 MB/s)      (throughput = 5293.22 nanoseconds per iteration) 

Source:

#include "FastaGeneIndexer.h"
#include "CpuBenchmarker.h"
#include <random>
int main(int argC, char** argV)
{
    try
    {
        bool debug = true;
        FastaGeneIndexer cache("./data/yersinia_pestis_genomic.fna", debug);

        size_t n = cache.n();
        size_t byteCount = 0;
        {
        	CpuBenchmarker bench(0,"init");
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		byteCount += str.length();

        	}
        }

        {
        	CpuBenchmarker bench(0,"sub-sequence query (sequential-access latency benchmark, single thread)",byteCount);
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		const size_t sn = str.length();


        		for(int j=0;j<sn;j++)
        		{
        			auto str1 = cache.getSequence(i,j,1);
        		}
        	}
        }

        {
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		const size_t sn = str.length();
        		std::random_device rd;
        		std::mt19937 rng(rd());
        		std::uniform_real_distribution<float> rnd(0,sn-1);

        		CpuBenchmarker bench(0,"sub-sequence query (random-access latency benchmark, single thread)",sn);
        		for(int j=0;j<sn;j++)
        		{
        			int jj = rnd(rng)+0.1f;
        			std::string str1 = cache.getSequence(i,jj,1);
        		}
        	}
        }


        {
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		size_t len = str.length();

        		{
                	CpuBenchmarker bench(len,"bandwidth benchmark, sequential access, multi-threaded",(len/1000));
					#pragma omp parallel for
					for(int j=0;j<len;j+=1000)
					{
						int ln = ((j+1000 < len)?1000:(len-j));

						std::string str1 = cache.getSequence(i,j,ln);
					}
        		}
        	}
        }


        {

        	auto f = [](const int & min, const int & max) {
        	    static thread_local std::mt19937 generator;
        	    std::uniform_int_distribution<int> distribution(min,max);
        	    return distribution(generator);
        	};

        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		size_t len = str.length();

        		{
                	CpuBenchmarker bench(len,"bandwidth benchmark, random access, multi-threaded",(len/1000));
					#pragma omp parallel for
					for(int j=0;j<len;j+=1000)
					{
						int jj = f(0,len-1000);
						int ln = ((jj+1000 < len)?1000:(len-jj));

						std::string str1 = cache.getSequence(i,jj,ln);
					}
        		}
        	}
        }
    }
    catch(std::exception & e)
    {
        std::cout<< e.what() <<std::endl;
    }
    return 0;
}
Clone this wiki locally