Sub Sequence Query Performance

To test performance of sub-sequence query, a file with only 4 sequences is used (that is size of 4.9MB yersinia_pestis_genomic.fna).

For FX8150 CPU (3.6GHz), minimum query latency is about 1 microsecond on average. 1000-length sub-sequence query bandwidth with multi-threaded access is 300-400 MB/s.

init: 89544565 nanoseconds    
sub-sequence query (sequential-access latency benchmark, single thread): 5603564809 nanoseconds     
(throughput = 1160.19 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 5432723756 nanoseconds     
(throughput = 1167.39 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 81381852 nanoseconds     
(throughput = 1157.55 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 110422177 nanoseconds     
(throughput = 1147.72 nanoseconds per iteration) 

sub-sequence query (random-access latency benchmark, single thread): 10972431 nanoseconds     
(throughput = 1141.53 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 16096582 nanoseconds     
(bandwidth = 289.11 MB/s)      (throughput = 3459.40 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 166062 nanoseconds     
(bandwidth = 423.37 MB/s)      (throughput = 2372.31 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 346675 nanoseconds     
(bandwidth = 277.52 MB/s)      (throughput = 3611.20 nanoseconds per iteration) 

bandwidth benchmark, sequential access, multi-threaded: 62097 nanoseconds     
(bandwidth = 154.79 MB/s)      (throughput = 6899.67 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 15461473 nanoseconds     
(bandwidth = 300.99 MB/s)      (throughput = 3322.90 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 157730 nanoseconds     
(bandwidth = 445.73 MB/s)      (throughput = 2253.29 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 209482 nanoseconds     
(bandwidth = 459.28 MB/s)      (throughput = 2182.10 nanoseconds per iteration) 

bandwidth benchmark, random access, multi-threaded: 47639 nanoseconds     
(bandwidth = 201.77 MB/s)      (throughput = 5293.22 nanoseconds per iteration)

Source:

#include "FastaGeneIndexer.h"
#include "CpuBenchmarker.h"
#include <random>
int main(int argC, char** argV)
{
    try
    {
        bool debug = true;
        FastaGeneIndexer cache("./data/yersinia_pestis_genomic.fna", debug);

        size_t n = cache.n();
        size_t byteCount = 0;
        {
        	CpuBenchmarker bench(0,"init");
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		byteCount += str.length();

        	}
        }

        {
        	CpuBenchmarker bench(0,"sub-sequence query (sequential-access latency benchmark, single thread)",byteCount);
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		const size_t sn = str.length();


        		for(int j=0;j<sn;j++)
        		{
        			auto str1 = cache.getSequence(i,j,1);
        		}
        	}
        }

        {
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		const size_t sn = str.length();
        		std::random_device rd;
        		std::mt19937 rng(rd());
        		std::uniform_real_distribution<float> rnd(0,sn-1);

        		CpuBenchmarker bench(0,"sub-sequence query (random-access latency benchmark, single thread)",sn);
        		for(int j=0;j<sn;j++)
        		{
        			int jj = rnd(rng)+0.1f;
        			std::string str1 = cache.getSequence(i,jj,1);
        		}
        	}
        }


        {
        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		size_t len = str.length();

        		{
                	CpuBenchmarker bench(len,"bandwidth benchmark, sequential access, multi-threaded",(len/1000));
					#pragma omp parallel for
					for(int j=0;j<len;j+=1000)
					{
						int ln = ((j+1000 < len)?1000:(len-j));

						std::string str1 = cache.getSequence(i,j,ln);
					}
        		}
        	}
        }


        {

        	auto f = [](const int & min, const int & max) {
        	    static thread_local std::mt19937 generator;
        	    std::uniform_int_distribution<int> distribution(min,max);
        	    return distribution(generator);
        	};

        	for(int i=0;i<n;i++)
        	{
        		std::string str = cache.getSequence(i);
        		size_t len = str.length();

        		{
                	CpuBenchmarker bench(len,"bandwidth benchmark, random access, multi-threaded",(len/1000));
					#pragma omp parallel for
					for(int j=0;j<len;j+=1000)
					{
						int jj = f(0,len-1000);
						int ln = ((jj+1000 < len)?1000:(len-jj));

						std::string str1 = cache.getSequence(i,jj,ln);
					}
        		}
        	}
        }
    }
    catch(std::exception & e)
    {
        std::cout<< e.what() <<std::endl;
    }
    return 0;
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sub Sequence Query Performance

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally