-
Notifications
You must be signed in to change notification settings - Fork 0
Sub Sequence Query Performance
Hüseyin Tuğrul BÜYÜKIŞIK edited this page Feb 27, 2021
·
1 revision
To test performance of sub-sequence query, a file with only 4 sequences is used (that is size of 4.9MB yersinia_pestis_genomic.fna
).
For FX8150 CPU (3.6GHz), minimum query latency is about 1 microsecond on average. 1000-length sub-sequence query bandwidth with multi-threaded access is 300-400 MB/s.
init: 89544565 nanoseconds
sub-sequence query (sequential-access latency benchmark, single thread): 5603564809 nanoseconds
(throughput = 1160.19 nanoseconds per iteration)
sub-sequence query (random-access latency benchmark, single thread): 5432723756 nanoseconds
(throughput = 1167.39 nanoseconds per iteration)
sub-sequence query (random-access latency benchmark, single thread): 81381852 nanoseconds
(throughput = 1157.55 nanoseconds per iteration)
sub-sequence query (random-access latency benchmark, single thread): 110422177 nanoseconds
(throughput = 1147.72 nanoseconds per iteration)
sub-sequence query (random-access latency benchmark, single thread): 10972431 nanoseconds
(throughput = 1141.53 nanoseconds per iteration)
bandwidth benchmark, sequential access, multi-threaded: 16096582 nanoseconds
(bandwidth = 289.11 MB/s) (throughput = 3459.40 nanoseconds per iteration)
bandwidth benchmark, sequential access, multi-threaded: 166062 nanoseconds
(bandwidth = 423.37 MB/s) (throughput = 2372.31 nanoseconds per iteration)
bandwidth benchmark, sequential access, multi-threaded: 346675 nanoseconds
(bandwidth = 277.52 MB/s) (throughput = 3611.20 nanoseconds per iteration)
bandwidth benchmark, sequential access, multi-threaded: 62097 nanoseconds
(bandwidth = 154.79 MB/s) (throughput = 6899.67 nanoseconds per iteration)
bandwidth benchmark, random access, multi-threaded: 15461473 nanoseconds
(bandwidth = 300.99 MB/s) (throughput = 3322.90 nanoseconds per iteration)
bandwidth benchmark, random access, multi-threaded: 157730 nanoseconds
(bandwidth = 445.73 MB/s) (throughput = 2253.29 nanoseconds per iteration)
bandwidth benchmark, random access, multi-threaded: 209482 nanoseconds
(bandwidth = 459.28 MB/s) (throughput = 2182.10 nanoseconds per iteration)
bandwidth benchmark, random access, multi-threaded: 47639 nanoseconds
(bandwidth = 201.77 MB/s) (throughput = 5293.22 nanoseconds per iteration)
Source:
#include "FastaGeneIndexer.h"
#include "CpuBenchmarker.h"
#include <random>
int main(int argC, char** argV)
{
try
{
bool debug = true;
FastaGeneIndexer cache("./data/yersinia_pestis_genomic.fna", debug);
size_t n = cache.n();
size_t byteCount = 0;
{
CpuBenchmarker bench(0,"init");
for(int i=0;i<n;i++)
{
std::string str = cache.getSequence(i);
byteCount += str.length();
}
}
{
CpuBenchmarker bench(0,"sub-sequence query (sequential-access latency benchmark, single thread)",byteCount);
for(int i=0;i<n;i++)
{
std::string str = cache.getSequence(i);
const size_t sn = str.length();
for(int j=0;j<sn;j++)
{
auto str1 = cache.getSequence(i,j,1);
}
}
}
{
for(int i=0;i<n;i++)
{
std::string str = cache.getSequence(i);
const size_t sn = str.length();
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_real_distribution<float> rnd(0,sn-1);
CpuBenchmarker bench(0,"sub-sequence query (random-access latency benchmark, single thread)",sn);
for(int j=0;j<sn;j++)
{
int jj = rnd(rng)+0.1f;
std::string str1 = cache.getSequence(i,jj,1);
}
}
}
{
for(int i=0;i<n;i++)
{
std::string str = cache.getSequence(i);
size_t len = str.length();
{
CpuBenchmarker bench(len,"bandwidth benchmark, sequential access, multi-threaded",(len/1000));
#pragma omp parallel for
for(int j=0;j<len;j+=1000)
{
int ln = ((j+1000 < len)?1000:(len-j));
std::string str1 = cache.getSequence(i,j,ln);
}
}
}
}
{
auto f = [](const int & min, const int & max) {
static thread_local std::mt19937 generator;
std::uniform_int_distribution<int> distribution(min,max);
return distribution(generator);
};
for(int i=0;i<n;i++)
{
std::string str = cache.getSequence(i);
size_t len = str.length();
{
CpuBenchmarker bench(len,"bandwidth benchmark, random access, multi-threaded",(len/1000));
#pragma omp parallel for
for(int j=0;j<len;j+=1000)
{
int jj = f(0,len-1000);
int ln = ((jj+1000 < len)?1000:(len-jj));
std::string str1 = cache.getSequence(i,jj,ln);
}
}
}
}
}
catch(std::exception & e)
{
std::cout<< e.what() <<std::endl;
}
return 0;
}