@@ -1,11 +1,55 @@
 # x86-simd-sort
 
 C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
-sorting on x86 processors. Source header files are available in src directory.
-We currently only have AVX-512 based implementation of quicksort. This
-repository also includes a test suite which can be built and run to test the
-sorting algorithms for correctness. It also has benchmarking code to compare
-its performance relative to std::sort.
+sorting algorithms on x86 processors. Source header files are available in src
+directory. We currently only have AVX-512 based implementation of quicksort,
+argsort, quickselect, partialsort and key-value sort. This repository also
+includes a test suite which can be built and run to test the sorting algorithms
+for correctness. It also has benchmarking code to compare its performance
+relative to std::sort. The following APIs are currently supported:
+
+#### Quicksort
+
+```
+void avx512_qsort<T>(T* arr, int64_t arrsize)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+uint64_t, int64_t and double`
+
+#### Argsort
+
+```
+std::vector<int64_t> arg = avx512_argsort<T>(T* arr, int64_t arrsize)
+void avx512_argsort<T>(T* arr, int64_t *arg, int64_t arrsize)
+```
+Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
+The algorithm resorts to scalar `std::sort` if the array contains NAN.
+
+#### Quickselect
+
+```
+void avx512_qselect<T>(T* arr, int64_t arrsize)
+void avx512_qselect<T>(T* arr, int64_t arrsize, bool hasnan)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+uint64_t, int64_t and double`. Use an additional optional argument `bool
+hasnan` if you expect your arrays to contain nan.
+
+#### Partialsort
+
+```
+void avx512_partialsort<T>(T* arr, int64_t arrsize)
+void avx512_partialsort<T>(T* arr, int64_t arrsize, bool hasnan)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+uint64_t, int64_t and double`. Use an additional optional argument `bool
+hasnan` if you expect your arrays to contain nan.
+
+#### Key-value sort
+```
+void avx512_qsort_kv<T>(T* key, uint64_t* value, int64_t arrsize)
+```
+Supported datatypes: `uint64_t, int64_t and double`
 
 ## Algorithm details
 
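Note on the argsort entries added in the hunk above: the contract of an argsort can be illustrated with a scalar reference. The following is a minimal sketch under stated assumptions, not the library's AVX-512 implementation. `scalar_argsort` is a hypothetical helper name, and the sketch assumes the input contains no NaNs. It returns the indices that would put the array in ascending order and leaves the input untouched.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical scalar reference (not part of x86-simd-sort):
// returns idx such that arr[idx[0]] <= arr[idx[1]] <= ...
// Assumes arr holds no NaNs; arr itself is not modified.
template <typename T>
std::vector<int64_t> scalar_argsort(const T *arr, int64_t arrsize)
{
    std::vector<int64_t> idx(arrsize);
    // Start from the identity permutation 0, 1, ..., arrsize - 1.
    std::iota(idx.begin(), idx.end(), 0);
    // Sort the indices by the values they point at.
    std::sort(idx.begin(), idx.end(),
              [arr](int64_t a, int64_t b) { return arr[a] < arr[b]; });
    return idx;
}
```

A reference like this is mainly useful for checking a vectorized result in tests: when the values are distinct, both index arrays should be identical.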
@@ -20,13 +64,14 @@ network. The core implementations of the vectorized qsort functions
 `avx512_qsort<T>(T*, int64_t)` are modified versions of avx2 quicksort
 presented in the paper [2] and source code associated with that paper [3].
 
-## Handling NAN in float and double arrays
+## A note on NAN in float and double arrays
 
 If you expect your array to contain NANs, please be aware that these
-routines **do not preserve your NANs as you pass them**. The
-`avx512_qsort<T>()` routine will put all your NAN's at the end of the sorted
-array and replace them with `std::nan("1")`. Please take a look at
-`avx512_qsort<float>()` and `avx512_qsort<double>()` functions for details.
+routines **do not preserve your NANs as you pass them**. The quicksort,
+quickselect, partialsort and key-value sorting routines will sort NANs to the
+end of the array and replace them with `std::nan("1")`. `avx512_argsort`
+routines will also resort to a scalar argsort that uses `std::sort` to sort an
+array that contains NAN.
 
 ## Example to include and build this in a C++ code
 
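Note on the NAN paragraph in the hunk above: the documented effect (NANs moved to the end and replaced with `std::nan("1")`) can be mimicked in scalar code. This is a minimal sketch of that observable behaviour only, assuming a floating-point element type. `scalar_sort_nan_last` is a hypothetical name, and nothing here reflects how the AVX-512 routines achieve the same result internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar reference (not part of x86-simd-sort):
// sorts the non-NaN values in ascending order, places all NaNs at the
// back and overwrites them with std::nan("1").
template <typename T>
void scalar_sort_nan_last(T *arr, int64_t arrsize)
{
    // Move every non-NaN element to the front; nan_begin marks the NaN tail.
    T *nan_begin = std::partition(arr, arr + arrsize,
                                  [](T x) { return !std::isnan(x); });
    std::sort(arr, nan_begin);
    // The original NaN payloads and sign bits are discarded here.
    std::fill(nan_begin, arr + arrsize, static_cast<T>(std::nan("1")));
}
```

The practical consequence is that callers who rely on NaN payloads or sign bits need to record them before sorting.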
@@ -36,7 +81,7 @@ array and replace them with `std::nan("1")`. Please take a look at
 #include "src/avx512-32bit-qsort.hpp"
 
 int main() {
-    const int ARRSIZE = 10;
+    const int ARRSIZE = 1000;
     std::vector<float> arr;
 
     /* Initialize elements in reverse order */
@@ -45,7 +90,7 @@ int main() {
     }
 
     /* call avx512 quicksort */
-    avx512_qsort<float>(arr.data(), ARRSIZE);
+    avx512_qsort(arr.data(), ARRSIZE);
     return 0;
 }
 
@@ -54,7 +99,7 @@ int main() {
 ### Build using gcc
 
 ```
-gcc main.cpp -mavx512f -mavx512dq -O3
+g++ main.cpp -mavx512f -mavx512dq -O3
 ```
 
 This is a header file only library and we do not provide any compile time and
@@ -75,33 +120,40 @@ compiler to build.
 gcc >= 8.x
 ```
 
+### Build using Meson
+
+Meson is the recommended build system to build the test and benchmark suite.
+
+```
+meson setup builddir && cd builddir && ninja
+```
+
+It builds two executables:
+
+- `testexe`: runs a bunch of tests written in ./tests directory.
+- `benchexe`: measures performance of these algorithms for various data types.
+
+
 ### Build using Make
 
-`make` command builds two executables:
+Makefile uses `-march=sapphirerapids` as a global compile flag and hence it
+will require g++-12. `make` command builds two executables:
 - `testexe`: runs a bunch of tests written in ./tests directory.
 - `benchexe`: measures performance of these algorithms for various data types
   and compares them to std::sort.
 
 You can use `make test` and `make bench` to build just the `testexe` and
 `benchexe` respectively.
 
-### Build using Meson
-
-You can also build `testexe` and `benchexe` using Meson/Ninja with the following
-command:
-
-```
-meson setup builddir && cd builddir && ninja
-```
-
 ## Requirements and dependencies
 
 The sorting routines rely only on the C++ Standard Library and require a
 relatively modern compiler to build (gcc 8.x and above). Since they use the
 AVX-512 instruction set, they can only run on processors that have AVX-512.
 Specifically, the 32-bit and 64-bit require AVX-512F and AVX-512DQ instruction
 set. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VBMI2
-instruction set. The test suite is written using the Google test framework.
+instruction set. The test suite is written using the Google test framework. The
+benchmark is written using the Google Benchmark framework.
 
 ## References
 