  # x86-simd-sort

  C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
- sorting on x86 processors. Source header files are available in src directory.
- We currently only have AVX-512 based implementation of quicksort. This
- repository also includes a test suite which can be built and run to test the
- sorting algorithms for correctness. It also has benchmarking code to compare
- its performance relative to std::sort.
+ sorting algorithms on x86 processors. Source header files are available in the
+ src directory. We currently only have AVX-512 based implementations of quicksort,
+ argsort, quickselect, partialsort and key-value sort. This repository also
+ includes a test suite which can be built and run to test the sorting algorithms
+ for correctness. It also has benchmarking code to compare its performance
+ relative to std::sort. The following APIs are currently supported:
+
+ ### Quicksort
+
+ ```
+ avx512_qsort<T>(T* arr, int64_t arrsize)
+ ```
+ Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+ uint64_t, int64_t and double`
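
To illustrate how a routine with the `(T* arr, int64_t arrsize)` shape can be checked for correctness, here is a minimal sketch of a comparison against `std::sort`. The helper name `matches_std_sort` is hypothetical and not part of this library, and the sketch assumes the input contains no NAN values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `sort_fn(ptr, n)` leaves the data in
// the same order that std::sort would. `sort_fn` is any callable taking
// (T*, int64_t), for example a lambda that forwards to avx512_qsort<T>.
// The library itself is not called here, so the sketch builds without
// AVX-512 hardware. Assumes the data contains no NAN.
template <typename T, typename SortFn>
bool matches_std_sort(std::vector<T> data, SortFn sort_fn)
{
    std::vector<T> expected = data; // reference copy sorted with std::sort
    std::sort(expected.begin(), expected.end());
    sort_fn(data.data(), static_cast<int64_t>(data.size()));
    return data == expected;
}
```

A caller would pass its own data and a lambda such as `[](float* p, int64_t n) { avx512_qsort<float>(p, n); }` once the library header is included.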
+
+ ### Argsort
+
+ ```
+ std::vector<int64_t> arg = avx512_argsort(T* arr, int64_t arrsize)
+ void avx512_argsort(T* arr, int64_t *arg, int64_t arrsize)
+ ```
+ Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
+ The algorithm resorts to scalar std::sort if the array contains NAN.
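
As a rough illustration of what an argsort result means, the following scalar sketch builds the index vector with `std::sort`. It is a reference for the semantics only, not the library's vectorized implementation; the name `scalar_argsort` is an assumption, and the comparator assumes the array holds no NAN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical scalar reference: returns indices `arg` such that
// arr[arg[0]] <= arr[arg[1]] <= ... This sketch does not modify `arr`.
// Assumes no NAN in the input (the comparator would not be a valid ordering).
template <typename T>
std::vector<int64_t> scalar_argsort(const T* arr, int64_t arrsize)
{
    assert(arrsize >= 0);
    std::vector<int64_t> arg(static_cast<std::size_t>(arrsize));
    std::iota(arg.begin(), arg.end(), int64_t{0}); // 0, 1, 2, ...
    std::sort(arg.begin(), arg.end(),
              [arr](int64_t a, int64_t b) { return arr[a] < arr[b]; });
    return arg;
}
```

For the input `{30, 10, 20}` the sketch yields the indices `{1, 2, 0}`.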
+
+ ### Quickselect
+
+ ```
+ avx512_qselect<T>(T* arr, int64_t arrsize)
+ avx512_qselect<T>(T* arr, int64_t arrsize, bool hasnan)
+ ```
+ Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+ uint64_t, int64_t and double`. Pass the additional optional argument `bool
+ hasnan` if you expect your arrays to contain NAN.
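
For readers unfamiliar with quickselect, the standard library analog is `std::nth_element`. The sketch below uses it to show the general idea of selecting the k-th smallest element without fully sorting. The helper `kth_smallest` and its `k` parameter belong to this sketch only and are not taken from the signatures above; NAN-free input is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper built on std::nth_element: returns the element that
// would sit at index k if `data` were fully sorted. Works on a copy, so the
// caller's vector is untouched. Assumes no NAN in the input.
template <typename T>
T kth_smallest(std::vector<T> data, int64_t k)
{
    assert(k >= 0 && k < static_cast<int64_t>(data.size()));
    // After this call data[k] is in its sorted position, everything before
    // it is <= data[k], and everything after it is >= data[k].
    std::nth_element(data.begin(), data.begin() + k, data.end());
    return data[static_cast<std::size_t>(k)];
}
```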
+
+ ### Partialsort
+
+ ```
+ avx512_partialsort<T>(T* arr, int64_t arrsize)
+ avx512_partialsort<T>(T* arr, int64_t arrsize, bool hasnan)
+ ```
+ Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+ uint64_t, int64_t and double`. Pass the additional optional argument `bool
+ hasnan` if you expect your arrays to contain NAN.
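
The standard library counterpart of a partial sort is `std::partial_sort`. The scalar sketch below shows the general idea: only the first `k` positions end up sorted. The name `scalar_partialsort` and the `k` parameter are assumptions of this sketch rather than this library's API, and NAN-free input is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference: after the call, arr[0..k) holds the k
// smallest elements in ascending order; the order of the remaining elements
// is unspecified. Assumes no NAN in the input.
template <typename T>
void scalar_partialsort(T* arr, int64_t k, int64_t arrsize)
{
    assert(k >= 0 && k <= arrsize);
    std::partial_sort(arr, arr + k, arr + arrsize);
}
```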
+
+ ### Key-value sort
+ ```
+ avx512_qsort_kv<T>(T* key, uint64_t* value, int64_t arrsize)
+ ```
+ Supported datatypes: `uint64_t, int64_t and double`
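
A key-value sort is commonly understood as sorting the keys while moving each value along with its key. The scalar sketch below illustrates that reading with `std::sort` on pairs; it is an assumption about the semantics, not a description of the vectorized implementation. The name `scalar_sort_kv` is hypothetical and NAN-free keys are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical scalar reference: sorts `keys` ascending and applies the same
// permutation to `values`, so every value stays attached to its key.
// Assumes no NAN among the keys.
template <typename K>
void scalar_sort_kv(K* keys, uint64_t* values, int64_t arrsize)
{
    assert(arrsize >= 0);
    std::vector<std::pair<K, uint64_t>> pairs;
    pairs.reserve(static_cast<std::size_t>(arrsize));
    for (int64_t i = 0; i < arrsize; ++i) {
        pairs.emplace_back(keys[i], values[i]);
    }
    // Compare on the key only; the value just rides along.
    std::sort(pairs.begin(), pairs.end(),
              [](const std::pair<K, uint64_t>& a,
                 const std::pair<K, uint64_t>& b) { return a.first < b.first; });
    for (int64_t i = 0; i < arrsize; ++i) {
        keys[i] = pairs[static_cast<std::size_t>(i)].first;
        values[i] = pairs[static_cast<std::size_t>(i)].second;
    }
}
```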

  ## Algorithm details

@@ -20,13 +64,14 @@ network. The core implementations of the vectorized qsort functions
  `avx512_qsort<T>(T*, int64_t)` are modified versions of avx2 quicksort
  presented in the paper [2] and source code associated with that paper [3].

- ## Handling NAN in float and double arrays
+ ## A note on NAN in float and double arrays

  If you expect your array to contain NANs, please be aware that these
- routines **do not preserve your NANs as you pass them**. The
- `avx512_qsort<T>()` routine will put all your NAN's at the end of the sorted
- array and replace them with `std::nan("1")`. Please take a look at
- `avx512_qsort<float>()` and `avx512_qsort<double>()` functions for details.
+ routines **do not preserve your NANs as you pass them**. The quicksort,
+ quickselect, partialsort and key-value sorting routines will sort NANs to the
+ end of the array and replace them with `std::nan("1")`. The `avx512_argsort`
+ routines fall back to a scalar argsort that uses std::sort when the array
+ contains NAN.
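
The behaviour described above (NANs moved to the end and overwritten with `std::nan("1")`) can be sketched in scalar code as follows. This is only an illustration of the observable result under that description; the name `scalar_sort_nan_last` is hypothetical and the vectorized routines may reach the same result differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch for float/double arrays: sorts the non-NAN
// values ascending, places every NAN at the end, and overwrites those NANs
// with std::nan("1"), so the original NAN payloads are not preserved.
template <typename T>
void scalar_sort_nan_last(T* arr, int64_t arrsize)
{
    assert(arrsize >= 0);
    // Move all non-NAN values to the front; NANs end up in [first_nan, end).
    T* first_nan = std::partition(arr, arr + arrsize,
                                  [](T x) { return !std::isnan(x); });
    std::sort(arr, first_nan);
    std::fill(first_nan, arr + arrsize, static_cast<T>(std::nan("1")));
}
```

If the exact NAN payloads matter to your application, record their positions before sorting.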

  ## Example to include and build this in a C++ code

@@ -45,7 +90,7 @@ int main() {
      }

      /* call avx512 quicksort */
-     avx512_qsort<float>(arr.data(), ARRSIZE);
+     avx512_qsort(arr.data(), ARRSIZE);
      return 0;
  }

@@ -54,7 +99,7 @@ int main() {
  ### Build using gcc

  ```
- gcc main.cpp -mavx512f -mavx512dq -O3
+ g++ main.cpp -mavx512f -mavx512dq -O3
  ```

  This is a header file only library and we do not provide any compile time and
@@ -75,33 +120,40 @@ compiler to build.
  gcc >= 8.x
  ```

+ ### Build using Meson
+
+ Meson is the recommended build system for building the test and benchmark suite.
+
+ ```
+ meson setup builddir && cd builddir && ninja
+ ```
+
+ It builds two executables:
+
+ - `testexe`: runs a bunch of tests written in ./tests directory.
+ - `benchexe`: measures performance of these algorithms for various data types.
+
+
  ### Build using Make

- `make` command builds two executables:
+ The Makefile uses `-march=sapphirerapids` as a global compile flag and hence it
+ will require g++-12. The `make` command builds two executables:
  - `testexe`: runs a bunch of tests written in ./tests directory.
  - `benchexe`: measures performance of these algorithms for various data types
    and compares them to std::sort.

  You can use `make test` and `make bench` to build just the `testexe` and
  `benchexe` respectively.

- ### Build using Meson
-
- You can also build `testexe` and `benchexe` using Meson/Ninja with the following
- command:
-
- ```
- meson setup builddir && cd builddir && ninja
- ```
-
  ## Requirements and dependencies

  The sorting routines rely only on the C++ Standard Library and require a
  relatively modern compiler to build (gcc 8.x and above). Since they use the
  AVX-512 instruction set, they can only run on processors that have AVX-512.
  Specifically, the 32-bit and 64-bit routines require the AVX-512F and AVX-512DQ
  instruction sets. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VBMI2
- instruction set. The test suite is written using the Google test framework.
+ instruction sets. The test suite is written using the Google Test framework. The
+ benchmark is written using the Google Benchmark framework.
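
Because the routines only run on processors with the listed AVX-512 extensions, a caller may want to check for them at run time before dispatching. The sketch below does this with the GCC/Clang builtin `__builtin_cpu_supports`; the function names are hypothetical, and on other compilers or non-x86 targets the sketch simply reports `false`.

```cpp
#include <cassert>

// Hypothetical helpers, assuming GCC or Clang on an x86 target.
// 32-bit and 64-bit sorting: AVX-512F and AVX-512DQ.
inline bool cpu_supports_32_64bit_sort()
{
#if (defined(__GNUC__) || defined(__clang__)) \
        && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); // safe to call more than once
    return __builtin_cpu_supports("avx512f")
            && __builtin_cpu_supports("avx512dq");
#else
    return false; // unknown compiler or non-x86 target
#endif
}

// 16-bit sorting: AVX-512F, AVX-512BW and AVX-512 VBMI2.
inline bool cpu_supports_16bit_sort()
{
#if (defined(__GNUC__) || defined(__clang__)) \
        && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
            && __builtin_cpu_supports("avx512bw")
            && __builtin_cpu_supports("avx512vbmi2");
#else
    return false;
#endif
}
```

A program could call these once at startup and fall back to `std::sort` when they return `false`.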

  ## References
