@@ -109,6 +109,42 @@ specific JIRA issues and reference them in these code comments. For example:
// This is not sound because .... see https://issues.apache.org/jira/browse/ARROW-nnnnn
```

+ ### Usage of SIMD / auto vectorization
+
+ This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
+ instead relies on the Rust compiler's auto-vectorization capabilities, which are
+ built on LLVM.
+
+ SIMD intrinsics are difficult to maintain and to reason about.
+ The auto-vectorizer in LLVM is quite good and often produces kernels that are
+ faster than hand-written SIMD intrinsics. This crate used to contain
+ several kernels with hand-written SIMD instructions, which were removed after
+ discovering the auto-vectorized code was faster.
+
+ [`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+ #### Tips for auto vectorization
+
+ LLVM is relatively good at vectorizing vertical operations provided the following hold:
+
+ 1. No conditionals within the loop body (e.g. no checking for nulls on each row)
+ 2. Not too much inlining (judicious use of `#[inline]` and `#[inline(never)]`), as the vectorizer gives up if the code is too complex
+ 3. No [horizontal reductions] or data dependencies
+ 4. Suitable SIMD instructions available in the target ISA (e.g. selected via the `target-cpu` flag in `RUSTFLAGS`)
+
+ [horizontal reductions]: https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
+
+ The last point is especially important as the default `target-cpu` doesn't
+ support many SIMD instructions. See the Performance Tips section at the
+ end of <https://crates.io/crates/arrow>.
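
As a rough illustration of the first tip in the list above, the sketch below contrasts a loop that checks validity on every row with one that computes every slot unconditionally and leaves null handling to a separate step. It uses plain slices and a `&[bool]` validity mask as a simplified stand-in for Arrow arrays; the function names `add_with_null_checks` and `add_unconditional` are hypothetical and not part of this crate.

```rust
/// Per-row null check (hypothetical helper): the branch inside the loop body
/// may prevent the auto-vectorizer from producing SIMD code.
pub fn add_with_null_checks(a: &[i32], b: &[i32], valid: &[bool]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b)
        .zip(valid)
        .map(|((x, y), is_valid)| {
            if *is_valid {
                Some(x.wrapping_add(*y))
            } else {
                None
            }
        })
        .collect()
}

/// Branch-free variant (hypothetical helper): every slot is computed, including
/// slots that are logically null. The caller keeps the validity mask separately
/// and ignores the values stored in null slots.
pub fn add_unconditional(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}
```

The second form does a little redundant work for null slots, but its loop body is a straight-line vertical operation, which is the shape the tips above describe as easy to vectorize. Whether either version actually vectorizes depends on the compiler version and flags, so the generated code should still be inspected.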
+
+ To ensure your code is fully vectorized, we recommend using tools like
+ <https://rust.godbolt.org/> (again being sure `RUSTFLAGS` is set appropriately)
+ to analyze the resulting code, and reaching for manual SIMD only once you've
+ exhausted auto vectorization. Generally the hard part of vectorizing code
+ is structuring the algorithm in such a way that it can be vectorized, regardless
+ of what generates those instructions.
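
One way to picture "structuring the algorithm" is a floating-point sum. The sketch below is an assumption-laden example, not code from this crate: `sum_sequential`, `sum_lanes` and the `LANES` constant are hypothetical names, and the lane count of 8 is an arbitrary choice. A plain sum is a horizontal reduction in which each addition depends on the previous one, and because floating-point addition is not associative the compiler generally cannot reorder it. Splitting the work across independent accumulators turns most of it into vertical operations.

```rust
/// Number of independent accumulators (hypothetical value; tune per target).
pub const LANES: usize = 8;

/// Straightforward sum: a single dependency chain through every element.
pub fn sum_sequential(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Sum restructured around `LANES` independent accumulators. The inner loop
/// adds element `i` of each chunk to accumulator `i`, which is a vertical
/// operation; only the final combination of the accumulators is horizontal.
pub fn sum_lanes(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled after the main loop.
    let remainder = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }
    let mut total: f32 = acc.iter().sum();
    for v in remainder {
        total += *v;
    }
    total
}
```

Note that the two functions add the elements in a different order, so for general inputs their results can differ in the last bits. Pasting both into a tool such as <https://rust.godbolt.org/> with optimizations enabled is a reasonable way to check what the compiler actually generates for each.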
+
# Releases and publishing to crates.io

Please see the [release README](../dev/release/README.md) for details on how to create arrow releases.