
Commit 9485897

alamb, etseidl, findepi authored
Minor: Document SIMD rationale and tips (#6554)
* Minor: Document SIMD rationale and tips
* Apply suggestions from code review

  Co-authored-by: Ed Seidl <[email protected]>
  Co-authored-by: Piotr Findeisen <[email protected]>
* More review feedback
* tweak
* Update arrow/CONTRIBUTING.md
* Update arrow/CONTRIBUTING.md
* clarify inlining more
* formating

---------

Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
1 parent 9d06019 commit 9485897

1 file changed: 36 additions, 0 deletions

arrow/CONTRIBUTING.md

@@ -109,6 +109,42 @@ specific JIRA issues and reference them in these code comments. For example:

```
// This is not sound because .... see https://issues.apache.org/jira/browse/ARROW-nnnnn
```

### Usage of SIMD / auto-vectorization

This crate does not use explicit SIMD (e.g. [`std::simd`] or the intrinsics in
`std::arch`). Instead, it relies on the auto-vectorization performed by the Rust
compiler, which is built on LLVM.
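
For illustration, here is a minimal sketch of the style this approach favors: a plain loop over slices with no intrinsics, which LLVM can typically vectorize in optimized builds. It is not code from this crate, and `add_i32` is a hypothetical name.

```rust
/// Element-wise addition of two `i32` slices into `out` (hypothetical kernel).
///
/// No SIMD intrinsics are used: this is an ordinary scalar loop that LLVM
/// can auto-vectorize when compiled with optimizations (e.g. `--release`).
pub fn add_i32(a: &[i32], b: &[i32], out: &mut [i32]) {
    // `zip` would silently stop at the shortest slice, so check lengths explicitly.
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());
    // Iterating with `zip` avoids per-element bounds checks in the loop body.
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        // `wrapping_add` avoids the overflow check (and its panic branch)
        // that plain `+` has in debug builds.
        *o = x.wrapping_add(y);
    }
}
```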

Hand-written SIMD code is hard to maintain and hard to reason about. LLVM's
auto-vectorizer is quite good and often produces kernels that are faster than
hand-written SIMD intrinsics. This crate used to contain several kernels with
hand-written SIMD instructions; they were removed after the auto-vectorized
versions turned out to be faster.

[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html

#### Tips for auto-vectorization

LLVM is relatively good at vectorizing vertical (element-wise) operations, provided that:

1. There are no conditionals within the loop body (e.g. no checking for nulls on each row).
2. There is not too much inlining (use `#[inline]` and `#[inline(never)]` judiciously), as the vectorizer gives up if the code is too complex.
3. There are no [horizontal reductions] or data dependencies between loop iterations.
4. Suitable SIMD instructions are available in the target ISA (e.g. enabled via `-C target-cpu` in `RUSTFLAGS`).
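
As a sketch of tip 1, assume a simplified, hypothetical layout where values are plain slices and validity is packed into `u64` words (Arrow's real arrays use their own buffer types). Nulls can then be handled without any per-row check: compute a value for every slot, and combine the validity bitmaps a word at a time. `add_nullable` is a hypothetical name.

```rust
/// Adds two nullable `i32` columns without branching on nulls per row.
///
/// Hypothetical layout for illustration: bit `i` of a validity bitmap set to 1
/// means row `i` is valid.
pub fn add_nullable(
    a: &[i32],
    a_valid: &[u64],
    b: &[i32],
    b_valid: &[u64],
) -> (Vec<i32>, Vec<u64>) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a_valid.len(), b_valid.len());
    // Values: compute every slot, including null ones. The result in a null
    // slot is never read, and the loop body has no conditionals.
    // `wrapping_add` keeps garbage in null slots from panicking on overflow.
    let values: Vec<i32> = a.iter().zip(b).map(|(&x, &y)| x.wrapping_add(y)).collect();
    // Validity: a row is valid only if it is valid in both inputs, computed as
    // a bitwise AND over 64 rows at a time instead of a per-row check.
    let validity: Vec<u64> = a_valid.iter().zip(b_valid).map(|(&x, &y)| x & y).collect();
    (values, validity)
}
```

The extra arithmetic on null slots is usually cheaper than the branch it replaces, and it keeps the values loop in a shape the vectorizer can handle.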

[horizontal reductions]: https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
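
Tip 3 matters most for floating-point sums: because reordering floating-point additions can change the result, LLVM typically will not vectorize a plain sequential `f32` sum on its own. A common workaround, sketched below, is to keep several independent accumulators so the hot loop is vertical, and to perform the horizontal reduction only once at the end. `sum_f32` is hypothetical, and 8 lanes is an illustrative assumption rather than a tuned value; whether vector instructions are actually emitted should be checked in the generated assembly.

```rust
/// Number of independent accumulators (assumption: 8 `f32` lanes fill a
/// 256-bit register; not a tuned value).
const LANES: usize = 8;

/// Sums `values` using `LANES` independent partial sums (hypothetical kernel).
///
/// Lane `i` only ever adds into accumulator `i`, so the inner loop is a
/// vertical operation. The result may differ slightly from a sequential sum
/// because the floating-point additions are reordered.
pub fn sum_f32(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled after the loop.
    let remainder = chunks.remainder();
    for chunk in chunks {
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    // Horizontal step, done once: combine the partial sums, then add the tail.
    acc.iter().sum::<f32>() + remainder.iter().sum::<f32>()
}
```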

The last point is especially important because the default `target-cpu` does not
enable many SIMD instructions. See the Performance Tips section at the end of
<https://crates.io/crates/arrow>.
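
One way to confirm which SIMD features a build was compiled with is the `cfg!(target_feature = ...)` macro, which is evaluated at compile time. In the sketch below, `enabled_simd_features` is a hypothetical helper and the feature list is only a sample. A default x86-64 build typically reports only the baseline `sse2`, while a build with `RUSTFLAGS="-C target-cpu=native"` on a recent x86-64 machine will usually also report `avx2`.

```rust
/// Returns a sample of SIMD-related target features enabled at compile time.
///
/// `cfg!` is resolved when the crate is compiled, so the result reflects
/// `RUSTFLAGS` (e.g. `-C target-cpu=native`), not the CPU the binary runs on.
/// Typical invocation: RUSTFLAGS="-C target-cpu=native" cargo build --release
pub fn enabled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // x86 / x86_64 features
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "sse4.2") {
        features.push("sse4.2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    // AArch64 feature
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```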

To confirm that your code is fully vectorized, we recommend inspecting the
generated assembly with tools like <https://rust.godbolt.org/> (again, using the
same target flags you would pass via `RUSTFLAGS`). Only once you have exhausted
auto-vectorization should you consider manual SIMD. Generally, the hard part of
vectorizing code is structuring the algorithm so that it can be vectorized,
regardless of what generates the instructions.
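
As an example of restructuring an algorithm, the two hypothetical kernels below (not from this crate) count values above a threshold. The second turns the comparison into a `0`/`1` integer so the loop body has no control flow. `#[inline(never)]` keeps each as a separate symbol, which makes them easy to find and compare in godbolt's output when compiled with options such as `-C opt-level=3` plus the `-C target-cpu` you intend to use. LLVM can sometimes if-convert the first version by itself; the assembly is what tells you.

```rust
/// Branchy version: the `if` in the loop body can get in the way of vectorization.
#[inline(never)]
pub fn count_greater_branchy(values: &[i32], threshold: i32) -> usize {
    let mut count = 0;
    for &v in values {
        if v > threshold {
            count += 1;
        }
    }
    count
}

/// Restructured version: each comparison becomes 0 or 1 and is added
/// unconditionally, so the loop body has no control flow.
#[inline(never)]
pub fn count_greater_branchless(values: &[i32], threshold: i32) -> usize {
    values.iter().map(|&v| (v > threshold) as usize).sum()
}
```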

# Releases and publishing to crates.io

Please see the [release](../dev/release/README.md) for details on how to create arrow releases.
