|
| 1 | +--- |
| 2 | +layout: documentation-page |
| 3 | +collectionName: Miscellaneous extensions |
| 4 | +title: SIMD |
| 5 | +--- |
| 6 | + |
| 7 | +# SIMD |
| 8 | + |
| 9 | +The OxCaml compiler provides built-in 128-bit SIMD vector types, as well as |
| 10 | +intrinsics for amd64 SIMD instructions up to and including SSE4.2. |
| 11 | + |
| 12 | +<!-- CR mslater: link to simd libraries --> |
| 13 | +To get started with SIMD, add the `ocaml_simd_sse` library to your dependencies. |
| 14 | +You may also want to use `ppx_simd`, which provides convenient syntax for |
| 15 | +defining constants like blend and shuffle masks. |
| 16 | + |
| 17 | +## Types |
| 18 | + |
| 19 | +When SIMD is enabled, the following 128-bit SIMD vector types are available: |
| 20 | + |
| 21 | +``` |
| 22 | +int8x16 |
| 23 | +int8x16# |
| 24 | +int16x8 |
| 25 | +int16x8# |
| 26 | +int32x4 |
| 27 | +int32x4# |
| 28 | +int64x2 |
| 29 | +int64x2# |
| 30 | +float32x4 |
| 31 | +float32x4# |
| 32 | +float64x2 |
| 33 | +float64x2# |
| 34 | +``` |
| 35 | + |
| 36 | +The types ending with `#` are unboxed: they are passed between functions in XMM |
| 37 | +registers, stored in structures as flat data, and may be stored in flat arrays. |
| 38 | +The operations provided by `Ocaml_simd_sse` operate on unboxed vectors. For |
| 39 | +more detail on unboxed types, see the [docs](../unboxed-types/intro). |
| 40 | + |
| 41 | +The types without `#` are boxed: when passed to a non-inlined function, they |
| 42 | +are copied to a heap-allocated (abstract) block. Boxed vectors are not necessarily |
| 43 | +16-byte aligned, so accessing them generates unaligned load/store instructions. |
| 44 | + |
| 45 | +Within a function, all SIMD vectors live in floating-point registers or 16-byte |
| 46 | +aligned stack slots. |
| 47 | + |
| 48 | +## Intrinsics |
| 49 | + |
| 50 | +SIMD vectors are opaque: no operations on them are built into the |
| 51 | +language. Instead, the compiler translates certain "builtin" externals directly |
| 52 | +to SIMD instructions. Your code should use the `ocaml_simd_sse` library, which |
| 53 | +exposes an OxCaml API for these intrinsics. |
| 54 | + |
| 55 | +```ocaml |
| 56 | +module Float32x4 = Ocaml_simd_sse.Float32x4 |
| 57 | + |
| 58 | +let v = Float32x4.set 1.0 2.0 3.0 4.0 |
| 59 | +let v = Float32x4.sqrt v |
| 60 | +let x, y, z, w = Float32x4.splat v |
| 61 | +``` |
| 62 | + |
| 63 | +SIMD vectors may be loaded from / stored to strings, bytes, bigstrings, and |
| 64 | +arrays of the corresponding unboxed type. Load and store operations are also |
| 65 | +provided by `ocaml_simd_sse`, rather than Base or Core. |
| 66 | + |
| 67 | +```ocaml |
| 68 | +module Int8x16 = Ocaml_simd_sse.Int8x16 |
| 69 | + |
| 70 | +let text = "abcdefghijklmnopqrstuvwxyz" |
| 71 | +let floats = [| 1.0; 2.0 |] |
| 72 | +let ints = [| 1; 2 |] |
| 73 | + |
| 74 | +let _ = Int8x16.String.get text ~byte:0 |
| 75 | +let _ = Ocaml_simd_sse.Float64x2.Float_array.get floats ~idx:0 (* Float array optimization required *) |
| 76 | +let _ = Ocaml_simd_sse.Int64x2.Immediate_array.get_tagged ints ~idx:0 |
| 77 | +``` |
| 78 | + |
| 79 | +Some operations require the user to choose a specific behavior at compile |
| 80 | +time. To do so, you must provide a compile-time constant generated by |
| 81 | +`ppx_simd`. Refer to the `ppx_simd` documentation for more details. |
| 82 | + |
| 83 | +```ocaml |
| 84 | +module Int32x4 = Ocaml_simd_sse.Int32x4 |
| 85 | + |
| 86 | +let x = Int32x4.set 0 2 4 6 |
| 87 | +let y = Int32x4.set 1 3 5 7 |
| 88 | +let z = Int32x4.blend [%blend 0, 1, 0, 1] x y |
| 89 | +``` |
| 90 | + |
| 91 | +## C ABI |
| 92 | + |
| 93 | +Like floats, both boxed and unboxed SIMD vectors may be passed to C stubs. The |
| 94 | +OxCaml runtime provides helpers for working with SIMD vectors, such as the `Vec128_vali` and `caml_copy_vec128i` used in the stub below. |
| 95 | + |
| 96 | +```ocaml |
| 97 | +external simd_stub : (int8x16[@unboxed]) -> (int8x16[@unboxed]) = |
| 98 | + "unboxed_integer_simd_stub" "boxed_integer_simd_stub" |
| 99 | + |
| 100 | +(* ... *) |
| 101 | +``` |
| 102 | +```c |
| 103 | +#include <caml/simd.h> |
| 104 | + |
| 105 | +__m128i unboxed_integer_simd_stub(__m128i v) { |
| 106 | + return v; |
| 107 | +} |
| 108 | + |
| 109 | +value boxed_integer_simd_stub(value v) { |
| 110 | + return caml_copy_vec128i(unboxed_integer_simd_stub(Vec128_vali(v))); |
| 111 | +} |
| 112 | +``` |
| 113 | + |
| 114 | +## Future Work |
| 115 | + |
| 116 | +Support for wider vectors and NEON/AVX2/AVX512 intrinsics is coming soon. |
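
As a companion to the C ABI section in the diff above, here is a minimal sketch of an unboxed stub that does some work instead of returning its argument. The names `unboxed_increment_stub` and `increment_bytes` are hypothetical, and the code deliberately uses only the standard SSE2 header so that it compiles without the OxCaml runtime headers. A boxed counterpart would presumably wrap it with `Vec128_vali` and `caml_copy_vec128i`, following the same pattern as `boxed_integer_simd_stub`.

```c
#include <emmintrin.h> /* SSE2 intrinsics, part of the amd64 baseline */
#include <stdint.h>

/* Hypothetical unboxed stub: adds 1 to each of the 16 signed 8-bit lanes,
   wrapping on overflow. As in the diff above, an unboxed int8x16 argument
   is received as an __m128i. */
__m128i unboxed_increment_stub(__m128i v) {
  return _mm_add_epi8(v, _mm_set1_epi8(1));
}

/* Helper for exercising the stub from plain C (not part of the OCaml
   interface). The buffers are not assumed to be 16-byte aligned, so the
   unaligned load and store intrinsics are used. */
void increment_bytes(const int8_t in[16], int8_t out[16]) {
  __m128i v = _mm_loadu_si128((const __m128i *)in);
  _mm_storeu_si128((__m128i *)out, unboxed_increment_stub(v));
}
```

Calling `increment_bytes` on the bytes 0 through 15 yields 1 through 16; a lane holding 127 wraps to -128 because `_mm_add_epi8` does not saturate.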