Feature Tracking #1
How do you plan to support NEON? Would you be willing to help out stabilize …
@dragostis I'm planning to use the arm and aarch64 NEON intrinsics behind a feature flag until they are stabilized. As for …, Thermite, on the other hand, uses its …
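As a rough illustration of gating NEON code by target while keeping a portable fallback, here is a minimal Rust sketch. It is not Thermite's code: the function name add_f32 is hypothetical, it selects the path with #[cfg(target_arch = "aarch64")] rather than a Cargo feature flag, and it assumes a toolchain where the aarch64 NEON intrinsics in std::arch::aarch64 (vld1q_f32, vaddq_f32, vst1q_f32) are usable on stable Rust, as they are on current releases. The 32-bit arm equivalents may still need a nightly feature.

```rust
// Hypothetical sketch: add two f32 slices element-wise.
// On aarch64 the NEON path processes four lanes at a time;
// every other target falls back to a plain scalar loop.

#[cfg(target_arch = "aarch64")]
fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::aarch64::{vaddq_f32, vld1q_f32, vst1q_f32};
    assert!(a.len() == b.len() && a.len() == out.len());
    let chunks = a.len() / 4;
    for i in 0..chunks {
        let off = i * 4;
        // SAFETY: NEON is enabled by default on aarch64 targets, and each
        // load/store touches four f32 values inside the checked bounds.
        unsafe {
            let va = vld1q_f32(a.as_ptr().add(off));
            let vb = vld1q_f32(b.as_ptr().add(off));
            vst1q_f32(out.as_mut_ptr().add(off), vaddq_f32(va, vb));
        }
    }
    // Scalar tail for lengths that are not a multiple of four.
    for i in chunks * 4..a.len() {
        out[i] = a[i] + b[i];
    }
}

#[cfg(not(target_arch = "aarch64"))]
fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    // Portable scalar fallback.
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}
```

Only one of the two definitions is compiled for a given target, so callers see the same signature either way.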
Thanks for the detailed reply. I see what you mean about …; I was actually more curious about how they're using LLVM intrinsics for the arm part. Since this is not what you want to do, do you have plans to move the stabilization of the arm part of … forward?
I am entirely unaffiliated with Rust core or the stabilization efforts, and I'm not familiar with what it would take to advance stabilization, either.
Regarding the LLVM intrinsics (platform intrinsics), they are both great and annoying at the same time. LLVM has implemented some great codegen algorithms for a variety of tasks, but it's missing some operations that do exist as dedicated instructions, and the code it generates can be slightly rigid and overly safe at times (shuffles and selects/blends come to mind). After having used …
However, Rust's internal use of platform intrinsics with arbitrary types leads to a lot of extra LLVM IR being generated where I would expect just a simple intrinsic call. This has led to small deoptimizations in isolated cases, mostly centered around constant folding (not Rust const, but LLVM constants) and algebraic simplification. I've tried to minimize that as much as possible in Thermite, but it probably doesn't matter much on a larger scale anyway. Just a nitpick.
Also, while I'm here: I'm going to find some time soon to continue work on the other backends. Scalar is mostly complete, but I need to be careful with select/blend ops to ensure good codegen with those abstractions. SSE4.2 will be next.
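To make the select/blend concern concrete, here is a minimal Rust sketch of how a scalar backend might emulate a lane-wise select with bit masks. The names mask_from_bool, select_f32, and select_lanes are hypothetical and not Thermite's API; each mask lane is assumed to be either all ones (take a) or all zeros (take b).

```rust
// Hypothetical sketch of a branchless lane select ("blend") for a scalar
// backend. Result bits come from `a` where the mask is set, else from `b`.

/// Returns an all-ones mask for `true` and an all-zeros mask for `false`.
fn mask_from_bool(cond: bool) -> u32 {
    0u32.wrapping_sub(cond as u32)
}

/// Bitwise select on the f32 bit patterns: (mask & a) | (!mask & b).
fn select_f32(mask: u32, a: f32, b: f32) -> f32 {
    f32::from_bits((mask & a.to_bits()) | (!mask & b.to_bits()))
}

/// Lane-wise select over "vectors" emulated as fixed-size arrays.
fn select_lanes<const N: usize>(masks: [u32; N], a: [f32; N], b: [f32; N]) -> [f32; N] {
    let mut out = [0.0f32; N];
    for i in 0..N {
        out[i] = select_f32(masks[i], a[i], b[i]);
    }
    out
}
```

The bitwise form needs full all-ones/all-zeros masks. On x86, SSE4.1's blendvps inspects only the top bit of each mask lane, while NEON's BSL is a true bitwise select, so one select abstraction can lower to quite different instructions per backend; that is one reason its codegen needs care.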
Backends
Extra data types
These can use 128-bit registers even on AVX/AVX2, and 256-bit registers on AVX512
Polyfills
Iterator library
Vectorized math library
Currently fully implemented for single and double-precision:
sin, cos, tan, asin, acos, atan, atan2, sinh, cosh, tanh, asinh, acosh, atanh, exp, exp2, exph (0.5 * exp), exp10, exp_m1, cbrt, powf, ln, ln_1p, ln2, ln10, erf, erfinv, tgamma, lgamma, next_float, prev_float
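For the next_float and prev_float entries, below is a minimal scalar Rust sketch of the usual bit-stepping approach for f32. It is an illustration under assumptions, not Thermite's vectorized implementation: NaN is returned unchanged, +infinity has no next value, and both zeros step up to the smallest positive subnormal.

```rust
// Hypothetical scalar sketch: step to the adjacent representable f32 by
// adjusting the IEEE 754 bit pattern.

fn next_float(x: f32) -> f32 {
    if x.is_nan() || x == f32::INFINITY {
        return x;
    }
    if x == 0.0 {
        // Covers +0.0 and -0.0: the next value up is the smallest
        // positive subnormal.
        return f32::from_bits(1);
    }
    let bits = x.to_bits();
    if x > 0.0 {
        // Positive values: a larger bit pattern is a larger value.
        f32::from_bits(bits + 1)
    } else {
        // Negative values (sign bit set): shrinking the magnitude moves
        // toward +infinity.
        f32::from_bits(bits - 1)
    }
}

fn prev_float(x: f32) -> f32 {
    // Symmetry: stepping down from x is stepping up from -x, negated.
    -next_float(-x)
}
```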
Precision-agnostic implementations:
lerp, scale, fmod, powi (single and vector exponents), poly, poly_f, poly_rational, summation_f, product_f, smoothstep, smootherstep, smootheststep, hermite (single and vector degrees), jacobi, legendre, bessel_y
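Several of the precision-agnostic entries (poly, smoothstep, smootherstep, smootheststep) come down to polynomial evaluation. The Rust sketch below assumes Horner's scheme with coefficients ordered from the constant term upward, and the standard smoothstep polynomials 3t^2 - 2t^3, 6t^5 - 15t^4 + 10t^3, and 35t^4 - 84t^5 + 70t^6 - 20t^7 on t clamped to [0, 1]. These are hypothetical scalar versions; Thermite's actual signatures, coefficient order, and definition of smootheststep may differ.

```rust
// Hypothetical sketch, not Thermite's API.

/// Evaluates c[0] + c[1]*x + c[2]*x^2 + ... with Horner's scheme.
fn poly(x: f64, coefficients: &[f64]) -> f64 {
    // Start from the highest-degree coefficient and fold downward,
    // computing acc = acc * x + c with a fused multiply-add at each step.
    coefficients
        .iter()
        .rev()
        .fold(0.0_f64, |acc, &c| acc.mul_add(x, c))
}

/// 3t^2 - 2t^3 on t clamped to [0, 1].
fn smoothstep(t: f64) -> f64 {
    poly(t.clamp(0.0, 1.0), &[0.0, 0.0, 3.0, -2.0])
}

/// 10t^3 - 15t^4 + 6t^5 on t clamped to [0, 1].
fn smootherstep(t: f64) -> f64 {
    poly(t.clamp(0.0, 1.0), &[0.0, 0.0, 0.0, 10.0, -15.0, 6.0])
}

/// 35t^4 - 84t^5 + 70t^6 - 20t^7 on t clamped to [0, 1].
fn smootheststep(t: f64) -> f64 {
    poly(t.clamp(0.0, 1.0), &[0.0, 0.0, 0.0, 0.0, 35.0, -84.0, 70.0, -20.0])
}
```

Horner's scheme needs only one multiply-add per coefficient. mul_add is usually a win where the target has hardware FMA, but it can be slower than a separate multiply and add where it does not, which a SIMD library would weigh per backend.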
TODO:
Bessel functions:
Complex and Dual number libraries
Precision Improvements
… lgamma where possible.
… ln(tgamma(x)) when we know it won't overflow?
(… sin(x*π), etc.)
Performance improvements:
… 1 - (1 - x) is the trick.
Policy improvements:
… Size policy, especially when WASM support is added (both scalar and SIMD)
Testing
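The sin(x*π) precision item can be illustrated with a scalar Rust sketch. Computing (x * PI).sin() first rounds the product x*PI, and the absolute size of that rounding error grows with |x|; reducing x modulo 2 before multiplying by π avoids it and also makes integer inputs return exactly zero. The function name sin_pi is hypothetical, and this is one common approach rather than anything the TODO list specifies.

```rust
use std::f64::consts::PI;

// Hypothetical scalar sketch of sin(x*π) with exact argument reduction.
fn sin_pi(x: f64) -> f64 {
    if !x.is_finite() {
        return f64::NAN;
    }
    // r = x - 2k with k = round(x / 2), so r lies in [-1, 1]. Halving,
    // rounding, doubling, and subtracting these nearby values introduce
    // no rounding error for finite inputs.
    let r = x - 2.0 * (x * 0.5).round();
    // Fold into [-0.5, 0.5] using sin(πr) = sin(π(1 - r)) and
    // sin(πr) = sin(π(-1 - r)); integers then map to exactly ±0.
    let t = if r > 0.5 {
        1.0 - r
    } else if r < -0.5 {
        -1.0 - r
    } else {
        r
    };
    (PI * t).sin()
}
```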