Add unstable <[T]>::{position, rposition} functions #84058
Conversation
(rust-highfive has picked a reviewer for you, use r? to override)
We don't need separate methods for that, though; specialization on slice iterators can achieve the same.
The slice iterator functions take closures. There's no way for the specialization to know the closure is just performing an equality check, so no, sadly that's not possible.
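Concretely, what callers write today funnels everything through an arbitrary predicate:

```rust
// From the iterator's point of view the predicate is an opaque
// `FnMut(&u8) -> bool`, even though it is "just" an equality check,
// so no specialization can swap in a memchr-style search.
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```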
Hrrm, granted, but do the loops not end up vectorizing to the same result?
Nope, IME not even close! Sometimes it tries and does a... middling job at best. It's very hard for autovectorization to beat a hand-tuned method like this, and a method like this will also benefit from any optimizations we make to the libcore memchr implementation in the future.
Could you show an example of the difference (iterator vs. this function) in the generated asm?
Our memchr function isn't inlined, so you can't really see it. Also, I can't call it on godbolt. That said, you can look at its source https://github.com/rust-lang/rust/blob/master/library/core/src/slice/memchr.rs and see that it processes things in a 2x-unrolled, usize-at-a-time loop.

The position loop turns into something like this (https://rust.godbolt.org/z/xsPYPWc5E):

```asm
example::foo:
test rsi, rsi
je .LBB0_1
xor eax, eax
xor edx, edx
.LBB0_3:
cmp byte ptr [rdi + rdx], 0
je .LBB0_4
add rdx, 1
cmp rsi, rdx
jne .LBB0_3
mov rdx, rsi
ret
.LBB0_1:
xor edx, edx
xor eax, eax
ret
.LBB0_4:
mov eax, 1
ret
```

I.e. byte-at-a-time processing. It's hard for it to do much better with this style of loop (which requires knowing the position of the match), and in practice, even if LLVM were to do a good job on it, it's unclear that it would be a good thing to inline, since it would lead to code bloat (this is often a problem with LLVM's autovectorization, IME). In practice you can get at least a …

We should be able to improve that further with SIMD, and improve the current code on smaller and less aligned slices too (which is something this memchr does okay on, but which the memchr crate does worse on, increasingly so the larger the vector width it uses). For example, we can do similar things to what I did in …

That said, that's unrelated, and while I'd like to do it (it would benefit all uses of memchr inside libcore/libstd — notably the \0 search done on the slices for …) …

Anyway, I get that maybe we wish that … That is, I don't have to remember to do …
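For reference, the source behind that output is presumably something close to the following (reconstructed from the asm, which scans for a zero byte and returns an `Option<usize>`):

```rust
// Assumed shape of `example::foo` from the godbolt link: the plain
// iterator-based spelling, which compiles to the byte-at-a-time loop above.
pub fn foo(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == 0)
}
```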
The reason we can't specialize …
Sure. I'm not picky about names at all, and I'm always open to suggestions. Perhaps …
This is viable, but I'm not a fan, for several reasons:

- I think it's way more out of place for this to be on `Iterator`.
- On `Iterator`, it's more obviously a hack for optimization, whereas IMO on slice there's a slight ergonomics benefit as well (it's a direct answer to a new user's question of "where is the equivalent of …?").
- Despite the fact that it doesn't apply to most iterators, "iterators that boil down to operating on slices" is a large number of iterators in the stdlib alone. For example, if a function for this is on `Iterator`, it plausibly should be specialized for: …
And several more. This is... tedious (a rough sketch of the shape follows below). Even if these are not done initially, over time we should expect to gain them as people file optimization PRs for their own specific cases. Putting it on slices avoids all of that. Note that this is especially true when it needs to be specialized for the iterated type as well, e.g. … Finally, putting it on slices already has the precedent of being where `contains` is.
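To illustrate that tedium, here's a hypothetical `Iterator`-extension version (the trait, method name, and the particular impls are all made up for illustration):

```rust
use std::collections::vec_deque;

// Hypothetical `Iterator`-based version: a generic fallback, plus one
// override per slice-backed iterator to actually get a fast path.
trait IndexOf: Iterator {
    fn index_of(&mut self, needle: &Self::Item) -> Option<usize>
    where
        Self::Item: PartialEq,
    {
        // Generic fallback: the same opaque-closure search as today.
        self.position(|x| x == *needle)
    }
}

// Each of these would need its own specialized body to be worth it:
impl<'a, T> IndexOf for std::slice::Iter<'a, T> {}
impl<'a, T> IndexOf for std::slice::IterMut<'a, T> {}
impl<T> IndexOf for std::vec::IntoIter<T> {}
impl<'a, T> IndexOf for vec_deque::Iter<'a, T> {}
// ...and so on, for every iterator that is ultimately a slice walk.
```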
Since this is mostly about using …
As I mentioned a few times, it's not solely about i8/u8:
```rust
pub fn find<'a, P>(&'a self, pat: P) -> Option<usize>
where
    P: Pattern<'a>;
```

Maybe we could name this 'find'?
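For comparison, here's how `str::find` reads at a call site today (stable API, shown only to illustrate the naming parallel):

```rust
fn main() {
    let s = "hello world";
    // `str::find` accepts any `Pattern` (char, &str, closures, ...)
    // and returns the byte index of the first match.
    assert_eq!(s.find('o'), Some(4));
    assert_eq!(s.find("world"), Some(6));
    assert_eq!(s.find(|c: char| c.is_whitespace()), Some(5));
}
```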
Hm, I guess this is redundant with #56345. I suppose this can be closed, since presumably someday someone will implement that mammoth of an RFC.
These are essentially analogous to what `<[T]>::contains` is to `slice::Iter::any`.

The APIs in question are (see the commit diff for the doc comments and such, ofc):
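Roughly, the shape is as follows (a sketch mirroring `<[T]>::contains`; see the diff for the exact signatures and bounds):

```rust
// Assumed shape, on `[T]`:
pub fn position(&self, x: &T) -> Option<usize>
where
    T: PartialEq;

pub fn rposition(&self, x: &T) -> Option<usize>
where
    T: PartialEq;
```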
The primary reason for adding these when the iterator methods already exist is that, for primitives, they can be specialized to use significantly faster search operations. Right now, this is done for `u8` and `i8`, since we already have a serviceable implementation of `memchr` right there (even if it's not the fastest, it will likely be improved eventually, and it's easily faster than `iter().position()`).
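For the curious, the fast path can be wired up with the same specialization trick `<[T]>::contains` already uses. A minimal, nightly-only sketch of that shape (names and details assumed, not the actual libcore code):

```rust
#![feature(min_specialization)] // nightly only

// Stand-in for the fast byte search; libcore has its own `memchr`,
// and the `memchr` crate provides an equivalent on stable.
fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

// Helper trait so the element type can choose the search strategy.
trait SpecPosition: Sized {
    fn spec_position(haystack: &[Self], needle: &Self) -> Option<usize>;
}

// Generic fallback: plain element-by-element comparison.
impl<T: PartialEq> SpecPosition for T {
    default fn spec_position(haystack: &[T], needle: &T) -> Option<usize> {
        haystack.iter().position(|x| x == needle)
    }
}

// Specialized fast path for bytes: dispatch to memchr.
impl SpecPosition for u8 {
    fn spec_position(haystack: &[u8], needle: &u8) -> Option<usize> {
        memchr(*needle, haystack)
    }
}

// The public slice method would then just forward to the helper:
pub fn position_of<T: PartialEq>(haystack: &[T], needle: &T) -> Option<usize> {
    T::spec_position(haystack, needle)
}
```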
Note, however, that i8/u8 are not the only reasons to add these. Eventually (such as when portable SIMD is more usable), specializations for `u16`, `i16`, `u32`, `i32`, `f32`, `char`... (and really, all the primitives) could be provided with SIMD, giving slices of those types a substantial performance boost.

Regarding the name, I'm not really that tied to this one, but it seems unambiguous given that it mirrors the name on iterators.
I'll file a tracking issue once I have a better idea if these will land or not.