Make char::is_ascii_whitespace branchless on 32 and 64-bit targets #77021
r? @cramertj (rust_highfive has picked a reviewer for you, use r? to override)
Benchmark code:
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_is_ascii_whitespace(c: &mut Criterion) {
    let mut group = c.benchmark_group("is_ascii_whitespace");
    group.bench_function("std", |b| {
        b.iter(|| {
            let mut n = 0;
            for i in 0..128u8 {
                if is_whitespace::std_is_ascii_whitespace(black_box(&(i as char))) {
                    n += 1;
                }
            }
            black_box(n);
        })
    });
    group.bench_function("pr", |b| {
        b.iter(|| {
            let mut n = 0;
            for i in 0..128u8 {
                if is_whitespace::pr_is_ascii_whitespace(black_box(&(i as char))) {
                    n += 1;
                }
            }
            black_box(n);
        })
    });
}

criterion_group!(benches, bench_is_ascii_whitespace);
criterion_main!(benches);
Generated code: https://rust.godbolt.org/z/Ws69P3
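The benchmark calls into an `is_whitespace` module that is not shown in the thread. The following is only a sketch of what such a module might contain. Both function bodies are assumptions: `std_is_ascii_whitespace` matches on the five ASCII whitespace characters, and `pr_is_ascii_whitespace` is reconstructed from the `is_ascii_whitespace_pr_1_55` assembly posted later in this thread (subtract 1, range-check against 32, test one bit of a `u32` mask), not copied from the PR diff. The helper `bit` is a hypothetical name.

```rust
// Hypothetical stand-in for the `is_whitespace` module used by the benchmark.
// Neither body is taken from the PR; both are illustrative assumptions.

/// Match-based check: tab, line feed, form feed, carriage return, space.
pub fn std_is_ascii_whitespace(c: &char) -> bool {
    matches!(*c, '\t' | '\n' | '\x0C' | '\r' | ' ')
}

/// Bit position of `c` in the mask: character code minus 1, so that
/// space (code 32) lands on bit 31 and the whole set fits in a u32.
const fn bit(c: char) -> u32 {
    1u32 << (c as u32 - 1)
}

const MASK: u32 = bit('\t') | bit('\n') | bit('\x0C') | bit('\r') | bit(' ');

/// Bit-set check without a data-dependent branch in the source.
pub fn pr_is_ascii_whitespace(c: &char) -> bool {
    // NUL wraps around to u32::MAX and is rejected by the range check.
    let n = (*c as u32).wrapping_sub(1);
    // `n & 31` keeps the shift amount in range for every input; the
    // non-short-circuiting `&` combines the range check and the bit test.
    (n < 32) & ((MASK >> (n & 31)) & 1 != 0)
}
```

Whether the compiler actually emits branchless machine code for this source depends on the LLVM version and target, which is what the rest of the thread discusses.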
If you look at the generated assembly of the match-based version, you'll find that LLVM already applies the u32-as-bit-set trick and generates leaner code.
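For readers unfamiliar with the trick, here is a hand-written Rust sketch of what that lowering amounts to. It assumes the shape of the `is_ascii_whitespace_1_55` listing later in this thread (subtract 9, compare against 23, shift a constant); the function name `lowered_match` is hypothetical, and this is not code from the standard library or the PR.

```rust
/// Hand-written equivalent of the compiler's lowering of the match:
/// one range check, then a bit test against a constant.
pub fn lowered_match(c: char) -> bool {
    // Bit n is set when the character with code n + 9 is whitespace
    // ('\t' = 9, '\n' = 10, form feed = 12, '\r' = 13, ' ' = 32).
    const MASK: u32 = 1 << 0 | 1 << 1 | 1 << 3 | 1 << 4 | 1 << 23;

    // Codes below 9 wrap around to large values and fail the range check.
    let offset = (c as u32).wrapping_sub(9);
    if offset > 23 {
        return false; // outside '\t'..=' '
    }
    (MASK >> offset) & 1 != 0
}
```

The early return corresponds to the one conditional jump in that listing; everything else is straight-line code.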
I haven't looked at the machine code, but the new version is one instruction shorter and (maybe more importantly) branchless.
Being branchless will not be very beneficial here, for two reasons: a) in normal text there isn't that much whitespace (or even other chars < 33), so the branch will often be taken, resulting in only a cmp, a jg and an xor being run; and b) branch prediction has become very good on most architectures (although it'll be interesting to see the result on ARM, which was historically worse than Intel or AMD; unfortunately cargo asm doesn't work on my phone, so I will look into --emit asm and report back later). Also, even if the branchy version has one more instruction, not all of its instructions are executed on every call, so that shouldn't make a difference.
☔ The latest upstream changes (presumably #77630) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
I was cleaning up some old git branches and I noticed that this PR was partially vindicated by the LLVM 13, 15 and 19 upgrades, which output branchless code:
is_ascii_whitespace_1_55:
    mov ecx, dword ptr [rdi]
    add ecx, -9
    cmp ecx, 23
    ja .LBB0_2
    mov eax, 8388635
    shr eax, cl
    and al, 1
    ret
.LBB0_2:
    xor eax, eax
    ret
is_ascii_whitespace_1_56:
    mov eax, dword ptr [rdi]
    add eax, -9
    cmp eax, 24
    setb cl
    mov edx, 8388635
    bt edx, eax
    setb al
    and al, cl
    ret
is_ascii_whitespace_1_65:
    mov eax, dword ptr [rdi]
    cmp eax, 33
    setb cl
    movabs rdx, 4294981120
    bt rdx, rax
    setb al
    and al, cl
    ret
is_ascii_whitespace_1_82:
    mov ecx, dword ptr [rdi]
    cmp ecx, 33
    setb dl
    movabs rax, 4294981120
    shr rax, cl
    and al, dl
    ret
is_ascii_whitespace_pr_1_55:
    mov eax, dword ptr [rdi]
    add eax, -1
    cmp eax, 32
    setb cl
    mov edx, -2147476736
    bt edx, eax
    setb al
    and al, cl
    ret
is_ascii_whitespace_pr_1_82:
    mov ecx, dword ptr [rdi]
    dec ecx
    cmp ecx, 32
    setb dl
    mov eax, 1
    shl eax, cl
    test eax, -2147476736
    setne al
    and al, dl
    ret
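The three magic constants in these listings are the same set of five whitespace characters encoded at different bit offsets. The sketch below rebuilds each mask from the character codes so the correspondence can be checked; `WHITESPACE` and `bit_set` are hypothetical names introduced for illustration only.

```rust
/// ASCII whitespace as accepted by `char::is_ascii_whitespace`:
/// tab, line feed, form feed, carriage return, space.
const WHITESPACE: [u32; 5] = [0x09, 0x0A, 0x0C, 0x0D, 0x20];

/// Builds a bit set in which bit `code - offset` is set for each
/// whitespace code. `offset` must not exceed 9, the smallest code.
pub fn bit_set(offset: u32) -> u64 {
    WHITESPACE
        .iter()
        .fold(0u64, |set, &code| set | (1u64 << (code - offset)))
}
```

With these definitions, `bit_set(9)` equals 8388635 (the 32-bit mask used after `add ecx, -9`), `bit_set(0)` equals 4294981120 (the 64-bit mask that needs no subtraction), and `bit_set(1) as u32 as i32` equals -2147476736 (the 32-bit mask in the PR version, where space occupies the sign bit).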