I tried my rawloader performance regression tests with 1.30:

http://chimper.org/rawloader-rustc-benchmarks/
http://chimper.org/rawloader-rustc-benchmarks/version-1.30.0.html

Besides the several formats that have regressed since 1.25 because of #53340, there was a new regression of ~17% in a single file. It corresponded to a specific section of code that decodes a YUV format to RGB. Here's a minimized version:
fn clampbits(val: i32, bits: u32) -> i32 {
    let temp = val >> bits;
    if temp != 0 {
        !temp >> (32 - bits)
    } else {
        val
    }
}

fn decode_snef_compressed(buf: &[u8], width: usize, height: usize, coeffs: [f32; 3]) -> Vec<u16> {
    let mut out: Vec<u16> = vec![0; width * height * 3];
    let inv_wb_r = (1024.0 / coeffs[0]) as i32;
    let inv_wb_b = (1024.0 / coeffs[2]) as i32;

    for (o, i) in out.chunks_mut(6).zip(buf.chunks(6)) {
        let g1: u16 = i[0] as u16;
        let g2: u16 = i[1] as u16;
        let g3: u16 = i[2] as u16;
        let g4: u16 = i[3] as u16;
        let g5: u16 = i[4] as u16;
        let g6: u16 = i[5] as u16;

        let y1 = (g1 | ((g2 & 0x0f) << 8)) as f32;
        let y2 = ((g2 >> 4) | (g3 << 4)) as f32;
        let cb = (g4 | ((g5 & 0x0f) << 8)) as f32 - 2048.0;
        let cr = ((g5 >> 4) | (g6 << 4)) as f32 - 2048.0;

        let r = clampbits((y1 + 1.370705 * cr) as i32, 12) as u16;
        let g = clampbits((y1 - 0.337633 * cb - 0.698001 * cr) as i32, 12) as u16;
        let b = clampbits((y1 + 1.732446 * cb) as i32, 12) as u16;
        // invert the white balance
        o[0] = clampbits((inv_wb_r * r as i32 + (1 << 9)) >> 10, 15) as u16;
        o[1] = g;
        o[2] = clampbits((inv_wb_b * b as i32 + (1 << 9)) >> 10, 15) as u16;

        let r = clampbits((y2 + 1.370705 * cr) as i32, 12) as u16;
        let g = clampbits((y2 - 0.337633 * cb - 0.698001 * cr) as i32, 12) as u16;
        let b = clampbits((y2 + 1.732446 * cb) as i32, 12) as u16;
        // invert the white balance
        o[3] = clampbits((inv_wb_r * r as i32 + (1 << 9)) >> 10, 15) as u16;
        o[4] = g;
        o[5] = clampbits((inv_wb_b * b as i32 + (1 << 9)) >> 10, 15) as u16;
    }
    out
}

fn main() {
    let width = 5000;
    let height = 4000;
    let mut buffer: Vec<u8> = vec![0; width * height * 3];
    // Make sure we don't get optimized out by writing some data into the buffer
    for (i, val) in buffer.chunks_mut(1).enumerate() {
        val[0] = i as u8;
    }
    for _ in 0..30 {
        decode_snef_compressed(&buffer, width, height, [2.0, 1.0, 1.5]);
    }
}
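For a quick in-process number, a minimal median-of-5 timing sketch can sit next to the reproduction (this is just an illustration using the decode_snef_compressed above, not the harness behind the table below):

use std::time::{Duration, Instant};

// Sketch: time the decode 5 times and return the median wall-clock duration.
fn bench_decode(buffer: &[u8], width: usize, height: usize) -> Duration {
    let mut times: Vec<Duration> = (0..5)
        .map(|_| {
            let start = Instant::now();
            let out = decode_snef_compressed(buffer, width, height, [2.0, 1.0, 1.5]);
            assert_eq!(out.len(), width * height * 3); // keep the result observable
            start.elapsed()
        })
        .collect();
    times.sort();
    times[times.len() / 2]
}

Calling println!("{:?}", bench_decode(&buffer, width, height)); from main after the buffer has been filled prints the median directly.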
I'm running this on my machine across versions with opt-level=3 and taking the median of 5 runs:
snef_compressed : 1.20.0 BASE 6.96
snef_compressed : 1.21.0 OK 6.95 (-0%)
snef_compressed : 1.22.1 OK 6.92 (-0%)
snef_compressed : 1.23.0 OK 6.93 (-0%)
snef_compressed : 1.24.1 OK 6.94 (-0%)
snef_compressed : 1.25.0 OK 7.14 (+2%)
snef_compressed : 1.26.2 OK 7.09 (+1%)
snef_compressed : 1.27.2 OK 7.07 (+1%)
snef_compressed : 1.28.0 OK 7.14 (+2%)
snef_compressed : 1.29.2 OK 7.01 (+0%)
snef_compressed : 1.30.0 FAIL 7.37 (+5%)
snef_compressed : beta FAIL 7.32 (+5%)
snef_compressed : nightly FAIL 7.27 (+4%)
I'm not replicating the full 17%, and there already seems to have been a smaller regression since 1.25. The difference in reproduction may be because the original code runs threaded with rayon across 4 threads on only 2 physical cores (hyperthreading), and that amplifies the penalty.
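The rayon-threaded path isn't reproduced in the snippet above; as a rough sketch of the shape of that parallelism (assuming a rayon dependency and reusing decode_snef_compressed from the listing, so not the actual rawloader code), a row-parallel split would look roughly like:

use rayon::prelude::*;

// Sketch: decode one output row per rayon task by reusing the serial function
// with height = 1. Each packed input row is width * 3 bytes and each output
// row is width * 3 u16 values.
fn decode_snef_compressed_par(buf: &[u8], width: usize, height: usize, coeffs: [f32; 3]) -> Vec<u16> {
    let mut out: Vec<u16> = vec![0u16; width * height * 3];
    out.par_chunks_mut(width * 3)
        .zip(buf.par_chunks(width * 3))
        .for_each(|(orow, irow)| {
            let row = decode_snef_compressed(irow, width, 1, coeffs);
            orow.copy_from_slice(&row);
        });
    out
}

This is only meant to show the shape of the parallelism; the real decoder does its own chunking, so the per-thread cost of the inner loop is what matters for the regression.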
memoryruins added the I-slow label (Issue: Problems and improvements with respect to performance of generated code.) on Oct 29, 2018
Strange, the slowdown seems easy to reproduce and consistent. This one has clear godbolt differences and is a much bigger regression if you want to look at that instead:
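To look at the generated code locally instead of on godbolt, something like rustc -C opt-level=3 --emit=asm repro.rs (with repro.rs being whatever the snippet above is saved as) produces the assembly to diff between toolchains.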
The good news is that I apparently screwed up minimizing the test case. I had wondered why I couldn't reproduce the 17% regression but assumed it was still exercising the problem. As it turns out, 1.31 fixed the regression, and yet this test case is still showing the same 3-4%. Here's the run for 1.31:
As you can see in the "D810_Small.NEF" line, it got 14% faster than 1.30 and is now even slightly faster than the original 1.20. The large regression in #53340 is still there though.