-
Notifications
You must be signed in to change notification settings - Fork 181
Pg/hpu bench fix #2355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pg/hpu bench fix #2355
Conversation
tfhe/examples/hpu/bench.rs
Outdated
let res_hpu = (0..args.iter) | ||
.map(|_i| { | ||
let res = HpuRadixCiphertext::exec(&proto, iop.opcode(), &srcs_enc, &imms); | ||
std::hint::black_box(&res); | ||
res | ||
}) | ||
.next_back() | ||
.last() | ||
.expect("Iteration must be greater than 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to loop or collect first, iterators being greedy, without comment here it's easy to re introduce the bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the last commit.
We now explicitly collect intermediate results in a vector
hpu_bench example was wrong for iter > 1 following clippy modifications. NB: Vector is collect but intermediate value are explicitly drop to enable long-time stressed tests.
03eac43
to
41bf88a
Compare
Commits were squashed and rebased on main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks !
if i == (args.iter - 1) { | ||
Some(res) | ||
} else { | ||
None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for the record, there could be a world where a very smart optimizer might trivialize this and not run some perf stuff, but thankfully today it works so all good :)
Issue is appearing when running bench with iter > 1, for example:
cargo run --profile devo --features=integer,internal-keycache,hpu-v80 --example hpu_bench -- --iter 10 --integer-w 64
because of usage of .next_back() measured latency is only the latency of the 1st operation.
No emergency to merge.