improve ODE performance #128
Conversation
A side note for backlog: OK since now we split constant terms and use …
This PR requires #136; don't merge before that one is merged.
I have merged PR #136, but this PR does not seem to be compatible with the block system. Do you want to keep working in this PR or open a new PR for that?
There's a separate PR for that: #137 @GiggleLiu
It is an OK PR, well tested, and should be good to merge. But I think you need to refactor the design a bit later.
After some attempts, I decided to do a quick patch first instead of a complete rewrite of the Hamiltonian expression. This basically doesn't change any APIs and addresses the performance issue with some ugly workarounds. There are still a few things left to do:
- use the `fma` intrinsic in those intrinsics for the ODE solvers, because those intrinsics were designed for gates previously (so I'd expect ~20% speedup in total with this); see "obey `mul!` convention" QuantumBFS/BQCESubroutine.jl#37
- `Sum(i->Omega * i, X, 1:N) + Sum(i->2i, N, 1:N)`: we can just implement this transform as merging similar terms, but this would require more work and doesn't fit well with YaoBlocks at the moment (YaoBlocks can't do general pattern match & rewrite); see the sketch below.
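For reference, one way to read "merging similar terms" is sketched below with a deliberately simplified term representation (a plain `Dict`, and hypothetical `SimpleSum`/`merge_similar` names, not the actual EaRyd/YaoBlocks types):

```julia
# Simplified sketch: a sum of single-site terms stored as (operator, site) => coefficient.
# "Merging similar terms" then just adds the coefficients of entries that act with
# the same operator on the same site, so the merged sum is evaluated in one pass.
struct SimpleSum
    terms::Dict{Tuple{Symbol,Int},Float64}
end

function merge_similar(a::SimpleSum, b::SimpleSum)
    merged = copy(a.terms)
    for (key, coeff) in b.terms
        merged[key] = get(merged, key, 0.0) + coeff
    end
    return SimpleSum(merged)
end
```

This is roughly what a pattern-match & rewrite pass on the block expression would have to do for us.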
@jon-wurtz's QuSpin benchmark for this PR as a reference, tested on AWS EC2 c5a.xlarge (AMD CPU):

QuSpin:
EaRyd (including compilation, first-time execution):
excluding compilation time:

We can include some precompile statements to get rid of that compile time for the default solver, but I think that's going to be in another PR.
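As a rough sketch, such precompile statements are just plain `Base.precompile` calls in the package module; the entry-point name and argument type below are placeholders, not the actual EaRyd API:

```julia
# Compile the default solver path at package precompile time so the first call
# no longer pays the latency; `emulate!` and `ODEEvolution` are placeholder names.
precompile(emulate!, (ODEEvolution,))
```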
Why is QuSpin slower?
It's actually not clear to me why QuSpin is slower. I think it's probably due to a different memory layout: QuSpin uses an array-of-structs layout, which I'd expect to be faster.
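For what "array of structs" typically means for a complex state vector, as an illustration only (not QuSpin's or our actual storage code):

```julia
# Array-of-structs: each amplitude stores its real and imaginary part together.
state_aos = zeros(ComplexF64, 2^10)
# Struct-of-arrays: separate real and imaginary vectors.
state_soa = (re = zeros(2^10), im = zeros(2^10))
```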
After some comparison, our equation evaluation is actually slightly slower than QuSpin, since the sparse multiplication is slower (by ~5 ms). So the only remaining explanation is that the ODE solver is faster and uses far fewer steps to achieve similar precision.
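As a hedged sketch of how to compare step counts (assuming OrdinaryDiffEq.jl; `rhs!`, `u0`, and the tolerances below are placeholders, not our actual solver setup):

```julia
using OrdinaryDiffEq

# An adaptive solver records its time points in sol.t; fewer accepted steps at
# the same tolerance means a faster evolution overall.
prob = ODEProblem(rhs!, u0, (0.0, 1.0))
sol  = solve(prob, VCABM(); reltol = 1e-8, abstol = 1e-8)
@show length(sol.t)
```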
Some notes:

- why `Yao.cache` doesn't work here: `mat` for the subspace cache is problematic, since the cache server does not know about the space.
- for an individual pulse, `XTerm` needs to be either split into a sum of `put(i=>X)` or into individual matrices; using `apply!` directly on each `put(i=>X)` is faster than summing the expression, since we can manually allocate only 3 arrays to do the reduction (instead of https://github.com/QuantumBFS/Yao.jl/blob/master/lib/YaoBlocks/src/composite/reduce.jl#L32), and it will be faster without `cache`; see the sketch below.
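A minimal sketch of that manual reduction, assuming Yao's `ArrayReg`/`put`/`apply!` API (the name `apply_xterm_sum!` and the coefficient vector are made up for illustration; this is not the actual EaRyd code):

```julia
using Yao

# Accumulate sum_i c_i * X_i |ψ⟩ with only three buffers: the input register,
# one scratch register reused for every term, and the output register.
function apply_xterm_sum!(out::ArrayReg, reg::ArrayReg, coeffs::AbstractVector{<:Number})
    n = nqubits(reg)
    scratch = copy(reg)
    out.state .= 0
    for i in 1:n
        copyto!(scratch.state, reg.state)          # reset scratch to the input state
        apply!(scratch, put(n, i => X))            # apply the single-site X in place
        out.state .+= coeffs[i] .* scratch.state   # accumulate this term
    end
    return out
end
```

Building the full summed block instead would allocate a temporary per summand inside the generic reduction linked above.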