Parallelization in MPSKit v0.12, compared with MPSKit v0.11 #236
Comments
Hi Yue, thanks for bringing this up, this is really helpful! The goal of the rewrite wasn't necessarily to remove the multithreading; it's more that I wanted to delegate some of the responsibility for implementing it out of MPSKit, in the sense that this is just a block-sparse contraction, which should now be implemented by BlockTensorKit.jl. I'm definitely willing to spend the time to add it back in, and this should not be too much work, but before I do, would you be willing to also try the single-threaded MPSKit v0.12 with the number of BLAS threads equal to the available threads? I feel like that is a bit fairer of a comparison.
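A minimal sketch of such a thread configuration, assuming Julia is started with a single thread and BLAS is given all physical cores (the choice of `Sys.CPU_THREADS` is just an illustration):

```julia
# Sketch: single-threaded Julia, but BLAS gets all available cores.
# Start Julia as `julia -t 1`, then:
using LinearAlgebra

BLAS.set_num_threads(Sys.CPU_THREADS)  # hand the available cores to BLAS
@info "threads" julia = Threads.nthreads() blas = BLAS.get_num_threads()
```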
Yes, when I tried the single-threaded run, the number of BLAS threads was equal to the available threads. But I do agree with you that the efficiency problem is case dependent. Would you mind teaching me how to add it back in v0.12? I am willing to try more cases for comparison, and I will share my data for the different cases.
There is a tiny bit of infrastructure missing to make this fully customizable right now, but the general way to add it back consists of two steps. The first step is something we're actively trying to figure out for TensorKit.jl as well; see for example this draft PR, which would already add multithreading at the symmetry-blocks level. The second is a matter of defining a custom […]. Additionally, any kind of benchmarks and profiler setups are immensely helpful for actually gauging how well the implementations do compared to the base case, which simply uses BLAS multithreading. In particular, I have no real idea whether we should focus on multithreading at the symmetry level, at the BLAS level, at the level of the blocks in the Hamiltonian, or a combination of all of them.
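To illustrate the general idea of threading over the blocks of a block-sparse contraction (independent of any MPSKit or BlockTensorKit internals), here is a hypothetical sketch; `threaded_apply` and the dense-matrix "blocks" are stand-ins, not actual package API:

```julia
# Hypothetical sketch of block-level threading: spawn one task per block of a
# block-sparse operator and sum the partial results. The dense matrices below
# are only stand-ins for the nonzero blocks of a Hamiltonian MPO.
using Base.Threads

function threaded_apply(blocks::Vector{<:AbstractMatrix}, x::AbstractVector)
    tasks = [Threads.@spawn blk * x for blk in blocks]
    return reduce(+, fetch.(tasks))
end

blocks = [rand(4, 4) for _ in 1:3]   # toy "Hamiltonian blocks"
y = threaded_apply(blocks, rand(4))
```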
Since multithreading at the symmetry-blocks level has already been implemented in TensorKit in that draft PR, is this related to what you described?
Yes, this is very much related. I wouldn't say that it has fully been implemented already, but it is definitely an initial push towards making that work. I don't want to start recommending these things for use just yet, because the actual interface is still subject to change, but it does outline some of the ideas we are working with.
I see... so for now, if I want to use multithreading to deal with some work, is it best to go back to v0.11?
I would advise against that. I'll try to spend some time this week to make the multithreading branch at least usable, if you are willing to accept that there might be some bugs that we'll have to fix as we go along?
I think that branch should now have rudimentary support for selecting some multithreading over the different symmetry blocks; in particular, see the file […]. Let me know if anything is not clear or not behaving as expected. As a small side note, make sure you update to the latest version of BlockTensorKit.jl as well; we recently found a rather significant performance bug there, so I expect that if you run the new version, the timings (even without multithreading) should have improved.
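In case it helps, a sketch of pulling in the updated packages; the repository URL and the branch name "multithreading" are assumptions here, so substitute whatever branch is actually linked above:

```julia
# Sketch: update BlockTensorKit.jl and track the experimental MPSKit branch.
# Branch name and URL are assumptions; use the branch referenced in the thread.
using Pkg
Pkg.update("BlockTensorKit")
Pkg.add(url = "https://github.com/QuantumKitHub/MPSKit.jl", rev = "multithreading")
```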
I would be very happy and truly grateful if you are willing to do so. Multithreading acceleration would be very helpful for my current work, so if you make the multithreading branch available, I can provide timely feedback on any issues I might encounter.
Thank you very much, I will try this now |
Hi Lukas,
I am exploring the new version of MPSKit. Compared with MPSKit v0.11, MPSKit v0.12 seems to drop some support for parallel computation, especially for finite-size systems and algorithms.

For example, setting `MPSKit.Defaults.set_parallelization("derivatives" => true)` was useful when performing `DMRG2` on a finite-size lattice in the old version. In the new version a lot of things changed, and I noticed that the two-site derivative function `∂AC2` does not use multiple threads anymore, which seems to drop the last support for parallelizing `DMRG2`, and that indeed seems to be the case.

For single-threaded computations, the new version has significant advantages. However, since the new version does not support multithreading for finite-size DMRG, it is at a disadvantage compared to the previous version.
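For reference, a minimal sketch of the old-version workflow being described, assuming a simple finite Heisenberg chain built with MPSKitModels; the model, system size, and truncation are placeholders, not the actual benchmark setup:

```julia
# Sketch of the v0.11-era workflow: enable the threaded two-site derivatives,
# then run two-site DMRG on a finite chain. Model and parameters are placeholders.
using MPSKit, MPSKitModels, TensorKit

MPSKit.Defaults.set_parallelization("derivatives" => true)  # old (v0.11) interface

L = 20
H = heisenberg_XXX(FiniteChain(L); spin = 1 // 2)            # placeholder Hamiltonian
ψ₀ = FiniteMPS(rand, ComplexF64, L, ℂ^2, ℂ^32)               # random finite MPS
ψ, envs, δ = find_groundstate(ψ₀, H, DMRG2(; trscheme = truncdim(64)))
```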