Multi-threading attempt III #203

Draft: lkdvos wants to merge 18 commits into master from ld-multithreading2

Conversation

Collaborator

@lkdvos lkdvos commented Jan 17, 2025

This is a continuation of #100 and #117 in an attempt to properly address multithreading over blocks in the various parts of the code.

To achieve this, I added:

  • backend and allocator support for the TensorOperations functions
  • backend and allocator support for the index manipulations
  • a TensorKitBackend that holds a scheduler and a backend to pass on (a rough sketch of such a wrapper follows this list)
  • BlockIterator, to avoid having to look up the cached structures inside a multithreaded loop, reducing overall cache lookups and hopefully avoiding lock contention
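
As a rough illustration of the kind of wrapper meant in the third point, consider the sketch below. All names prefixed with Example are made up for the illustration; only SerialScheduler (OhMyThreads) and DefaultBackend (TensorOperations) are existing types.

using OhMyThreads: SerialScheduler
using TensorOperations: DefaultBackend

# Illustrative only: bundle a scheduler for the loop over blocks with the
# backend that gets forwarded to the TensorOperations kernels.
struct ExampleTensorKitBackend{S,B}
    blockscheduler::S   # how to run over the coupled-charge blocks
    arraybackend::B     # passed on to TensorOperations (or MatrixAlgebraKit)
end
ExampleTensorKitBackend() = ExampleTensorKitBackend(SerialScheduler(), DefaultBackend())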

Before continuing and pushing this through to the other functions, some questions:

  • all introduced names are up for discussion
  • should mul! take a scheduler or a backend?
  • do we want to remove some functions that are now duplicates with slightly altered functionality? permute!, add_permute! and tensoradd! all do more or less the same thing
  • would it be reasonable to simply import the TensorOperations functions and write everything in terms of those?
  • Various comments throughout the code itself

@lkdvos lkdvos requested a review from Jutho January 17, 2025 02:11
@lkdvos lkdvos force-pushed the ld-multithreading2 branch 2 times, most recently from 189b140 to c6f7c15 on January 17, 2025 02:17
@lkdvos lkdvos force-pushed the ld-multithreading2 branch from c6f7c15 to d6bf440 on January 17, 2025 13:30
Owner

Jutho commented Jan 17, 2025

I've only browsed through this quickly so far. The design is definitely different from what I had in mind. What I had in mind was that "all" TensorKit functions would accept three final arguments (probably in this order):

  • scheduler
  • backend
  • allocator

where backend and allocator are just passed through to the relevant TensorOperations or MatrixAlgebraKit method (and could thus take their values from there), and scheduler is specific to the TensorKit method and is used to decide how to run over coupled charge blocks or over fusion tree subblocks.
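
Written out as a hypothetical method, the proposed argument order would look roughly like the sketch below. The function name and body are placeholders; SerialScheduler, DefaultBackend and DefaultAllocator are the existing defaults in OhMyThreads and TensorOperations.

using OhMyThreads: SerialScheduler
using TensorOperations: DefaultBackend, DefaultAllocator

# Placeholder function illustrating the proposed trailing-argument order.
function example_add!(tdst, tsrc, α, β,
                      scheduler=SerialScheduler(),   # TensorKit-specific: how to loop over blocks
                      backend=DefaultBackend(),      # passed through to TensorOperations
                      allocator=DefaultAllocator())  # passed through to TensorOperations
    # ... block-wise implementation would go here ...
    return tdst
end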

Ultimately that is of course the same, except that now you first have to wrap the underlying TensorOperations Backend or MatrixAlgebraKit Algorithm (as it is called there) in a TensorKitBackend. That is easy to do for the defaults, but maybe a bit more cumbersome when experimenting. Also, suppose you want to specify a scheduler but don't care about the backend: is that easy to do? Do we need to harmonize the default backend structure between TensorOperations and MatrixAlgebraKit?

Owner

Jutho commented Jan 17, 2025

Also, in response to the question: yes, mul! is also supposed to become part of MatrixAlgebraKit and to potentially support different backends/algorithms (I guess Octavian is not going to happen, but there might still be cases where interesting alternative matrix multiplication backends exist).

Collaborator Author

lkdvos commented Jan 17, 2025

I think having some kind of wrapper is a bit inevitable, since for example in tensorcontract! I would need a mul! backend, a backend for the tensor operations themselves, a scheduler, an allocator, etc. If there were only a single backend, then mul! would also have to support the TensorOperations backends, which might not be the most convenient. In any case, I liked having them together as a single argument just to avoid having too many arguments, but this is of course equivalent to a tuple...

The logic of using a DefaultBackend, which amounts to selecting a backend at runtime, seems to work rather well; i.e. in the current implementation I pass DefaultBackend as the "array backend" to the TensorOperations functions, which then select the correct backend a bit further down the stack. I would be okay with having a similar implementation for the scheduler and allocator as well if that helps.

One thing I am becoming more and more in favour of is the idea of simply putting the allocator and scheduler (and maybe even the backend for functions other than the TensorOperations ones) in a scoped value, instead of explicitly passing them around. Realistically, it's not actually that convenient to change the arguments of mul! or tsvd! calls, since these are typically hidden in some lower levels of the code, and passing the options all the way through the call stack is a bit of a pain. For MPSKit, we would have to rewrite almost all code to pass these around, and I think this is precisely what ScopedValues should solve.
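
As a minimal sketch of the scoped-value idea (this assumes Base.ScopedValues from Julia 1.11, or the ScopedValues.jl compatibility package; the constant and accessor names are made up for the illustration, the actual helpers in this PR are with_blockscheduler and friends):

using Base.ScopedValues: ScopedValue, with
using OhMyThreads: SerialScheduler, DynamicScheduler

const EXAMPLE_BLOCKSCHEDULER = ScopedValue{Any}(SerialScheduler())

# Any function deep in the call stack can read the current default:
example_blockscheduler() = EXAMPLE_BLOCKSCHEDULER[]

# Temporarily override the default for everything inside the do-block,
# without threading any extra arguments through intermediate calls:
with(EXAMPLE_BLOCKSCHEDULER => DynamicScheduler()) do
    example_blockscheduler()   # returns the DynamicScheduler
end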

In some sense, what I would see as a balance between these things is:

  • The direct functions such as mul!, tsvd!, etc. have a final argument backend that can be used to control the implementation, and that bundles the necessary configuration: a scheduler where applicable, an allocator, and possibly some additional options for the implementation.
  • These final arguments have a default value that can be controlled via scoped values, such that it becomes easy to alter them from anywhere in the call stack. This is achieved via the select_backend pattern we have in TensorOperations.
  • In order to allow for dynamically selecting a default value, DefaultBackend can be used to control subalgorithms, which will then select a backend using select_backend again (a toy sketch of this pattern follows this list).
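
As a toy version of that last point, the sketch below shows the general pattern; everything prefixed with Example is hypothetical, only the idea of a sentinel default resolved via a select_backend-style function is taken from TensorOperations.

# A sentinel type meaning "pick a concrete backend for me at runtime".
struct ExampleDefaultBackend end
struct ExampleSerialBackend end
struct ExampleThreadedBackend end

# Runtime selection, in the spirit of TensorOperations' select_backend:
example_select_backend(::ExampleDefaultBackend) =
    Threads.nthreads() > 1 ? ExampleThreadedBackend() : ExampleSerialBackend()
example_select_backend(backend) = backend   # an explicit choice is passed through untouched

function example_op!(t, backend=ExampleDefaultBackend())
    b = example_select_backend(backend)
    # ... dispatch the block-wise kernel on typeof(b) ...
    return t
end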

To keep things consistent, I'm also okay with having backend, allocator as final arguments, but introducing yet another scheduler argument seems like something that should just be part of the backend, since it is really implementation-dependent what that should be and whether or not it is present at all. It would make more sense to me to have something like the following to indicate a parallelized implementation of an algorithm over blocks through a scheduler:

struct BlockAlgorithm <: AbstractAlgorithm
    scheduler   # controls how the loop over the blocks is run
    algorithm   # the algorithm applied to each individual block
end

In any case, it's a bit hard to reason about this without the MatrixAlgebraKit changes fully in place, but I wanted to start tackling it, and having an implementation to look at seems like a good way to get the conversation started :)

Owner

Jutho commented Jan 17, 2025

I am definitely in favor of something that can be controlled via scoped values.

@lkdvos lkdvos force-pushed the ld-multithreading2 branch from 9a845b0 to 22739bb on February 19, 2025 13:05

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 70.76023% with 50 lines in your changes missing coverage. Please review.

Project coverage is 77.34%. Comparing base (a4eb3f3) to head (2ce33eb).
Report is 5 commits behind head on master.

Files with missing lines            Patch %   Lines
src/tensors/braidingtensor.jl        37.50%   15 Missing ⚠️
src/tensors/backends.jl              22.22%   14 Missing ⚠️
src/tensors/linalg.jl                58.06%   13 Missing ⚠️
src/tensors/indexmanipulations.jl    77.77%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #203      +/-   ##
==========================================
- Coverage   82.51%   77.34%   -5.17%     
==========================================
  Files          43       44       +1     
  Lines        5552     5620      +68     
==========================================
- Hits         4581     4347     -234     
- Misses        971     1273     +302     


Owner

@Jutho Jutho left a comment

A first set of comments; I still have to go through src/tensors/tensoroperations.jl, but this looks very promising!

ZongYongyue commented Feb 20, 2025

Some feedback:
1. Using TensorKit.set_blockscheduler!(:dynamic) throws an error:

ERROR: LoadError: MethodError: no method matching setindex!(::ScopedValue{Scheduler}, ::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive})
The function `setindex!` exists, but no method is defined for this combination of argument types.

2. When TensorKit.TensorKitBackend().blockscheduler and TensorKit.TensorKitBackend().subblockscheduler are set to SerialScheduler, threaded_mul! is called, and an error occurs when indexing bAs and bBs at mul!(bC, bAs[c], bBs[c], α, β):

ERROR: LoadError: MethodError: no method matching getindex(::Base.Generator{Vector{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}}, TensorKit.var"#157#158"{BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}}}}, ::ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}})

3. When TensorKit.TensorKitBackend().blockscheduler and TensorKit.TensorKitBackend().subblockscheduler are set to DynamicScheduler, an error occurs at tforeach(bCs; scheduler) do (c, bC):

ERROR: LoadError: ArgumentError: Arguments of type TensorKit.BlockIterator{TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Tuple{Tuple{Int64, Int64}, UnitRange{Int64}}}} are not compatible with chunks, either implement a custom chunks method for your type, or implement the custom type interface (see https://juliafolds2.github.io/ChunkSplitters.jl/dev/)

@ZongYongyue

In the second case, I changed bAs and bBs to dicts, and the program ran successfully. I then simulated a Hubbard model of size 2×5 with D=512 and found that it was faster than both the current master version and the old version.

@ZongYongyue

add! throws an error now:

Warning: The function `add!` is not implemented for (values of) type `Tuple{Base.ReshapedArray{Float64, 2, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, Float64, VectorInterface.One, VectorInterface.One}`;
│ this fallback will disappear in future versions of VectorInterface.jl
└ @ VectorInterface ~/.julia/packages/VectorInterface/J6qCR/src/fallbacks.jl:143
ERROR: LoadError: ArgumentError: No fallback for applying `add!` to (values of) type `Tuple{Base.ReshapedArray{Float64, 2, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, Float64, VectorInterface.One, VectorInterface.One}` could be determined
Stacktrace:
  [1] add!(y::Base.ReshapedArray{Float64, 2, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, x::Float64, α::VectorInterface.One, β::VectorInterface.One)
    @ VectorInterface ~/.julia/packages/VectorInterface/J6qCR/src/fallbacks.jl:150
  [2] add!(ty::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 2, Vector{Float64}}, tx::BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}}, α::VectorInterface.One, β::VectorInterface.One)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/vectorinterface.jl:77
  [3] add!
    @ ~/.julia/packages/VectorInterface/J6qCR/src/interface.jl:124 [inlined]
  [4] add(ty::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 2, Vector{Float64}}, tx::BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}}, α::VectorInterface.One, β::VectorInterface.One)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/vectorinterface.jl:71
  [5] add
    @ ~/.julia/packages/VectorInterface/J6qCR/src/interface.jl:107 [inlined]
  [6] +(t1::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 2, Vector{Float64}}, t2::BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}})
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/linalg.jl:7

Collaborator Author

lkdvos commented Feb 20, 2025

Thanks for the feedback!
For 1., I just forgot how ScopedValues work: you can't actually change the default like that. I simply removed those methods.
For 2., this was a problem with the block iterator of BraidingTensor, which I had forgotten about. This should now be resolved. (I also added some specializations, which might even avoid ever reaching that part of the code to begin with.) (EDIT: hopefully fixed now?)

For 3., this is an interesting thing we might want to work around or with: the collections that OhMyThreads can handle need to implement the ChunkSplitters.jl interface. The easiest way to accomplish this is to simply iterate over blocksectors instead, since that avoids many of the other issues.
The alternative is to actually try and support ChunkSplitters on the block iterators directly, but that requires us to either implement firstindex, lastindex, length and view on them, or to write the implementations for chunks and index_chunks directly. I think for now, simply using blocksectors is the easiest way forward, because otherwise we might have to make BlockIterator <: AbstractVector, and I'm not sure that's something we should do.
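
For illustration, a minimal sketch of the blocksectors-based approach (the helper name threaded_scale! and the loop body are hypothetical; blocksectors, block and OhMyThreads.tforeach are existing functions):

using TensorKit, OhMyThreads

function threaded_scale!(t::AbstractTensorMap, α::Number; scheduler=DynamicScheduler())
    sectors = collect(blocksectors(t))   # a plain Vector, which ChunkSplitters can chunk
    tforeach(sectors; scheduler) do c
        block(t, c) .*= α                # independent per-block work, no shared state
    end
    return t
end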

@ZongYongyue

Thank you very much for your prompt fix -- it has resolved all the bugs I have found.

@ZongYongyue

Possible bug? In multithreading, using SU(2) symmetry causes @planar to throw an error, but using only U(1) symmetry does not:

TensorKit.with_blockscheduler(DynamicScheduler()) do
    TensorKit.with_subblockscheduler(DynamicScheduler()) do
        E = e_plus(Float64, SU2Irrep, U1Irrep; side=:L, filling=filling)'
        F = isomorphism(storagetype(E), flip(space(E, 2)), space(E, 2))
        @planar e⁻[-1; -2 -3] := E[-1 1; -2] * F[-3; 1]
    end
end

ERROR: LoadError: ArgumentError: Arguments of type Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}} are not compatible with chunks, either implement a custom chunks method for your type, or implement the custom type interface (see https://juliafolds2.github.io/ChunkSplitters.jl/dev/)
Stacktrace:

TensorKit.with_blockscheduler(DynamicScheduler()) do
    TensorKit.with_subblockscheduler(DynamicScheduler()) do
        E = e_plus(Float64, U1Irrep, U1Irrep; side=:L, filling=filling)'
        F = isomorphism(storagetype(E), flip(space(E, 2)), space(E, 2))
        @planar e⁻[-1; -2 -3] := E[-1 1; -2] * F[-3; 1]
        println(e⁻)
    end
end

TensorMap(Vect[(FermionParity ⊠ Irrep[U₁] ⊠ Irrep[U₁])]((0, 0, 1)=>1, (0, 0, -1)=>1, (1, 1, 0)=>1, (1, -1, 0)=>1) ← (Vect[(FermionParity ⊠ Irrep[U₁] ⊠ Irrep[U₁])]((0, 0, 1)=>1, (0, 0, -1)=>1, (1, 1, 0)=>1, (1, -1, 0)=>1) ⊗ Vect[(FermionParity ⊠ Irrep[U₁] ⊠ Irrep[U₁])]((1, -1, -1)=>1))):
* Data for sector ((FermionParity(0) ⊠ Irrep[U₁](0) ⊠ Irrep[U₁](-1)),) ← ((FermionParity(1) ⊠ Irrep[U₁](1) ⊠ Irrep[U₁](0)), (FermionParity(1) ⊠ Irrep[U₁](-1) ⊠ Irrep[U₁](-1))):
[:, :, 1] =
 1.0
* Data for sector ((FermionParity(1) ⊠ Irrep[U₁](-1) ⊠ Irrep[U₁](0)),) ← ((FermionParity(0) ⊠ Irrep[U₁](0) ⊠ Irrep[U₁](1)), (FermionParity(1) ⊠ Irrep[U₁](-1) ⊠ Irrep[U₁](-1))):
[:, :, 1] =
 -1.0

Collaborator Author

lkdvos commented Feb 24, 2025

Could you also attach what e_plus does so I can reproduce the error?

@ZongYongyue

e_plus is the same creation operator you originally had in MPSKitModels, except that I added a filling parameter. Setting filling = (1, 1) is fine.

function e_plus(elt::Type{<:Number}, ::Type{SU2Irrep}, ::Type{U1Irrep}; side=:L, filling=filling)
    I = FermionParity ⊠ SU2Irrep ⊠ U1Irrep
    P, Q = filling
    pspace = Vect[I]((0,0,-P)=>1, (1,1//2,Q-P)=>1, (0,0,2*Q-P)=>1)
    vspace = Vect[I]((1,1//2,Q)=>1)
    if side == :L
        e⁺ = TensorMap(zeros, elt, pspace ← pspace ⊗ vspace)
        block(e⁺, I(0,0,2*Q-P)) .= sqrt(2)
        block(e⁺, I(1,1//2,Q-P)) .= 1
    elseif side == :R
        E = e_plus(elt, SU2Irrep, U1Irrep; side=:L, filling=filling)
        F = isomorphism(storagetype(E), vspace, flip(vspace))
        @planar e⁺[-1 -2; -3] := E[-2; 1 2] * τ[1 2; 3 -3] * F[3; -1]
    end
    return e⁺
end

@ZongYongyue

The bug occurs when e_min is created; this is the full error info:

ERROR: LoadError: ArgumentError: Arguments of type Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}} are not compatible with chunks, either implement a custom chunks method for your type, or implement the custom type interface (see https://juliafolds2.github.io/ChunkSplitters.jl/dev/)
Stacktrace:
  [1] err_not_chunkable(::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}})
    @ ChunkSplitters.Internals ~/.julia/packages/ChunkSplitters/p2yrz/src/internals.jl:91
  [2] ChunkSplitters.Internals.IndexChunks(s::ChunkSplitters.Consecutive; collection::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}, n::Int64, size::Nothing, minsize::Nothing)
    @ ChunkSplitters.Internals ~/.julia/packages/ChunkSplitters/p2yrz/src/internals.jl:33
  [3] index_chunks(collection::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}; n::Int64, size::Nothing, split::ChunkSplitters.Consecutive, minsize::Nothing)
    @ ChunkSplitters.Internals ~/.julia/packages/ChunkSplitters/p2yrz/src/internals.jl:47
  [4] _index_chunks(sched::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, arg::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}})
    @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:27
  [5] _tmapreduce(f::Function, op::Function, Arrs::Tuple{Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}}, ::Type{Nothing}, scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, mapreduce_kwargs::@NamedTuple{init::Nothing})
    @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:106
  [6] #tmapreduce#22
    @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:85 [inlined]
  [7] tmapreduce
    @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:69 [inlined]
  [8] tforeach(f::Function, A::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}; kwargs::@Kwargs{scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}})
    @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:308
  [9] tforeach
    @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:307 [inlined]
 [10] _add_general_kernel!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:631 [inlined]
 [11] add_transform_kernel!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:585 [inlined]
 [12] add_transform!(tdst::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, tsrc::TensorKit.AdjointTensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 2, Vector{Float64}}}, ::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, transformer::Function, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorKit.TensorKitBackend{TensorOperations.DefaultBackend, DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}}, allocator::TensorOperations.DefaultAllocator)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:490
 [13] add_transform!(C::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, A::TensorKit.AdjointTensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 2, Vector{Float64}}}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, transformer::Function, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorOperations.DefaultBackend, allocator::TensorOperations.DefaultAllocator)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:462
 [14] add_transform!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:456 [inlined]
 [15] add_transpose!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:439 [inlined]
 [16] planarcontract!(C::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, A::TensorKit.AdjointTensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 2, Vector{Float64}}}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, B::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 1, Vector{Float64}}, pB::Tuple{Tuple{Int64}, Tuple{Int64}}, pAB::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorOperations.DefaultBackend, allocator::TensorOperations.DefaultAllocator)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/planar/planaroperations.jl:161
 [17] planarcontract!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/planar/planaroperations.jl:115 [inlined]
 [18] planarcontract!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/planar/planaroperations.jl:110 [inlined]
 [19] e_min(elt::Type{Float64}, particle_symmetry::Type{SU2Irrep}, spin_symmetry::Type{U1Irrep}; side::Symbol, filling::Tuple{Int64, Int64})
    @ DynamicalCorrelators ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/DynamicalCorrelators.jl/src/operators/fermions.jl:239
 [20] (::var"#2#4")()
    @ Main ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:243
 [21] #with_subblockscheduler#162
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:54 [inlined]
 [22] with_subblockscheduler
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:52 [inlined]
 [23] (::var"#1#3")()
    @ Main ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:238
 [24] with_blockscheduler(f::var"#1#3", scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}; kwargs::@Kwargs{})
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:39
 [25] with_blockscheduler(f::Function, scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive})
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:37
 [26] top-level scope
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:237
in expression starting at /Users/zongyy/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:237

Collaborator Author

lkdvos commented Feb 24, 2025

Should now be resolved. It is interesting to note that this occurs when permuting an AdjointTensorMap, which still does not take the "fast implementation" code path. A profiler might tell us whether this is worth specializing as well.
