Multi-threading attempt III #203

Draft: lkdvos wants to merge 18 commits into master from ld-multithreading2

Conversation

Collaborator

@lkdvos lkdvos commented Jan 17, 2025

This is a continuation of #100 and #117 in an attempt to properly address multithreading over blocks in the various parts of the code.

To achieve this, I added:

  • backend and allocator support for the TensorOperations functions
  • backend and allocator support for the index manipulations
  • a TensorKitBackend that holds a scheduler and a backend to pass on (a rough sketch of such a wrapper follows this list)
  • BlockIterator, to avoid having to look up the cached structures inside a multithreaded loop, reducing overall cache lookups and hopefully avoiding lock contention
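
As a rough illustration of the kind of wrapper meant in the third point, consider the sketch below. All names prefixed with Example are made up for the illustration; only SerialScheduler (OhMyThreads) and DefaultBackend (TensorOperations) are existing types.

using OhMyThreads: SerialScheduler
using TensorOperations: DefaultBackend

# Illustrative only: bundle a scheduler for the loop over blocks with the
# backend that gets forwarded to the TensorOperations kernels.
struct ExampleTensorKitBackend{S,B}
    blockscheduler::S   # how to run over the coupled-charge blocks
    arraybackend::B     # passed on to TensorOperations (or MatrixAlgebraKit)
end
ExampleTensorKitBackend() = ExampleTensorKitBackend(SerialScheduler(), DefaultBackend())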

Before continuing and pushing this through to the other functions, some questions:

  • all introduced names are up for discussion
  • should mul! take a scheduler or a backend?
  • do we want to remove some functions that are now duplicates with slightly altered functionality? permute!, add_permute! and tensoradd! all do more or less the same thing
  • would it be reasonable to simply import the TensorOperations functions and write everything in terms of those?
  • Various comments throughout the code itself

@lkdvos lkdvos requested a review from Jutho January 17, 2025 02:11
@lkdvos lkdvos force-pushed the ld-multithreading2 branch 2 times, most recently from 189b140 to c6f7c15 on January 17, 2025 02:17
@lkdvos lkdvos force-pushed the ld-multithreading2 branch from c6f7c15 to d6bf440 on January 17, 2025 13:30
Owner

Jutho commented Jan 17, 2025

I've only browsed through this quickly so far. The design is definitely different from what I had in mind. What I had in mind was that "all" TensorKit functions would accept three final arguments (probably in this order):

  • scheduler
  • backend
  • allocator

where backend and allocator are just passed through to the relevant TensorOperations or MatrixAlgebraKit method (and could thus take their values from there), and scheduler is specific to the TensorKit method and is used to decide how to run over coupled charge blocks or over fusion tree subblocks.
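
Written out as a hypothetical method, the proposed argument order would look roughly like the sketch below. The function name and body are placeholders; SerialScheduler, DefaultBackend and DefaultAllocator are the existing defaults in OhMyThreads and TensorOperations.

using OhMyThreads: SerialScheduler
using TensorOperations: DefaultBackend, DefaultAllocator

# Placeholder function illustrating the proposed trailing-argument order.
function example_add!(tdst, tsrc, α, β,
                      scheduler=SerialScheduler(),   # TensorKit-specific: how to loop over blocks
                      backend=DefaultBackend(),      # passed through to TensorOperations
                      allocator=DefaultAllocator())  # passed through to TensorOperations
    # ... block-wise implementation would go here ...
    return tdst
end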

Ultimately that is of course the same, except that now you first have to wrap the underlying TensorOperations Backend or MatrixAlgebraKit Algorithm (as it is called there) in a TensorKitBackend. That is easy to do for the defaults, but maybe a bit more cumbersome when experimenting. Also, suppose you want to specify a scheduler but don't care about the backend: is that easy to do? Do we need to harmonize the default backend structure between TensorOperations and MatrixAlgebraKit?

Owner

Jutho commented Jan 17, 2025

Also, in response to the question: yes, mul! is also supposed to become part of MatrixAlgebraKit and to potentially support different backends/algorithms (I guess Octavian is not going to happen, but there might still be cases where interesting alternative matrix multiplication backends exist).

Collaborator Author

lkdvos commented Jan 17, 2025

I think having some kind of wrapper is a bit inevitable, since for example in tensorcontract! I would need a mul! backend, a backend for the tensor operations themselves, a scheduler, an allocator, etc. If there were only a single backend, then mul! would also have to support the TensorOperations backends, which might not be the most convenient. In any case, I liked having them together as a single argument just to avoid having too many arguments, but this is of course equivalent to a tuple...

The logic of using a DefaultBackend, which amounts to selecting a backend at runtime, seems to work rather well; i.e. in the current implementation I pass DefaultBackend as the "array backend" to the TensorOperations functions, which then select the correct backend a bit further down the stack. I would be okay with having a similar implementation for the scheduler and allocator as well if that helps.

One thing I am becoming more and more in favour of is the idea of simply putting the allocator and scheduler (and maybe even the backend for functions other than the TensorOperations ones) in a scoped value, instead of explicitly passing them around. Realistically, it's not actually that convenient to change the arguments of mul! or tsvd! calls, since these are typically hidden in some lower levels of the code, and passing the options all the way through the call stack is a bit of a pain. For MPSKit, we would have to rewrite almost all code to pass these around, and I think this is precisely what ScopedValues should solve.
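
As a minimal sketch of the scoped-value idea (this assumes Base.ScopedValues from Julia 1.11, or the ScopedValues.jl compatibility package; the constant and accessor names are made up for the illustration, the actual helpers in this PR are with_blockscheduler and friends):

using Base.ScopedValues: ScopedValue, with
using OhMyThreads: SerialScheduler, DynamicScheduler

const EXAMPLE_BLOCKSCHEDULER = ScopedValue{Any}(SerialScheduler())

# Any function deep in the call stack can read the current default:
example_blockscheduler() = EXAMPLE_BLOCKSCHEDULER[]

# Temporarily override the default for everything inside the do-block,
# without threading any extra arguments through intermediate calls:
with(EXAMPLE_BLOCKSCHEDULER => DynamicScheduler()) do
    example_blockscheduler()   # returns the DynamicScheduler
end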

In some sense, what I would see as a balance between these things is:

  • The direct functions such as mul!, tsvd!, etc. have a final argument backend that can be used to control the implementation, and that bundles the necessary configuration: a scheduler where applicable, an allocator, and possibly some additional options for the implementation.
  • These final arguments have a default value that can be controlled via scoped values, such that it becomes easy to alter them from anywhere in the call stack. This is achieved via the select_backend pattern we have in TensorOperations.
  • In order to allow for dynamically selecting a default value, DefaultBackend can be used to control subalgorithms, which will then select a backend using select_backend again (a toy sketch of this pattern follows this list).
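
As a toy version of that last point, the sketch below shows the general pattern; everything prefixed with Example is hypothetical, only the idea of a sentinel default resolved via a select_backend-style function is taken from TensorOperations.

# A sentinel type meaning "pick a concrete backend for me at runtime".
struct ExampleDefaultBackend end
struct ExampleSerialBackend end
struct ExampleThreadedBackend end

# Runtime selection, in the spirit of TensorOperations' select_backend:
example_select_backend(::ExampleDefaultBackend) =
    Threads.nthreads() > 1 ? ExampleThreadedBackend() : ExampleSerialBackend()
example_select_backend(backend) = backend   # an explicit choice is passed through untouched

function example_op!(t, backend=ExampleDefaultBackend())
    b = example_select_backend(backend)
    # ... dispatch the block-wise kernel on typeof(b) ...
    return t
end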

To keep things consistent, I'm also okay with having backend, allocator as final arguments, but introducing yet another scheduler argument seems like something that should just be part of the backend, since it is really implementation-dependent what that should be and whether or not it is present at all. It would make more sense to me to have something like the following to indicate a parallelized implementation of an algorithm over blocks through a scheduler:

struct BlockAlgorithm <: AbstractAlgorithm
    scheduler   # controls how the loop over the blocks is run
    algorithm   # the algorithm applied to each individual block
end

In any case, it's a bit hard to reason about this without the MatrixAlgebraKit changes fully in place, but I wanted to start tackling it, and having an implementation to look at seems like a good way to get the conversation started :)

Owner

Jutho commented Jan 17, 2025

I am definitely in favor of something that can be controlled via scoped values.

@lkdvos lkdvos force-pushed the ld-multithreading2 branch from 9a845b0 to 22739bb on February 19, 2025 13:05

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 70.76023% with 50 lines in your changes missing coverage. Please review.

Project coverage is 77.34%. Comparing base (a4eb3f3) to head (2ce33eb).
Report is 5 commits behind head on master.

Files with missing lines            Patch %   Lines
src/tensors/braidingtensor.jl        37.50%   15 Missing ⚠️
src/tensors/backends.jl              22.22%   14 Missing ⚠️
src/tensors/linalg.jl                58.06%   13 Missing ⚠️
src/tensors/indexmanipulations.jl    77.77%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #203      +/-   ##
==========================================
- Coverage   82.51%   77.34%   -5.17%     
==========================================
  Files          43       44       +1     
  Lines        5552     5620      +68     
==========================================
- Hits         4581     4347     -234     
- Misses        971     1273     +302     


Owner

@Jutho Jutho left a comment

A first set of comments; I still have to go through src/tensors/tensoroperations.jl, but this looks very promising!

ZongYongyue commented Feb 20, 2025

Some feedback:
1. Using TensorKit.set_blockscheduler!(:dynamic) throws an error:

ERROR: LoadError: MethodError: no method matching setindex!(::ScopedValue{Scheduler}, ::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive})
The function `setindex!` exists, but no method is defined for this combination of argument types.

2. When TensorKit.TensorKitBackend().blockscheduler and TensorKit.TensorKitBackend().subblockscheduler are set to SerialScheduler, threaded_mul! is called, and an error occurs when indexing bAs and bBs at mul!(bC, bAs[c], bBs[c], α, β):

ERROR: LoadError: MethodError: no method matching getindex(::Base.Generator{Vector{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}}, TensorKit.var"#157#158"{BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}}}}, ::ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}})

3. When TensorKit.TensorKitBackend().blockscheduler and TensorKit.TensorKitBackend().subblockscheduler are set to DynamicScheduler, an error occurs at tforeach(bCs; scheduler) do (c, bC):

ERROR: LoadError: ArgumentError: Arguments of type TensorKit.BlockIterator{TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Tuple{Tuple{Int64, Int64}, UnitRange{Int64}}}} are not compatible with chunks, either implement a custom chunks method for your type, or implement the custom type interface (see https://juliafolds2.github.io/ChunkSplitters.jl/dev/)

@ZongYongyue

In the second case, I changed bAs and bBs to dicts, and the program ran successfully. I then simulated a Hubbard model of size 2×5 with D=512 and found that it was faster than both the current master version and the old version.

@ZongYongyue

add! throws an error now:

Warning: The function `add!` is not implemented for (values of) type `Tuple{Base.ReshapedArray{Float64, 2, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, Float64, VectorInterface.One, VectorInterface.One}`;
│ this fallback will disappear in future versions of VectorInterface.jl
└ @ VectorInterface ~/.julia/packages/VectorInterface/J6qCR/src/fallbacks.jl:143
ERROR: LoadError: ArgumentError: No fallback for applying `add!` to (values of) type `Tuple{Base.ReshapedArray{Float64, 2, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, Float64, VectorInterface.One, VectorInterface.One}` could be determined
Stacktrace:
  [1] add!(y::Base.ReshapedArray{Float64, 2, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, x::Float64, α::VectorInterface.One, β::VectorInterface.One)
    @ VectorInterface ~/.julia/packages/VectorInterface/J6qCR/src/fallbacks.jl:150
  [2] add!(ty::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 2, Vector{Float64}}, tx::BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}}, α::VectorInterface.One, β::VectorInterface.One)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/vectorinterface.jl:77
  [3] add!
    @ ~/.julia/packages/VectorInterface/J6qCR/src/interface.jl:124 [inlined]
  [4] add(ty::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 2, Vector{Float64}}, tx::BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}}, α::VectorInterface.One, β::VectorInterface.One)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/vectorinterface.jl:71
  [5] add
    @ ~/.julia/packages/VectorInterface/J6qCR/src/interface.jl:107 [inlined]
  [6] +(t1::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}, 2, 2, Vector{Float64}}, t2::BraidingTensor{Float64, GradedSpace{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, U1Irrep, U1Irrep}}, Int64}}})
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/linalg.jl:7

Collaborator Author

lkdvos commented Feb 20, 2025

Thanks for the feedback!
For 1., I just forgot how ScopedValues work: you can't actually change the default like that. I simply removed those methods.
For 2., this was a problem with the block iterator of BraidingTensor, which I had forgotten about. This should now be resolved. (I also added some specializations, which might even avoid ever reaching that part of the code to begin with.) (EDIT: hopefully fixed now?)

For 3., this is an interesting thing we might want to work around or with: the collections that OhMyThreads can handle need to implement the ChunkSplitters.jl interface. The easiest way to accomplish this is to simply iterate over blocksectors instead, since that avoids many of the other issues.
The alternative is to actually try and support ChunkSplitters on the block iterators directly, but that requires us to either implement firstindex, lastindex, length and view on them, or to write the implementations for chunks and index_chunks directly. I think for now, simply using blocksectors is the easiest way forward, because otherwise we might have to make BlockIterator <: AbstractVector, and I'm not sure that's something we should do.
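
For illustration, a minimal sketch of the blocksectors-based approach (the helper name threaded_scale! and the loop body are hypothetical; blocksectors, block and OhMyThreads.tforeach are existing functions):

using TensorKit, OhMyThreads

function threaded_scale!(t::AbstractTensorMap, α::Number; scheduler=DynamicScheduler())
    sectors = collect(blocksectors(t))   # a plain Vector, which ChunkSplitters can chunk
    tforeach(sectors; scheduler) do c
        block(t, c) .*= α                # independent per-block work, no shared state
    end
    return t
end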

@ZongYongyue

Thank you very much for your prompt fix -- it has resolved all the bugs I have found.

@ZongYongyue

Possible bug? In multithreading, using SU(2) symmetry causes @planar to throw an error, but using only U(1) symmetry does not:

TensorKit.with_blockscheduler(DynamicScheduler()) do
    TensorKit.with_subblockscheduler(DynamicScheduler()) do
        E = e_plus(Float64, SU2Irrep, U1Irrep; side=:L, filling=filling)'
        F = isomorphism(storagetype(E), flip(space(E, 2)), space(E, 2))
        @planar e⁻[-1; -2 -3] := E[-1 1; -2] * F[-3; 1]
    end
end

ERROR: LoadError: ArgumentError: Arguments of type Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}} are not compatible with chunks, either implement a custom chunks method for your type, or implement the custom type interface (see https://juliafolds2.github.io/ChunkSplitters.jl/dev/)
Stacktrace:

TensorKit.with_blockscheduler(DynamicScheduler()) do
    TensorKit.with_subblockscheduler(DynamicScheduler()) do
        E = e_plus(Float64, U1Irrep, U1Irrep; side=:L, filling=filling)'
        F = isomorphism(storagetype(E), flip(space(E, 2)), space(E, 2))
        @planar e⁻[-1; -2 -3] := E[-1 1; -2] * F[-3; 1]
        println(e⁻)
    end
end

TensorMap(Vect[(FermionParity ⊠ Irrep[U₁] ⊠ Irrep[U₁])]((0, 0, 1)=>1, (0, 0, -1)=>1, (1, 1, 0)=>1, (1, -1, 0)=>1) ← (Vect[(FermionParity ⊠ Irrep[U₁] ⊠ Irrep[U₁])]((0, 0, 1)=>1, (0, 0, -1)=>1, (1, 1, 0)=>1, (1, -1, 0)=>1) ⊗ Vect[(FermionParity ⊠ Irrep[U₁] ⊠ Irrep[U₁])]((1, -1, -1)=>1))):
* Data for sector ((FermionParity(0) ⊠ Irrep[U₁](0) ⊠ Irrep[U₁](-1)),) ← ((FermionParity(1) ⊠ Irrep[U₁](1) ⊠ Irrep[U₁](0)), (FermionParity(1) ⊠ Irrep[U₁](-1) ⊠ Irrep[U₁](-1))):
[:, :, 1] =
 1.0
* Data for sector ((FermionParity(1) ⊠ Irrep[U₁](-1) ⊠ Irrep[U₁](0)),) ← ((FermionParity(0) ⊠ Irrep[U₁](0) ⊠ Irrep[U₁](1)), (FermionParity(1) ⊠ Irrep[U₁](-1) ⊠ Irrep[U₁](-1))):
[:, :, 1] =
 -1.0

Collaborator Author

lkdvos commented Feb 24, 2025

Could you also attach what e_plus does so I can reproduce the error?

@ZongYongyue

e_plus is the same creation operator you originally had in MPSKitModels, except that I added a filling parameter. Setting filling = (1, 1) is fine.

function e_plus(elt::Type{<:Number}, ::Type{SU2Irrep}, ::Type{U1Irrep}; side=:L, filling=filling)
    I = FermionParity ⊠ SU2Irrep ⊠ U1Irrep
    P, Q = filling
    pspace = Vect[I]((0,0,-P)=>1, (1,1//2,Q-P)=>1, (0,0,2*Q-P)=>1)
    vspace = Vect[I]((1,1//2,Q)=>1)
    if side == :L
        e⁺ = TensorMap(zeros, elt, pspace ← pspace ⊗ vspace)
        block(e⁺, I(0,0,2*Q-P)) .= sqrt(2)
        block(e⁺, I(1,1//2,Q-P)) .= 1
    elseif side == :R
        E = e_plus(elt, SU2Irrep, U1Irrep; side=:L, filling=filling)
        F = isomorphism(storagetype(E), vspace, flip(vspace))
        @planar e⁺[-1 -2; -3] := E[-2; 1 2] * τ[1 2; 3 -3] * F[3; -1]
    end
    return e⁺
end

@ZongYongyue

The bug occurs when e_min is created; this is the full error info:

ERROR: LoadError: ArgumentError: Arguments of type Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}} are not compatible with chunks, either implement a custom chunks method for your type, or implement the custom type interface (see https://juliafolds2.github.io/ChunkSplitters.jl/dev/)
Stacktrace:
  [1] err_not_chunkable(::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}})
    @ ChunkSplitters.Internals ~/.julia/packages/ChunkSplitters/p2yrz/src/internals.jl:91
  [2] ChunkSplitters.Internals.IndexChunks(s::ChunkSplitters.Consecutive; collection::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}, n::Int64, size::Nothing, minsize::Nothing)
    @ ChunkSplitters.Internals ~/.julia/packages/ChunkSplitters/p2yrz/src/internals.jl:33
  [3] index_chunks(collection::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}; n::Int64, size::Nothing, split::ChunkSplitters.Consecutive, minsize::Nothing)
    @ ChunkSplitters.Internals ~/.julia/packages/ChunkSplitters/p2yrz/src/internals.jl:47
  [4] _index_chunks(sched::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, arg::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}})
    @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:27
  [5] _tmapreduce(f::Function, op::Function, Arrs::Tuple{Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}}, ::Type{Nothing}, scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, mapreduce_kwargs::@NamedTuple{init::Nothing})
    @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:106
  [6] #tmapreduce#22
    @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:85 [inlined]
  [7] tmapreduce
    @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:69 [inlined]
  [8] tforeach(f::Function, A::Base.Iterators.ProductIterator{Tuple{Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}, TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}, Base.Iterators.ProductIterator{Tuple{TensorKitSectors.SectorSet{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.var"#93#94"{GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}}, Vector{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}}}}}}}; kwargs::@Kwargs{scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}})
    @ OhMyThreads.Implementation ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:308
  [9] tforeach
    @ ~/.julia/packages/OhMyThreads/eiaNP/src/implementation.jl:307 [inlined]
 [10] _add_general_kernel!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:631 [inlined]
 [11] add_transform_kernel!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:585 [inlined]
 [12] add_transform!(tdst::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, tsrc::TensorKit.AdjointTensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 2, Vector{Float64}}}, ::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, transformer::Function, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorKit.TensorKitBackend{TensorOperations.DefaultBackend, DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}, DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}}, allocator::TensorOperations.DefaultAllocator)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:490
 [13] add_transform!(C::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, A::TensorKit.AdjointTensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 2, Vector{Float64}}}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, transformer::Function, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorOperations.DefaultBackend, allocator::TensorOperations.DefaultAllocator)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:462
 [14] add_transform!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:456 [inlined]
 [15] add_transpose!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/indexmanipulations.jl:439 [inlined]
 [16] planarcontract!(C::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, Vector{Float64}}, A::TensorKit.AdjointTensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 2, 1, TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 2, Vector{Float64}}}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, B::TensorMap{Float64, GradedSpace{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, TensorKit.SortedVectorDict{ProductSector{Tuple{FermionParity, SU2Irrep, U1Irrep}}, Int64}}, 1, 1, Vector{Float64}}, pB::Tuple{Tuple{Int64}, Tuple{Int64}}, pAB::Tuple{Tuple{Int64, Int64}, Tuple{Int64}}, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorOperations.DefaultBackend, allocator::TensorOperations.DefaultAllocator)
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/planar/planaroperations.jl:161
 [17] planarcontract!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/planar/planaroperations.jl:115 [inlined]
 [18] planarcontract!
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/planar/planaroperations.jl:110 [inlined]
 [19] e_min(elt::Type{Float64}, particle_symmetry::Type{SU2Irrep}, spin_symmetry::Type{U1Irrep}; side::Symbol, filling::Tuple{Int64, Int64})
    @ DynamicalCorrelators ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/DynamicalCorrelators.jl/src/operators/fermions.jl:239
 [20] (::var"#2#4")()
    @ Main ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:243
 [21] #with_subblockscheduler#162
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:54 [inlined]
 [22] with_subblockscheduler
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:52 [inlined]
 [23] (::var"#1#3")()
    @ Main ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:238
 [24] with_blockscheduler(f::var"#1#3", scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive}; kwargs::@Kwargs{})
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:39
 [25] with_blockscheduler(f::Function, scheduler::DynamicScheduler{OhMyThreads.Schedulers.FixedCount, ChunkSplitters.Consecutive})
    @ TensorKit ~/Library/Mobile Documents/com~apple~CloudDocs/Clone/Jutho/TensorKit.jl-ld-multithreading2/src/tensors/backends.jl:37
 [26] top-level scope
    @ ~/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:237
in expression starting at /Users/zongyy/Library/Mobile Documents/com~apple~CloudDocs/mygit/projects/000_test/tdvp/OhMyTh/LNO.jl:237

Collaborator Author

lkdvos commented Feb 24, 2025

Should now be resolved. It is interesting to note that this occurs when permuting an AdjointTensorMap, which still does not take the "fast implementation" code path. A profiler might tell us whether this is worth specializing as well.
