Add ndims type parameter to AbstractArrayInterface #42

Merged
mtfishman merged 12 commits into main from mf/arrayinterface_ndims
Jun 10, 2025

Conversation

mtfishman
Member

@mtfishman mtfishman commented Jun 9, 2025

This adds an ndims type parameter to AbstractArrayInterface, analogous to how Broadcast.AbstractArrayStyle has the same thing.

The use case is BlockSparseArrays.jl, where I want to generalize the BlockSparseArrayInterface type to store the interface of the blocks. It's helpful to have the number of dimensions stored in the interface so the block interface can be translated to a fully formed block type in calls like similar or arraytype which take an interface and construct a prototypical array or array type.

To be more specific, right now we construct a prototypical array type from an interface with function calls like DerivableInterfaces.arraytype(DefaultArrayInterface(), Float32) == Array{Float32}. This PR continues to support that, but additionally enables DerivableInterfaces.arraytype(DefaultArrayInterface{2}(), Float32) == Matrix{Float32}. An alternative design would be to specify the number of dimensions as a separate argument to arraytype, i.e. DerivableInterfaces.arraytype(DefaultArrayInterface(), Val(2), Float32) == Matrix{Float32}, but I think putting that information in the interface makes sense, since it can then be used in other situations, such as dispatch.
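For reference, the Base pattern this mirrors can be checked with Base alone. The sketch below only uses Base.Broadcast and says nothing about how DerivableInterfaces.jl implements its interface types; the variable names are illustrative.

```julia
using Base.Broadcast: BroadcastStyle, DefaultArrayStyle

# The broadcast style of an Array carries its number of dimensions
# as a type parameter.
style_matrix = BroadcastStyle(typeof(randn(2, 2)))
style_vector = BroadcastStyle(typeof(randn(2)))

# AbstractArrayStyle subtypes accept a Val to reset the ndims parameter.
reset_style = DefaultArrayStyle{1}(Val(2))

println(style_matrix)
println(style_vector)
println(reset_style)
```

Having the dimensionality in the type is what lets Base dispatch on, say, DefaultArrayStyle{2} separately from DefaultArrayStyle{1}.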

Note that this is marked as breaking since packages like SparseArraysBase.jl, DiagonalArrays.jl, BlockSparseArrays.jl, etc. will need to account for the new ndims type parameter in the interface types they define.

Closes #7.


codecov bot commented Jun 9, 2025

Codecov Report

Attention: Patch coverage is 78.12500% with 7 lines in your changes missing coverage. Please review.

Project coverage is 72.23%. Comparing base (fc8f02c) to head (ce0ee30).
Report is 1 commits behind head on main.

Files with missing lines         Patch %   Lines
src/concatenate.jl               72.72%    3 Missing ⚠️
src/abstractarrayinterface.jl    75.00%    2 Missing ⚠️
src/defaultarrayinterface.jl     84.61%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #42      +/-   ##
==========================================
+ Coverage   71.67%   72.23%   +0.56%     
==========================================
  Files          11       11              
  Lines         353      371      +18     
==========================================
+ Hits          253      268      +15     
- Misses        100      103       +3     
Flag Coverage Δ
docs 27.91% <0.00%> (-1.44%) ⬇️

@mtfishman mtfishman requested a review from lkdvos June 9, 2025 22:16
@lkdvos
Contributor

lkdvos commented Jun 9, 2025

Just briefly checking this, isn't the ndims in the axes? Because it's not that uncommon to want to alter the number of dimensions, for example when you reduce over one or more of them, and then this interface wouldn't really support that. At least this is what I remember from not putting it there

@mtfishman
Member Author

mtfishman commented Jun 10, 2025

Just briefly checking this, isn't the ndims in the axes? Because it's not that uncommon to want to alter the number of dimensions, for example when you reduce over one or more of them, and then this interface wouldn't really support that. At least this is what I remember from not putting it there

The axes aren't stored in the interface; maybe you are thinking of the Concatenated object? The only change to the Concatenated object is that I make sure the interface object stored in Concatenated has the correct ndims when it is constructed.

@mtfishman
Member Author

You can see an example of updates that are needed for downstream packages here: ITensor/SparseArraysBase.jl#61.

@lkdvos
Contributor

lkdvos commented Jun 10, 2025

Sorry, I should have been more clear:

What I have in mind is that in calls to similar, the number of dimensions is determined through the axes, and not the arraytype: similar(Array{Float64}, (1:3, 1:3, ...)) seems usually more convenient than having to deal with similar(Array{Float64,N}, ...), since then you have to actually ignore the N because it is already defined through the axes, which is easy enough for Array but hard for generic types, since you don't actually know which type parameter to ignore.

I can see that it might be easier to work with fully instantiated array types, but in my experience this can become messy quickly. Imagine having a SparseBlockArray with ndims = 3 and calling similar(a, (axes(a, 1),)): if I now go through some code path that adds the ndims to the types, it becomes a bit of a pain to remove them again.

Obviously I might also be missing something, this is just the case I have in mind and obviously I haven't looked at every usecase, just wanted to bring up this particular case.
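The behavior of similar described in this comment can be illustrated with plain Base arrays. This is a minimal sketch, using Base.OneTo axes because that is what axes returns for an Array; it does not involve any of the packages discussed in this PR.

```julia
a = randn(2, 3, 4)

# With an array type that leaves ndims unspecified, the axes decide
# the dimensionality of the result.
m = similar(Array{Float64}, (Base.OneTo(3), Base.OneTo(3)))

# With an array instance, the axes argument overrides the ndims of `a`.
v = similar(a, (axes(a, 1),))

println(typeof(m), " ", size(m))
println(typeof(v), " ", size(v))
```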

@mtfishman
Member Author

I see, thanks for the clarification. I agree the interplay between encoding ndims in the type vs. determining it from the axes is subtle.

The situation I'm thinking of is this one:

similar(BlockSparseArray{Float64,3,Array{Float64,3}}, blockedrange.(([2, 3], [2, 3], [2, 3])))

where I want to derive the block sparse type from an interface, so for example from BlockSparseArrayInterface{3}(DefaultArrayInterface{3}()). As you say, it becomes subtle when there is a mismatch between the type and the axes, deciding which parts of the type to use or ignore, etc. The specification that ndims==3 can be taken from the axes input to similar, but I think it is nice in practice to have it in the interface. Relatedly, dealing with underspecified types as part of wrapper types like that gets to be pretty subtle, especially when there are type constraints between the wrapper types, so having things fully specified is convenient.

So, stepping back, I agree with your point that there may be other solutions to the narrower problem I'm looking at, but also I think having ndims in the interface is a reasonable design more broadly, both because it matches the design of broadcast style in Base and also for the reason outlined in #7, which I think is important because right now there are interface functions that only make sense for matrices (say linear algebra functions) but we can't constrain interface functions to be AbstractMatrixInterface.

@mtfishman
Member Author

mtfishman commented Jun 10, 2025

I was thinking about this a bit more, and I think the point you are bringing up is a good one. Basically, it is hard to tell when you really want to use the ndims in the interface and when you want to ignore it. Broadcasting is pretty special because you can determine the ndims of the broadcasting expression from the maximum of the ndims of its arguments. That's not the case here, since the interface objects are supposed to work across a variety of functions.

This PR approaches that with the following rules:

  1. If all arguments have the same ndims, that is used as the ndims of the AbstractArrayInterface object.
  2. If arguments disagree with each other, ndims is set to Any, which is the convention for saying that it is not defined.
  3. As needed, functions that construct an AbstractArrayInterface object can determine and set the ndims on the fly based on the arguments, for example how the Concatenated object constructors are defined in this PR. That is a way to supersede 1. and 2. when those rules aren't sufficient. That is why AbstractArrayInterface subtypes are expected to define constructors like ArrayInterface{M}(::Val{N}) -> ArrayInterface{N}() in order to set the ndims, which is an interface requirement of Broadcast.AbstractArrayStyle subtypes as well.
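The three rules above can be sketched roughly as follows. The types and the combine function here are hypothetical stand-ins defined only for illustration; they are not the definitions in DerivableInterfaces.jl, and the real combination logic may differ.

```julia
# Hypothetical stand-ins for illustration only.
abstract type AbstractArrayInterface{N} end
struct DefaultArrayInterface{N} <: AbstractArrayInterface{N} end

# ndims left unspecified, following the `Any` convention from rule 2.
DefaultArrayInterface() = DefaultArrayInterface{Any}()

# Rule 3: a constructor that resets the ndims from a Val.
DefaultArrayInterface{M}(::Val{N}) where {M,N} = DefaultArrayInterface{N}()

# Rule 1: matching ndims are kept (hypothetical `combine` function).
combine(::DefaultArrayInterface{N}, ::DefaultArrayInterface{N}) where {N} =
  DefaultArrayInterface{N}()
# Rule 2: mismatched ndims fall back to `Any`.
combine(::DefaultArrayInterface, ::DefaultArrayInterface) =
  DefaultArrayInterface{Any}()

same = combine(DefaultArrayInterface{2}(), DefaultArrayInterface{2}())
mixed = combine(DefaultArrayInterface{2}(), DefaultArrayInterface{3}())
reset = DefaultArrayInterface{Any}(Val(3))
```

Here the first combine method is more specific, so it wins whenever both arguments share the same N; any other pair falls through to the second method.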

I think those rules should cover a lot of cases automatically and provide a reasonable way to customize as needed.

I'm thinking more about the case I brought up above. To summarize, we want an interface object:

BlockSparseArrayInterface{3}(DefaultArrayInterface{3}())

and a set of axes blockedrange.(([2, 3], [2, 3], [2, 3])) to map to a call:

similar(BlockSparseArray{Float64,3,Array{Float64,3}}, blockedrange.(([2, 3], [2, 3], [2, 3])))

which forwards to the constructor:

BlockSparseArray{Float64,3,Array{Float64,3}}(undef, blockedrange.(([2, 3], [2, 3], [2, 3])))

Thinking through how this works if the ndims aren't specified, i.e. we instead have an interface object BlockSparseArrayInterface(DefaultArrayInterface()), that could map to a call:

similar(BlockSparseArray{Float64,<:Any,Array{Float64}}, blockedrange.(([2, 3], [2, 3], [2, 3])))

which means we should define the constructor:

BlockSparseArray{Float64,<:Any,Array{Float64}}(undef, blockedrange.(([2, 3], [2, 3], [2, 3])))

or more generally the constructor where ndims isn't specified and the block type is partially specified. That constructor kind of works right now but creates a slightly ill-defined BlockSparseArray, but probably there is a way to make it work by using the axes to specify more of the block type. As you brought up, even if the ndims is specified, depending on the block type there still might be type parameters that can only be determined from the axes (i.e. if the block type has more structure, such as a block structure or Kronecker structure). So, I think it would be good to define that constructor where the ndims isn't specified and the block type is only partially specified (I think that can be done with some use of Base.promote_op and similar on the block type and axes). Even so, I think this PR is helpful for other reasons mentioned in earlier posts, even if there is an alternative solution to the original problem I was trying to solve.

Contributor

@lkdvos lkdvos left a comment

Overall looks good to me + our discussion on arraytype etc, should be good to go after!

@mtfishman
Member Author

@lkdvos in the latest I removed arraytype in favor of overloading similar(interface::AbstractArrayInterface, T::Type, ax) and also addressed your comments. Let me know if you have other comments, otherwise I'll merge soon.

 end
 function Concatenated{Interface}(dims::Val, args::Tuple) where {Interface}
-  return Concatenated(Interface(), dims, args)
+  N = cat_ndims(dims, args...)
+  return _Concatenated(set_interface_ndims(Interface, Val(N)), dims, args)
Contributor

With this new call to similar, do you need to set the interface ndims already at this point? I guess this would be automatically determined from the ax later down the line, so I'm just wondering if there is any difference.

Member Author

That's true, when you call similar, the ndims in the interface will get overridden by the axes input to similar. I kept it in case there is some other use case where storing the ndims in the interface is useful, but I admit I don't have a particular use case in mind for Concatenated.

I suppose what this does is catch cases where a user specifies an interface but it actually has the wrong ndims for the concatenation expression, I guess that could be checked.

But also, thinking about this particular constructor, it shouldn't change the Interface type, since that means this constructor doesn't satisfy T(args...) isa T, so I'll change this one.

Member Author

In the latest commits I propose a compromise. If the interface object is passed explicitly, it is taken "as is" and isn't modified, even if the ndims aren't specified or are incorrect, but when it is constructed from the arguments the correct ndims are computed explicitly.

That matches the behavior of Broadcasted:

julia> using Base.Broadcast: DefaultArrayStyle, Broadcasted

julia> bc = Broadcasted(DefaultArrayStyle{1}(), +, (randn(2, 2), randn(2, 2)))
Broadcasted(+, ([0.095430782012298 0.4338409876936397; -0.2285907590132556 2.0739880112106475], [-0.6626604337472756 0.5369654075753642; 0.2681815713554777 -0.7930248596990674]))

julia> bc.style
DefaultArrayStyle{1}()

julia> bc = Broadcasted(+, (randn(2, 2), randn(2, 2)))
Broadcasted(+, ([-0.08113666651442918 -0.11158415095613558; -0.5031937898445847 -1.1018574952687241], [-1.595659893181313 0.12746978522353117; 0.026558187695457296 -0.22363363427492086]))

julia> bc.style
DefaultArrayStyle{2}()

That seems reasonable: the interface is meant to be something you can specify to decide on the dispatch, so I think we shouldn't be too opinionated about modifying what the user passed in, since it may have been deliberate.

@mtfishman mtfishman merged commit 4b9fed0 into main Jun 10, 2025
19 checks passed
@mtfishman mtfishman deleted the mf/arrayinterface_ndims branch June 10, 2025 19:36
Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Add ndims type parameter to AbstractArrayInterface, define aliases like AbstractMatrixInterface
2 participants