Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

size field: allow Integer values of singleton type as elements #107

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nsajko
Copy link
Collaborator

@nsajko nsajko commented Feb 12, 2025

Currently only Int values may be contained in the size field. This PR additionally allows values of any singleton type that subtypes Integer.

Fixes #54

Currently only `Int` values may be contained in the `size` field. This
PR additionally allows values of any singleton type that subtypes
`Integer`.

Fixes JuliaArrays#54
Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.12%. Comparing base (c4d238f) to head (77ee0ca).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
- Coverage   96.74%   96.12%   -0.63%     
==========================================
  Files           3        3              
  Lines         246      284      +38     
==========================================
+ Hits          238      273      +35     
- Misses          8       11       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nsajko
Copy link
Collaborator Author

nsajko commented Feb 12, 2025

@PatrickHaecker you were interested in this functionality, would you like to try this out, or provide representative benchmarks?

I didn't benchmark anything yet, but this should speed up indexing when some of the dimensions are static as-is. There's probably more things that could be optimized on top of the current state of the PR.

@nsajko
Copy link
Collaborator Author

nsajko commented Feb 12, 2025

Here's a usage example using my package TypeDomainNaturalNumbers. There are other packages for static integers, too.

julia> using TypeDomainNaturalNumbers, FixedSizeArrays

julia> siz = (TypeDomainInteger(3), TypeDomainInteger(5))
(3, 5)

julia> Base.issingletontype(typeof(siz))
true

julia> siz === size(FixedSizeArray{Float32}(undef, siz))
true

julia> siz === size(reshape(FixedSizeArray{Float32}(undef, 15), siz))
true

@nsajko

This comment was marked as outdated.

x
end

struct FixedSizeArray{T,N,Mem<:DenseVector{T},Size<:NTuple{N,Integer}} <: DenseArray{T,N}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this turning FixedSizeArray too close to StaticArray by having the shape in the type domain?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some similarities with both HybridArrays.jl (allows specifying an arbitrary mix of statically and dynamically sized axes) and SizedArray from StaticArrays.jl (allows specifying statically sized axes for a wrapped dynamically-sized array). The main difference (both a benefit and limitation) over both HybridArrays.jl and StaticArrays.jl is that FixedSizeArray is dense (subypes DenseArray). And StaticArrays requires all axes to be statically sized, of course.

In case it's not clear, this PR doesn't remove any existing features, it just generalizes the current design.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't concerned about removing features but about putting too much information in the type domain. That's a recipe for skyrocketing compile times.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth I think the test suite takes about the same time to execute as before. So, as long as all axes are dynamically sized, I don't expect compile time regressions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you'd see much of a difference in tests here where we mostly do different operations. The problem is more when you do same operations on arrays of different sizes: the generated code is very similar, but each of them triggers full recompilation since the type is different. Of course putting more information in the type domain is more flexible (you can even dispatch on the exact shape of the array which in certain cases can be very convenient) but that comes at a cost. The constant size field isn't as flexible, but it still enables some compile-time optimisations (e.g. when constant-propagation does its job).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is more when you do same operations on arrays of different sizes: the generated code is very similar, but each of them triggers full recompilation since the type is different.

That's definitely true, with this PR it's possible to create lots of distinct types, causing lots of work for the compiler. However in my mind if that situation happens it is almost surely what the user wants, no? The potential to unintentionally construct a FixedSizeArray with a statically sized axis seems small, given that static integers are rarely used.

Do I understand correctly that you're worried about this unintentional construction/type proliferation? If that's it, it should be possible to further diminish the possibility of unintentionally constructing an array with a statically sized axis: we could, for example, require such construction to specify the Size type parameter explicitly. That is, a constructor type that explicitly features a Size type parameter with singleton type constituents would be the only way to construct a FixedSizeArray with a non-Int-sized axis.

Does that seem like an improvement, and perhaps acceptable?

Copy link

@PatrickHaecker PatrickHaecker Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Sorry, you can ignore this comment. I understood that Size can really be used to store the size information and I need to think this through.

Isn't this turning FixedSizeArray too close to StaticArray by having the shape in the type domain?

I have the feeling, that there is a misunderstanding, because I misread the code change in the beginning. Even after this change Size does not contain the size information, i.e. FixedSizeArray does not contain the shape in the type domain. This is in contrast to StaticArrays, although StaticArrays uses the same identifier Size for the shape information in its documentation.

What Size in this PR contains is the type of the size field for each dimension, e.g. only (a Tuple containing) Int8 for a short vector if the user specifies this. And while I now see the logic in the PR between the identifiers size and Size, I suggest to rename Size into SizeType or SizeT or something similar to avoid this confusion.

And while lifting size in the type domain could indeed trigger a lot of additional compilations (and would be best done in a new type in my opinion if anyone planned this), I think putting its type information into the type domain is reasonable. As default Int will be used as before, so there are no additional method compilations for existing users.

I expect only few different types which will be used in a typical program, i.e. a lot of combinations like {Int128, UInt8, UInt32} are unlikely to be ever seen in reality. But it's great that they are available if need arises without any cost when not used. As an NTuple is used, this type is not supported, so it would only be, e.g., {Int8, Int8, Int8}.

Copy link
Collaborator

@giordano giordano Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just realised I did misunderstand the new type parameter: it isn't the size tuple itself, but its type:

julia> FixedSizeArray{Float64}(undef, 0, 0)
0×0 FixedSizeArray{Float64, 2, Memory{Float64}, Tuple{Int64, Int64}}

My concern was unwarranted (I read the code wrong and thought it'd move the tuple itself to the type domain), but I'd like @oscardssmith opinion on this. The type parameters are becoming a little unwieldy here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably want to keep the size as Int. the overhead should be low.

@nsajko nsajko marked this pull request as draft February 12, 2025 19:11
@nsajko nsajko marked this pull request as ready for review February 12, 2025 23:28
@PatrickHaecker
Copy link

Ok, this is definitely genius in my opinion. I guess the only question might be if it's too genius. ;-)

I originally had expected that there could be two types like FixedSizeArrays (much simplified):

  1. Type parameter for size type
struct FixedSizeVector1{T, SizeType<:Integer} # with `isprimitivetype(SizeType)`
    mem::Memory{T}
    size::SizeType
end

i.e. containing a size field with a type which is a concrete, primitive subtype of Integer. This has the advantage that its size is run-time defined and not parse-time defined and it's relatively compile-friendly with the disadvantage of needing some additional space per object for the size variable.

  1. Type parameter for size
struct FixedSizeVector2{T, Size} # with `Size isa Int`
    mem::Memory{T}
end

i.e. containing the size information in the type. This can be compile-heavy and needs to be defined at parse-time, but saves some space for the size variable.

The problem of FixedSizeVector1 is that due to the 8 byte memory reference (focusing here on amd64 architectures) and the alignment rules, even FixedSizeVector1{Int8, UInt8} |> sizeof is 16 bytes due to padding. So there is only a benefit for more than one dimension, e.g. FixedSizeArray1{Int8, Tuple{UInt8, UInt8}} |> sizeof is 16 and not 24.

However, as long as Memory uses 8 bytes both for :length and for :ptr and inlining of mutable structs is not possible the memory saving is indeed limited. With FixedSizeVector2 or the implementation of this PR, the memory saving can be a bit larger, but again the larger advantages would probably be only there if inlining would be supported and differently sized small arrays are needed.

Questions:

  • What is the advantage of this PR compared to FixedSizeVector2?
  • What is the reason that the size tuple fields must be either singleton type or Int but no other primitive subtype of Integer?
  • Wouldn't this type be better implemented as a separate type with both the existing and the new type subtyping something like AbstractFixedSizeArray?

@nsajko nsajko marked this pull request as draft February 16, 2025 18:50
@nsajko
Copy link
Collaborator Author

nsajko commented Feb 16, 2025

An issue is that Base.OneTo, used by axes, doesn't handle a length of singleton type well. The solution is to use something like GenericOneTo{Int} here instead of Base.OneTo:

const Supertype = supertype(Base.OneTo)  # in anticipation of `AbstractOneTo`
struct GenericOneTo{ElementType <: Integer, Length <: Integer} <: Supertype{ElementType}
    length::Length
end

@nsajko
Copy link
Collaborator Author

nsajko commented Feb 16, 2025

I'll answer all the comments in the future BTW.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

more options for size
4 participants