Skip to content

The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
msekino opened this issue Dec 9, 2020 · 9 comments
Closed

The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

msekino opened this issue Dec 9, 2020 · 9 comments
Labels
sparse Sparse arrays

Comments

@msekino
Copy link

msekino commented Dec 9, 2020

colptr::Vector{Ti} # Column i is in colptr[i]:(colptr[i+1]-1)

In the definition of the SparseMatrixCSC, the eltype Ti of colptr is the same as that of rowval.
Therefore, we have to set Ti with a capacity that can store the number of elements (colptr), instead of a capacity that can store only the row number (rowval).

Since the length of rowval is up to m * n and the length of colptr is only n + 1, rowval requires much more memory.
The value to be stored in rowval is only up to m, so we want to use the smallest eltype that can store m (e.g.: UInt32).

If m <= typemax(UInt32), the value to be stored in colptr is up to m * n and it is quite possible that the value will be greater than typemax(UInt32).

From the above, I think that the eltype of SparseMatrixCSC's colptr should be Int instead of Ti.

Best regards.

@jebej
Copy link
Contributor

jebej commented Dec 9, 2020

This analysis makes sense when considering the limit where the sparse matrix is close to dense.

I think instead that the choice of having the same type for rowval and colptr was made assuming that they both would be storing the same order of magnitudes of number of values and therefore that the values themselves would also have the same order of magnitude. (That is a number of stored values ~linear in n)

I also think that adding a type parameter to the SparseArray type is unlikely to be done at this point.

@JeffBezanson JeffBezanson added the sparse Sparse arrays label Dec 9, 2020
@msekino
Copy link
Author

msekino commented Dec 10, 2020

Thank you for the comment!

In our case, m ≈ 50,000,000, n ≈ 100,000 and each row stores only about 100 Boolean values.
The density is only 0.1% and m ≪ typemax(UInt32) = 4,294,967,295, however the nnz = 5,000,000,000 > typemax(UInt32).
Thus, even if it's not so dense, the nnz can easily exceed the capacity of UInt32.

I also think that adding a type parameter to the SparseArray type is unlikely to be done at this point.

I think so too.

@jebej
Copy link
Contributor

jebej commented Dec 10, 2020

Ah yes, you are right at the transition....

You could always roll your own type and subtype SparseArrays.AbstractSparseMatrixCSC, though I'm sure a bunch of functions have their signature typed to SparseMatrixCSC instead of AbstractSparseMatrixCSC and would have to be changed to work seamlessly.

I think that would be the best way for you to solve this problem neatly.

If you do end up trying this, let us know which functions don't work.

@dkarrasch
Copy link
Member

I think it's unlikely to be done if nobody does it. 😝 If you append a type parameter to the existing ones, then this should be non-breaking. Methods with signatures that parametrize the first two types (usually Tv and Ti) would simply leave the new one implicit. If you then set the third type parameter to Ti by default, then by default nothing changes, but you gain extra flexibility. Of course, one would need to go through the SparseArrays code and check where that type matters. I think the challenge is that it is a tedious job and difficult to not miss anything. Ok, and one needs to check upfront if major "consuming" packages like SuiteSparse would accept a discrepancy in those types. If not, there is probably not that much to be gained from such an endeavor.

@msekino
Copy link
Author

msekino commented Dec 11, 2020

Hmm, It's tough to make a new type that can replace SparseMatrixCSC...

I thought that directly applying Int to colptr in the SparseMatrixCSC is the simple solution.
However, now I'm afraid this change will result in a lot of code (public as well as private) that will need to be modified...

We will fork the repository, modify it, and try to use it for just our purpose.

Thank you.

@ViralBShah
Copy link
Member

I just reopened JuliaLang/LinearAlgebra.jl#446 in order to start a discussion for moving these implementations into their own packages, precisely to pave the way for having more first class sparse matrix representations, and more flexibility in the use of sparse direct solvers.

@ViralBShah
Copy link
Member

This use case is very interesting and fun to read about!

@msekino
Copy link
Author

msekino commented Jan 21, 2021

@ViralBShah
I'm very sorry for tooo late response, and thank you so much for the attention!
I sincerely hope that this issue will be resolved...

@msekino
Copy link
Author

msekino commented Mar 23, 2021

I think the following suggestion is a simple and good solution and will work for us without much effort.
Thanks!
#39899 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants