The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

msekino · 2020-12-09T14:26:01Z

julia/stdlib/SparseArrays/src/sparsematrix.jl

Line 22 in a813a6e

colptr::Vector{Ti} # Column i is in colptr[i]:(colptr[i+1]-1)

In the definition of the SparseMatrixCSC, the eltype Ti of colptr is the same as that of rowval.
Therefore, we have to set Ti with a capacity that can store the number of elements (colptr), instead of a capacity that can store only the row number (rowval).

Since the length of rowval is up to m * n and the length of colptr is only n + 1, rowval requires much more memory.
The value to be stored in rowval is only up to m, so we want to use the smallest eltype that can store m (e.g.: UInt32).

If m <= typemax(UInt32), the value to be stored in colptr is up to m * n and it is quite possible that the value will be greater than typemax(UInt32).

From the above, I think that the eltype of SparseMatrixCSC's colptr should be Int instead of Ti.

Best regards.

The text was updated successfully, but these errors were encountered:

jebej · 2020-12-09T15:11:57Z

This analysis makes sense when considering the limit where the sparse matrix is close to dense.

I think instead that the choice of having the same type for rowval and colptr was made assuming that they both would be storing the same order of magnitudes of number of values and therefore that the values themselves would also have the same order of magnitude. (That is a number of stored values ~linear in n)

I also think that adding a type parameter to the SparseArray type is unlikely to be done at this point.

msekino · 2020-12-10T01:31:08Z

Thank you for the comment!

In our case, m ≈ 50,000,000, n ≈ 100,000 and each row stores only about 100 Boolean values.
The density is only 0.1% and m ≪ typemax(UInt32) = 4,294,967,295, however the nnz = 5,000,000,000 > typemax(UInt32).
Thus, even if it's not so dense, the nnz can easily exceed the capacity of UInt32.

I also think that adding a type parameter to the SparseArray type is unlikely to be done at this point.

I think so too.

jebej · 2020-12-10T14:10:45Z

Ah yes, you are right at the transition....

You could always roll your own type and subtype SparseArrays.AbstractSparseMatrixCSC, though I'm sure a bunch of functions have their signature typed to SparseMatrixCSC instead of AbstractSparseMatrixCSC and would have to be changed to work seamlessly.

I think that would be the best way for you to solve this problem neatly.

If you do end up trying this, let us know which functions don't work.

dkarrasch · 2020-12-10T19:53:55Z

I think it's unlikely to be done if nobody does it. 😝 If you append a type parameter to the existing ones, then this should be non-breaking. Methods with signatures that parametrize the first two types (usually Tv and Ti) would simply leave the new one implicit. If you then set the third type parameter to Ti by default, then by default nothing changes, but you gain extra flexibility. Of course, one would need to go through the SparseArrays code and check where that type matters. I think the challenge is that it is a tedious job and difficult to not miss anything. Ok, and one needs to check upfront if major "consuming" packages like SuiteSparse would accept a discrepancy in those types. If not, there is probably not that much to be gained from such an endeavor.

msekino · 2020-12-11T01:14:01Z

Hmm, It's tough to make a new type that can replace SparseMatrixCSC...

I thought that directly applying Int to colptr in the SparseMatrixCSC is the simple solution.
However, now I'm afraid this change will result in a lot of code (public as well as private) that will need to be modified...

We will fork the repository, modify it, and try to use it for just our purpose.

Thank you.

ViralBShah · 2020-12-17T15:36:51Z

I just reopened JuliaLang/LinearAlgebra.jl#446 in order to start a discussion for moving these implementations into their own packages, precisely to pave the way for having more first class sparse matrix representations, and more flexibility in the use of sparse direct solvers.

ViralBShah · 2020-12-17T15:37:55Z

This use case is very interesting and fun to read about!

msekino · 2021-01-21T07:31:08Z

@ViralBShah
I'm very sorry for tooo late response, and thank you so much for the attention!
I sincerely hope that this issue will be resolved...

msekino · 2021-03-23T00:58:29Z

I think the following suggestion is a simple and good solution and will work for us without much effort.
Thanks!
#39899 (comment)

JeffBezanson added the sparse Sparse arrays label Dec 9, 2020

dkarrasch mentioned this issue Mar 3, 2021

RFC: Add rowval index type to SparseMatrixCSC #39899

Closed

3 tasks

msekino closed this as completed Mar 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

msekino commented Dec 9, 2020 •

edited

Loading

jebej commented Dec 9, 2020

msekino commented Dec 10, 2020 •

edited

Loading

jebej commented Dec 10, 2020

dkarrasch commented Dec 10, 2020

msekino commented Dec 11, 2020

ViralBShah commented Dec 17, 2020

ViralBShah commented Dec 17, 2020

msekino commented Jan 21, 2021

msekino commented Mar 23, 2021

The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

The eltype of SparseMatrixCSC's colptr should be Int instead of Ti #38790

Comments

msekino commented Dec 9, 2020 • edited Loading

jebej commented Dec 9, 2020

msekino commented Dec 10, 2020 • edited Loading

jebej commented Dec 10, 2020

dkarrasch commented Dec 10, 2020

msekino commented Dec 11, 2020

ViralBShah commented Dec 17, 2020

ViralBShah commented Dec 17, 2020

msekino commented Jan 21, 2021

msekino commented Mar 23, 2021

msekino commented Dec 9, 2020 •

edited

Loading

msekino commented Dec 10, 2020 •

edited

Loading