Fix spdiagm with specified pairs #2784
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

```diff
diff --git a/lib/cusparse/array.jl b/lib/cusparse/array.jl
index 4c11e0b5d..3f4df8f8d 100644
--- a/lib/cusparse/array.jl
+++ b/lib/cusparse/array.jl
@@ -336,20 +336,20 @@ function SparseArrays.sparsevec(I::CuArray{Ti}, V::CuArray{Tv}, n::Integer) wher
     CuSparseVector(I, V, n)
 end
-SparseArrays.spdiagm(kv::Pair{<:Integer,<:CuVector}...) = _cuda_spdiagm(nothing, kv...)
-SparseArrays.spdiagm(m::Integer, n::Integer, kv::Pair{<:Integer,<:CuVector}...) = _cuda_spdiagm((Int(m),Int(n)), kv...)
+SparseArrays.spdiagm(kv::Pair{<:Integer, <:CuVector}...) = _cuda_spdiagm(nothing, kv...)
+SparseArrays.spdiagm(m::Integer, n::Integer, kv::Pair{<:Integer, <:CuVector}...) = _cuda_spdiagm((Int(m), Int(n)), kv...)
 SparseArrays.spdiagm(v::CuVector) = _cuda_spdiagm(nothing, 0 => v)
 SparseArrays.spdiagm(m::Integer, n::Integer, v::CuVector) = _cuda_spdiagm((Int(m), Int(n)), 0 => v)
 function _cuda_spdiagm(size, kv::Pair{<:Integer, <:CuVector}...)
     I, J, V, mmax, nmax = _cuda_spdiagm_internal(kv...)
     mnmax = max(mmax, nmax)
-    m, n = something(size, (mnmax,mnmax))
+    m, n = something(size, (mnmax, mnmax))
     (m ≥ mmax && n ≥ nmax) || throw(DimensionMismatch("invalid size=$size"))
     return sparse(CuVector(I), CuVector(J), V, m, n)
 end
-function _cuda_spdiagm_internal(kv::Pair{T,<:CuVector}...) where {T<:Integer}
+function _cuda_spdiagm_internal(kv::Pair{T, <:CuVector}...) where {T <: Integer}
     ncoeffs = 0
     for p in kv
         ncoeffs += SparseArrays._nnz(p.second)
@@ -374,7 +374,7 @@ function _cuda_spdiagm_internal(kv::Pair{T,<:CuVector}...) where {T<:Integer}
             col = 0
         end
         numel = SparseArrays._nnz(v)
-        r = 1+i:numel+i
+        r = (1 + i):(numel + i)
         I_r, J_r = SparseArrays._indices(v, row, col)
         copyto!(view(I, r), I_r)
         copyto!(view(J, r), J_r)
```
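The suggested changes above are purely stylistic; the patch's logic is unchanged. As a rough illustration of the per-diagonal index computation that `_cuda_spdiagm_internal` performs, here is a plain-Julia CPU sketch (`diag_coo` is a hypothetical helper name used for illustration, not part of CUDA.jl): a diagonal offset `k` is turned into row/column start offsets, and the vector's entries are laid along that diagonal in COO form.

```julia
# Hedged sketch: CPU-side mirror of the per-diagonal index computation.
# `diag_coo` is a hypothetical name, not a CUDA.jl function.
function diag_coo(k::Integer, v::AbstractVector)
    row = k < 0 ? -k : 0     # sub-diagonals start -k rows down
    col = k > 0 ?  k : 0     # super-diagonals start k columns right
    n = length(v)
    I = row .+ collect(1:n)  # row indices of the diagonal entries
    J = col .+ collect(1:n)  # column indices of the diagonal entries
    return I, J, v
end
```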
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #2784      +/-   ##
==========================================
- Coverage   89.72%   89.70%   -0.02%
==========================================
  Files         153      153
  Lines       13232    13274      +42
==========================================
+ Hits        11872    11908      +36
- Misses       1360     1366       +6
```
CUDA.jl Benchmarks

| Benchmark suite | Current: 2523b11 | Previous: a4a7af4 | Ratio |
|---|---|---|---|
| latency/precompile | 43078512320.5 ns | 43227936166.5 ns | 1.00 |
| latency/ttfp | 7235008141 ns | 7121275224 ns | 1.02 |
| latency/import | 3450098826 ns | 3420695643 ns | 1.01 |
| integration/volumerhs | 9609573 ns | 9610749.5 ns | 1.00 |
| integration/byval/slices=1 | 147170 ns | 146856 ns | 1.00 |
| integration/byval/slices=3 | 425740 ns | 425444 ns | 1.00 |
| integration/byval/reference | 145312 ns | 145104 ns | 1.00 |
| integration/byval/slices=2 | 286277.5 ns | 286260 ns | 1.00 |
| integration/cudadevrt | 103616 ns | 103521 ns | 1.00 |
| kernel/indexing | 14397.5 ns | 14402 ns | 1.00 |
| kernel/indexing_checked | 15277 ns | 14989.5 ns | 1.02 |
| kernel/occupancy | 755.1029411764706 ns | 722.1785714285714 ns | 1.05 |
| kernel/launch | 2430.6666666666665 ns | 2276.3333333333335 ns | 1.07 |
| kernel/rand | 17888 ns | 17130 ns | 1.04 |
| array/reverse/1d | 20188 ns | 20044.5 ns | 1.01 |
| array/reverse/2d | 25118.5 ns | 24062 ns | 1.04 |
| array/reverse/1d_inplace | 11233 ns | 10376 ns | 1.08 |
| array/reverse/2d_inplace | 13428 ns | 11904 ns | 1.13 |
| array/copy | 21288 ns | 21530 ns | 0.99 |
| array/iteration/findall/int | 160244 ns | 159043 ns | 1.01 |
| array/iteration/findall/bool | 140284 ns | 139211 ns | 1.01 |
| array/iteration/findfirst/int | 164195 ns | 160978 ns | 1.02 |
| array/iteration/findfirst/bool | 165257.5 ns | 162266 ns | 1.02 |
| array/iteration/scalar | 73406 ns | 71832 ns | 1.02 |
| array/iteration/logical | 219938 ns | 217950.5 ns | 1.01 |
| array/iteration/findmin/1d | 46937.5 ns | 47357.5 ns | 0.99 |
| array/iteration/findmin/2d | 97423 ns | 97468 ns | 1.00 |
| array/reductions/reduce/1d | 36616 ns | 36864 ns | 0.99 |
| array/reductions/reduce/2d | 42192.5 ns | 41394 ns | 1.02 |
| array/reductions/mapreduce/1d | 34847 ns | 34541 ns | 1.01 |
| array/reductions/mapreduce/2d | 41664.5 ns | 41294 ns | 1.01 |
| array/broadcast | 21469 ns | 21139 ns | 1.02 |
| array/copyto!/gpu_to_gpu | 12489 ns | 12850.5 ns | 0.97 |
| array/copyto!/cpu_to_gpu | 217160 ns | 217756 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 284338 ns | 285934 ns | 0.99 |
| array/accumulate/1d | 110000 ns | 108976.5 ns | 1.01 |
| array/accumulate/2d | 81141 ns | 80598 ns | 1.01 |
| array/construct | 1262.5 ns | 1307.6 ns | 0.97 |
| array/random/randn/Float32 | 44785.5 ns | 47928 ns | 0.93 |
| array/random/randn!/Float32 | 25168 ns | 25275 ns | 1.00 |
| array/random/rand!/Int64 | 27407 ns | 27404 ns | 1.00 |
| array/random/rand!/Float32 | 8859 ns | 8877.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 30532 ns | 30410 ns | 1.00 |
| array/random/rand/Float32 | 13334 ns | 13403 ns | 0.99 |
| array/permutedims/4d | 61801 ns | 61384.5 ns | 1.01 |
| array/permutedims/2d | 56099.5 ns | 55567 ns | 1.01 |
| array/permutedims/3d | 56622 ns | 56298 ns | 1.01 |
| array/sorting/1d | 2776621.5 ns | 2778057 ns | 1.00 |
| array/sorting/by | 3367591.5 ns | 3369635 ns | 1.00 |
| array/sorting/2d | 1087061.5 ns | 1086371 ns | 1.00 |
| cuda/synchronization/stream/auto | 1062.1 ns | 1016.0714285714286 ns | 1.05 |
| cuda/synchronization/stream/nonblocking | 7424.4 ns | 7897.7 ns | 0.94 |
| cuda/synchronization/stream/blocking | 851.4489795918367 ns | 805.7272727272727 ns | 1.06 |
| cuda/synchronization/context/auto | 1152.2 ns | 1176.3 ns | 0.98 |
| cuda/synchronization/context/nonblocking | 7721.1 ns | 7510.1 ns | 1.03 |
| cuda/synchronization/context/blocking | 914.1041666666666 ns | 915.85 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Please remove this file.
Except for the file .DS_Store, which should be removed, everything looks correct!
Thanks for the PR.
Fixes #2783.
Provides a more general way to define a sparse diagonal matrix with CUDA.
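With these methods, `spdiagm` accepts offset/vector pairs of `CuVector`s on the GPU, matching the SparseArrays API on the CPU. A minimal usage sketch (assuming a functional CUDA device; values chosen for illustration):

```julia
using CUDA, SparseArrays

v = CuVector(Float32[1, 2, 3])
A = spdiagm(0 => v, 1 => v)   # square matrix sized to fit all diagonals: 4×4
B = spdiagm(5, 4, -1 => v)    # explicit 5×4 size
```

As on the CPU, the size-less form picks the smallest square matrix that fits every diagonal, and the sized form throws a `DimensionMismatch` if a diagonal does not fit.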