Rationalize and try to fix failing ldiv tests #2809

Open: wants to merge 2 commits into base: master

Conversation

kshyatt (Member) commented Jul 2, 2025

Trying to fix intermittently failing CI. It doesn't make sense to have these checks for only one of the in-place/out-of-place versions. Hopefully this helps stability.
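
To illustrate the idea of applying the same check to both variants, here is a minimal CPU-only sketch. It is not the PR's actual test code: the names `n`, `nB`, `A`, `B`, and the testset title are assumptions chosen for illustration, and it uses plain `SparseArrays` matrices rather than CUSPARSE arrays.

```julia
using LinearAlgebra, SparseArrays, Test

# Hypothetical sizes and inputs, for illustration only.
n, nB = 10, 2
A = sprand(n, n, 0.3) + 10I   # large diagonal keeps the triangular solve well conditioned
B = rand(n, nB)

@testset "ldiv: out-of-place and in-place agree" begin
    for triangle in (LowerTriangular, UpperTriangular)
        T = triangle(A)
        C = T \ B          # out-of-place solve, allocates the result
        D = copy(B)
        ldiv!(T, D)        # in-place solve, overwrites D with the solution
        @test C ≈ D        # both variants should give the same answer
        @test T * C ≈ B    # residual check on the out-of-place result
        @test T * D ≈ B    # the same residual check on the in-place result
    end
end
```

The point of the sketch is only that every assertion made about `\` is mirrored for `ldiv!`, so a failure in either path is caught by the same criteria.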

kshyatt requested a review from maleadt July 2, 2025 17:17
kshyatt added the labels "cuda libraries" (Stuff about CUDA library wrappers.) and "tests" (Adds or changes tests.) Jul 2, 2025
github-actions bot (Contributor) commented Jul 2, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/test/libraries/cusparse/interfaces.jl b/test/libraries/cusparse/interfaces.jl
index fa25d8330..34f9d75f8 100644
--- a/test/libraries/cusparse/interfaces.jl
+++ b/test/libraries/cusparse/interfaces.jl
@@ -258,7 +258,7 @@ nB = 2
                                 end
                             end
                             @testset "\\ -- CuMatrix" begin
-                                C  = triangle(opa(A)) \ opb(B)
+                                C = triangle(opa(A)) \ opb(B)
                                 dC = triangle(opa(dA)) \ opb(dB)
                                 @test C ≈ collect(dC)
                                 if CUSPARSE.version() < v"12.0"

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 2b58bc1 | Previous: 4f38802 | Ratio |
|---|---|---|---|
| latency/precompile | 42937214294 ns | 42824416725 ns | 1.00 |
| latency/ttfp | 7006899212 ns | 7051266950 ns | 0.99 |
| latency/import | 3598749321 ns | 3574411987 ns | 1.01 |
| integration/volumerhs | 9609056 ns | 9608389 ns | 1.00 |
| integration/byval/slices=1 | 146856 ns | 146872 ns | 1.00 |
| integration/byval/slices=3 | 426026.5 ns | 425794 ns | 1.00 |
| integration/byval/reference | 145080 ns | 144942 ns | 1.00 |
| integration/byval/slices=2 | 286326 ns | 286144 ns | 1.00 |
| integration/cudadevrt | 103529 ns | 103388 ns | 1.00 |
| kernel/indexing | 14148 ns | 14276 ns | 0.99 |
| kernel/indexing_checked | 15036 ns | 15083 ns | 1.00 |
| kernel/occupancy | 673.5796178343949 ns | 677.6114649681529 ns | 0.99 |
| kernel/launch | 2258.777777777778 ns | 2157.8888888888887 ns | 1.05 |
| kernel/rand | 14983 ns | 14900 ns | 1.01 |
| array/reverse/1d | 19640 ns | 20028 ns | 0.98 |
| array/reverse/2d | 23399 ns | 25007 ns | 0.94 |
| array/reverse/1d_inplace | 10382 ns | 10952 ns | 0.95 |
| array/reverse/2d_inplace | 12031 ns | 12545 ns | 0.96 |
| array/copy | 20933 ns | 21084 ns | 0.99 |
| array/iteration/findall/int | 157263 ns | 158043.5 ns | 1.00 |
| array/iteration/findall/bool | 139616 ns | 140007 ns | 1.00 |
| array/iteration/findfirst/int | 167118 ns | 164557.5 ns | 1.02 |
| array/iteration/findfirst/bool | 170423 ns | 167385 ns | 1.02 |
| array/iteration/scalar | 71422 ns | 74295 ns | 0.96 |
| array/iteration/logical | 213235 ns | 215875.5 ns | 0.99 |
| array/iteration/findmin/1d | 45880 ns | 47331 ns | 0.97 |
| array/iteration/findmin/2d | 96439 ns | 97017 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 42507 ns | 43072.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 45746.5 ns | 55698.5 ns | 0.82 |
| array/reductions/reduce/Int64/dims=2 | 62141 ns | 62572.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 88721 ns | 89129 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 86683 ns | 88184.5 ns | 0.98 |
| array/reductions/reduce/Float32/1d | 34086 ns | 35313 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 43953 ns | 51818 ns | 0.85 |
| array/reductions/reduce/Float32/dims=2 | 59614 ns | 59835 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52197.5 ns | 52336 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69873 ns | 70233.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 42098 ns | 44093 ns | 0.95 |
| array/reductions/mapreduce/Int64/dims=1 | 46766 ns | 47633.5 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2 | 61726 ns | 62709 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1L | 88921 ns | 89036 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86327 ns | 87347.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34357.5 ns | 34780.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 41477 ns | 41996.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 59999 ns | 60450.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52679 ns | 52739 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70249.5 ns | 70715 ns | 0.99 |
| array/broadcast | 20011 ns | 20360 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 12766 ns | 12890 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 213580.5 ns | 217680 ns | 0.98 |
| array/copyto!/gpu_to_cpu | 283317 ns | 286671 ns | 0.99 |
| array/accumulate/Int64/1d | 124768.5 ns | 125190 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83243 ns | 84136 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 157572 ns | 158690 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1708962 ns | 1709534 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 965903 ns | 967437 ns | 1.00 |
| array/accumulate/Float32/1d | 109074 ns | 109803 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80983 ns | 81170 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147799 ns | 147834 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619103.5 ns | 1619112.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698281 ns | 698583 ns | 1.00 |
| array/construct | 1292.3 ns | 1275.8 ns | 1.01 |
| array/random/randn/Float32 | 43716.5 ns | 44761 ns | 0.98 |
| array/random/randn!/Float32 | 25068 ns | 25104 ns | 1.00 |
| array/random/rand!/Int64 | 27432 ns | 27468 ns | 1.00 |
| array/random/rand!/Float32 | 8771 ns | 8662 ns | 1.01 |
| array/random/rand/Int64 | 38026 ns | 30080 ns | 1.26 |
| array/random/rand/Float32 | 12913 ns | 13152 ns | 0.98 |
| array/permutedims/4d | 60201 ns | 60473 ns | 1.00 |
| array/permutedims/2d | 53933 ns | 54524 ns | 0.99 |
| array/permutedims/3d | 54865.5 ns | 55468 ns | 0.99 |
| array/sorting/1d | 2756471 ns | 2763710 ns | 1.00 |
| array/sorting/by | 3366837 ns | 3356377 ns | 1.00 |
| array/sorting/2d | 1087589 ns | 1085339 ns | 1.00 |
| cuda/synchronization/stream/auto | 1032.8 ns | 1018.0909090909091 ns | 1.01 |
| cuda/synchronization/stream/nonblocking | 7152.2 ns | 7602.700000000001 ns | 0.94 |
| cuda/synchronization/stream/blocking | 810.1630434782609 ns | 806.236559139785 ns | 1.00 |
| cuda/synchronization/context/auto | 1178.4 ns | 1183.8 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 7946.4 ns | 7801 ns | 1.02 |
| cuda/synchronization/context/blocking | 918.8809523809524 ns | 897.2923076923076 ns | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

kshyatt force-pushed the ksh/interfaces_fix branch from 5db6744 to ef0395c July 23, 2025 19:02
kshyatt force-pushed the ksh/interfaces_fix branch from ef0395c to 2b58bc1 July 24, 2025 12:41
kshyatt (Member, Author) commented Jul 24, 2025

@maleadt the failing tests are the CUSTATEVEC ones. Can we merge?
