update GPUToolbox to v0.2 #2687
base: master
@@ -4,7 +4,7 @@ using GPUCompiler

 using GPUArrays

-using GPUToolbox: SimpleVersion, @sv_str
+using GPUToolbox: SimpleVersion, @sv_str, i32
Why not simply import everything? We control that package, after all.
CUDA.jl Benchmarks
| Benchmark suite | Current: 9161a0e | Previous: c75b56f | Ratio |
|---|---|---|---|
| latency/precompile | 46241783821 ns | 46181620127.5 ns | 1.00 |
| latency/ttfp | 7104757954 ns | 7068629094 ns | 1.01 |
| latency/import | 3740569375 ns | 3724762175 ns | 1.00 |
| integration/volumerhs | 9624153.5 ns | 9624216.5 ns | 1.00 |
| integration/byval/slices=1 | 146935 ns | 146894 ns | 1.00 |
| integration/byval/slices=3 | 425223 ns | 425137.5 ns | 1.00 |
| integration/byval/reference | 144909 ns | 144952 ns | 1.00 |
| integration/byval/slices=2 | 286077 ns | 285936 ns | 1.00 |
| integration/cudadevrt | 103282 ns | 103412 ns | 1.00 |
| kernel/indexing | 14211 ns | 14099 ns | 1.01 |
| kernel/indexing_checked | 14757 ns | 14674 ns | 1.01 |
| kernel/occupancy | 671.8037974683544 ns | 701.1379310344828 ns | 0.96 |
| kernel/launch | 2059.7 ns | 2179.6666666666665 ns | 0.94 |
| kernel/rand | 14590 ns | 14749 ns | 0.99 |
| array/reverse/1d | 19644 ns | 19776 ns | 0.99 |
| array/reverse/2d | 25320 ns | 24908 ns | 1.02 |
| array/reverse/1d_inplace | 10356.5 ns | 10219 ns | 1.01 |
| array/reverse/2d_inplace | 12076 ns | 11910 ns | 1.01 |
| array/copy | 21226 ns | 21311 ns | 1.00 |
| array/iteration/findall/int | 159303 ns | 158209 ns | 1.01 |
| array/iteration/findall/bool | 139699 ns | 139123 ns | 1.00 |
| array/iteration/findfirst/int | 154653 ns | 153168 ns | 1.01 |
| array/iteration/findfirst/bool | 155261 ns | 154631 ns | 1.00 |
| array/iteration/scalar | 72166 ns | 71886 ns | 1.00 |
| array/iteration/logical | 216022.5 ns | 213254 ns | 1.01 |
| array/iteration/findmin/1d | 41704 ns | 40786 ns | 1.02 |
| array/iteration/findmin/2d | 94285 ns | 93428 ns | 1.01 |
| array/reductions/reduce/1d | 40190.5 ns | 35669 ns | 1.13 |
| array/reductions/reduce/2d | 51223.5 ns | 40477 ns | 1.27 |
| array/reductions/mapreduce/1d | 38542 ns | 33443 ns | 1.15 |
| array/reductions/mapreduce/2d | 51598 ns | 40694.5 ns | 1.27 |
| array/broadcast | 20887 ns | 20825 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11687 ns | 13806 ns | 0.85 |
| array/copyto!/cpu_to_gpu | 210260 ns | 208873 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 245325 ns | 242948 ns | 1.01 |
| array/accumulate/1d | 109209 ns | 108924 ns | 1.00 |
| array/accumulate/2d | 79725.5 ns | 80034 ns | 1.00 |
| array/construct | 1312.6 ns | 1297.3 ns | 1.01 |
| array/random/randn/Float32 | 44061 ns | 43906.5 ns | 1.00 |
| array/random/randn!/Float32 | 26846 ns | 26669 ns | 1.01 |
| array/random/rand!/Int64 | 27252 ns | 27027 ns | 1.01 |
| array/random/rand!/Float32 | 8717.333333333334 ns | 8863 ns | 0.98 |
| array/random/rand/Int64 | 33643 ns | 30048.5 ns | 1.12 |
| array/random/rand/Float32 | 13102 ns | 13342 ns | 0.98 |
| array/permutedims/4d | 61149.5 ns | 60675.5 ns | 1.01 |
| array/permutedims/2d | 55648.5 ns | 55115.5 ns | 1.01 |
| array/permutedims/3d | 56588 ns | 55700 ns | 1.02 |
| array/sorting/1d | 2766862 ns | 2777689 ns | 1.00 |
| array/sorting/by | 3354947 ns | 3368739 ns | 1.00 |
| array/sorting/2d | 1085516.5 ns | 1084912 ns | 1.00 |
| cuda/synchronization/stream/auto | 1039.5 ns | 1013.5384615384615 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 6487.8 ns | 6485.2 ns | 1.00 |
| cuda/synchronization/stream/blocking | 840.1782178217821 ns | 826 ns | 1.02 |
| cuda/synchronization/context/auto | 1161.2 ns | 1197.3 ns | 0.97 |
| cuda/synchronization/context/nonblocking | 6706.8 ns | 6768.6 ns | 0.99 |
| cuda/synchronization/context/blocking | 931.1578947368421 ns | 946.4193548387096 ns | 0.98 |
This comment was automatically generated by a workflow using github-action-benchmark.