Releases · JuliaGPU/CUDA.jl
v2.0.2
CUDA v2.0.2
Closed issues:
- cu() behavior for complex floating point numbers (#91)
- Error when following example on using multiple GPUs on multiple processes (#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
- Drop BinaryProvider? (#474)
- Latest version of master doesn't work on Windows (#477)
- `sum(CUDA.rand(3,3))` broken (#480)
- copyto!() between cpu and gpu with subarrays (#491)
Merged pull requests:
- Adapt to GPUCompiler changes. (#458) (@maleadt)
- Fix initialization of global state (#471) (@maleadt)
- Remove 'view' implementation. (#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (#473) (@ianshmean)
- Remove BinaryProvider dep. (#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (#476) (@Alexander-Barth)
- NFC array simplifications. (#481) (@maleadt)
- Update manifest (#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (#489) (@pabloferz) (see the sketch after this list)
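For illustration, a minimal sketch of the conversion behavior settled on in #91/#489, assuming the v2.0.2 defaults:

```julia
using CUDA

A = rand(ComplexF64, 2, 2)

# `cu` favors Float32 precision: ComplexF64 inputs come back as ComplexF32.
B = cu(A)
@assert eltype(B) == ComplexF32

# To preserve the element type, construct the CuArray explicitly.
C = CuArray(A)
@assert eltype(C) == ComplexF64
```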
v2.0.1
v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: convert `SparseMatrixCSC` to `CuSparseMatrixCSR` via `cu` by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- `StatsBase.transform` fails on `CuArray` (#426)
- Further unification of `CUBLAS.axpy!` and `LinearAlgebra.BLAS.axpy!` (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt) (see the sketch after this list)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)
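A minimal sketch of the global math mode added in #424, assuming the `CUDA.FAST_MATH`/`CUDA.DEFAULT_MATH` mode names:

```julia
using CUDA

# Opt in to lower-precision, higher-throughput library paths for
# subsequent CUBLAS/CUDNN operations.
CUDA.math_mode!(CUDA.FAST_MATH)

A = CUDA.rand(1024, 1024)
B = CUDA.rand(1024, 1024)
C = A * B   # may now use reduced-precision arithmetic internally

# Restore the default behavior.
CUDA.math_mode!(CUDA.DEFAULT_MATH)
```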
v1.3.3
v1.3.2
v1.3.1
CUDA v1.3.1
Closed issues:
- Element-wise conversion fails (#378)
- atomic_min fails for Int32 in global CuDeviceArrays (#379)
- Segmentation fault from @cuprint on char (#381)
- error in versioninfo(), name not defined (#385)
Merged pull requests:
- Fix docs (#330) (@maleadt)
- Wrap cusparseSpMV (#351) (@marius311)
- specify Cchar rather than char in the doc for @cuprint (#382) (@MasonProtter) (see the sketch after this list)
- Adapt to LLVM.jl changes for stateless codegen. (#383) (@maleadt)
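A minimal device-side printing sketch illustrating the `Cchar` convention documented in #382: characters must be passed as `Cchar`, since Julia's 4-byte `Char` is not supported on the device (cf. #381).

```julia
using CUDA

function hello()
    # Pass characters as Cchar, not Char, when printing from a kernel.
    @cuprintln("thread ", threadIdx().x, ": ", Cchar('x'))
    return
end

@cuda threads=2 hello()
synchronize()
```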
v1.3.0
CUDA v1.3.0
Closed issues:
- Trouble with the @. macro (#346)
- NVMLError: Not Supported (code 3) (#348)
- Nvidia Xavier devices: exception thrown during kernel execution on device Xavier (#349)
- Could not load CUTENSOR artifact dll on Windows 10 (#355)
- CuTextureArray for 3D array (#357)
- Bug in julia 1.5.0 I have CUDA 11.0 installed in Ubuntu 18.04 (#360)
- Callback-based logging (#366)
- Artifact download timeout (#369)
- `sum!` accumulates when called multiple times (#370)
- nvprof does not detect kernel launches (#371)
- KernelError: passing and using non-bitstype argument (#372)
- CUDA.jl fails to find libcudadevrt.a on a cluster install with multi-arch target (#376)
Merged pull requests:
- Make the memory allocator context-aware (#253) (@maleadt)
- Update manifest (#347) (@github-actions[bot])
- Guard against unsupported NVML usage in the test runner. (#352) (@maleadt)
- Bump CUDNN to v8.0.2 (#353) (@maleadt)
- Rework thread state management (#356) (@maleadt)
- Update manifest (#358) (@github-actions[bot])
- Memory allocator simplifications (#361) (@maleadt)
- Deduplicate code from memory pools (#362) (@maleadt)
- Fix show of ArrayBuffer. (#363) (@maleadt)
- Clean-up the Buffer interface. (#364) (@maleadt)
- Use callback APIs to get library debug logs. (#367) (@maleadt)
- Allow selecting the memcheck tool. (#368) (@maleadt)
- Update GPUArrays. (#373) (@maleadt)
- Update to CUDA 11.0 update 1 (#374) (@maleadt)
- Number and iterate devices in versioninfo() following CUDA. (#375) (@maleadt) (see the sketch after this list)
- Reinstate support for Julia 1.3 (#377) (@maleadt)
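A small sketch of the device enumeration that #375 aligns with CUDA's own numbering, assuming the `CUDA.name` accessor:

```julia
using CUDA

CUDA.versioninfo()   # toolkit/driver summary plus one entry per device

# Devices are numbered from 0, following CUDA.
for (i, dev) in enumerate(CUDA.devices())
    println("device ", i - 1, ": ", CUDA.name(dev))
end
```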
v1.2.1
CUDA v1.2.1
Closed issues:
- CuArrays.zeros(T, 0) fails (#81)
- CUDAnative.cos calls the base cos function in nested broadcast (#102)
- CuSparseMatrixHYB * CuMatrix = nothing (#256)
- Strange reordering of struct fields with dynamic parallelism (#263)
- Performance: bias add (#298)
- CUDA 11 libraries incorrectly looked up in artifact (#300)
- CUTENSOR for windows (#301)
- Performance: sum (#302)
- Performance: getindex(a, i::Array{Int}) (#303)
- Display for CuArray within Tuples does not respect :limit=>true (#305)
- Performance: elementwise operations (#307)
- Performance: perceptron (#312)
- windows install error: isfile(__libcupti[]) (#324)
- std with dims is not type stable (#336)
Merged pull requests:
- Re-enable threading tests. (#25) (@maleadt)
- Reorganize and simplify some includes (#296) (@maleadt)
- Only run benchmarks on the master branch. (#297) (@maleadt)
- Optimizations for broadcast (#299) (@maleadt)
- Update manifest (#304) (@github-actions[bot])
- Test runner improvements for multigpu mode (#309) (@maleadt)
- Artifact improvements for CUDA 11 on Windows (#310) (@maleadt)
- Optimize element-wise operations (#313) (@maleadt)
- Check if reported GPU memory use is available. (#314) (@maleadt)
- Update artifacts: include cusolverMg, and use Yggdrasil binaries. (#315) (@maleadt)
- Specialization fixes for mapreducedim. (#316) (@maleadt)
- Fix invalid conversion of pointer to signed integer. (#317) (@maleadt)
- Work around (presumed) Windows driver bug in exception test. (#319) (@maleadt)
- Update manifest (#323) (@github-actions[bot])
- Bump CUDNN and CUTENSOR (#325) (@maleadt)
- Simplify NVML discovery. (#326) (@maleadt)
- Separate CURAND wrappers from Random impl. (#327) (@maleadt)
- Simplify discovering binaries by using Sys.which. (#328) (@maleadt)
- Add wrapper for NVML utilization rates. (#329) (@maleadt)
- Attach CUSPARSE docstrings to bare methods, not empty functions. (#331) (@maleadt)
- Eagerly reduce the amount of worker threads. (#332) (@maleadt)
- Bump dependencies. (#333) (@maleadt)
- Clean-up library wrappers [NFC] (#334) (@maleadt)
- Fix CUDNN v8 discovery and loading on Windows (#335) (@maleadt)
- Fix type stability of Statistics.var with dims. (#337) (@maleadt) (see the sketch after this list)
- Fix parameter alignment for dynamic parallelism. (#338) (@maleadt)
- Micro-optimize Base.fill. (#339) (@maleadt)
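A minimal sketch of the dimension-wise statistics that #337 makes type-stable (cf. issue #336):

```julia
using CUDA, Statistics

A = CUDA.rand(Float32, 128, 64)

# Column-wise reductions; `var` with `dims` is now inferred correctly.
m = mean(A; dims=1)
v = var(A; dims=1)
```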
v1.2.0
CUDA v1.2.0
Closed issues:
- Segmentation fault when creating CuArray of CuArray (#133)
- CUDNN tests fail with CUDNN 6.0.20 (#134)
- CURAND fail to initialize, code 203 (#255)
- Deprecation warnings (#277)
- Can we pleeeeeeeease make cu(x) eltype preserving? (#278)
- On the use of @sync during benchmarking in the documentation (#279)
- Example in Multiple GPUs doc fails (#282)
- LLVM error: Cannot cast between two non-generic address spaces (#286)
Merged pull requests:
- Host-side CUTENSOR (#243) (@kshyatt)
- Add and document a non-blocking version of `@sync`. (#280) (@maleadt) (usage sketched after this list)
- Use a custom adaptor for cu so that adapt(CuArray) preserves element types. (#281) (@maleadt)
- Check and warn for library versions. (#284) (@maleadt)
- Add note about nvml dll missing (#288) (@kshyatt)
- Update your PR to have tests pass (#289) (@kshyatt)
- Update manifest (#290) (@github-actions[bot])
- Support CUDA 11 (#291) (@maleadt)
- do not open the file twice when reading the libdevice bitcode (#294) (@jakebolewski)
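A minimal sketch of the synchronization pattern documented in #279/#280: kernel launches are asynchronous, so benchmarks must wait for the GPU.

```julia
using CUDA

A = CUDA.rand(1024, 1024)

# Without synchronization you would only time the launch, not the work.
CUDA.@sync A * A

# For example, with BenchmarkTools:
# using BenchmarkTools
# @btime CUDA.@sync $A * $A
```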
v1.1.0
CUDA v1.1.0
Closed issues:
- Fix NSight detection (#29)
- versioninfo() (#34)
- throw_... messages: invalid call to `jl_alloc_string` (#54)
- INTERNAL_ERROR during CUDNN handle creation (#183)
- Improve benchmarking suite (#222)
- How to load CUDA.jl conditional on the computer having a CUDA-compatible GPU? (#237) (see the sketch after this list)
- CUSOLVER.heevd! returning Float and not Complex (#238)
- Broadcasting fails with Float64 -> Int conversion (#240)
- Running `] test CUDA` with `OhMyREPL` in `startup.jl` causes some tests to fail (#246)
- ERROR: Your LLVM does not support the NVPTX back-end. in local project environment (#249)
- CUDAnative: UndefVarError: AddrSpacePtr not defined on julia master (#250)
- Error while freeing CUDA.CuPtr (#254)
- Non-artifact initialization of CUDA.jl using CUDA 11 fails on Windows (#262)
- Library handle creation close to OOM fails with ERROR_NOT_INITIALIZED (#264)
- has(::TargetIterator, name::String) deprecation warning (#271)
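A minimal sketch of the conditional-use pattern asked about in #237, using the `CUDA.functional()` query:

```julia
using CUDA

# CUDA.jl can be loaded on machines without a usable GPU; query
# functionality at runtime instead of conditionally importing the package.
if CUDA.functional()
    x = CUDA.zeros(Float32, 1024)   # GPU path
else
    x = zeros(Float32, 1024)        # CPU fallback
end
```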
Merged pull requests:
- Add texture support from CuTextures.jl (#209) (@maleadt)
- Memory pinning with interval trees (#233) (@maleadt)
- Better nsys detection. (#234) (@maleadt)
- CompatHelper: add new compat entry for "IntervalTrees" at version "1.0" (#235) (@github-actions[bot])
- Update manifest (#239) (@github-actions[bot])
- Replace slash by path separator to properly skip tests on Windows. (#241) (@maleadt)
- Retry cudnnCreate on CUDNN_STATUS_INTERNAL_ERROR and CUDNN_STATUS_NOT_INITIALIZED (#244) (@maleadt)
- Add issue templates (#245) (@maleadt)
- Import wrapper tooling, wrap NVML (#248) (@maleadt)
- Ignore some potentially unsupported NVML features. (#251) (@maleadt)
- Assert NVPTX availability by just calling the initializer. (#252) (@maleadt)
- Update manifest (#257) (@github-actions[bot])
- Adapt to AddrSpacePtr rename. (#258) (@maleadt)
- Typo in installation overview docs (#260) (@clintonTE)
- Update GPUCompiler.jl (#266) (@maleadt)
- Retry library initialization failure due to (badly reported) OOM. (#268) (@maleadt)
- Upgrade CUTENSOR to v1.1.0. (#269) (@maleadt)
- Use CUDNN from Yggdrasil. (#272) (@maleadt)
- Update manifest (#273) (@github-actions[bot])
- Improve local CUDA discovery for CUDA 11 (#274) (@maleadt)
- Compatibility with latest LLVM and GPUCompiler (#275) (@maleadt)