Releases: JuliaGPU/CUDA.jl

v2.0.2

15 Oct 14:14

CUDA v2.0.2

Diff since v2.0.1

Closed issues:

  • cu() behavior for complex floating point numbers (#91)
  • Error when following example on using multiple GPUs on multiple processes (#468)
  • MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
  • Drop BinaryProvider? (#474)
  • Latest version of master doesn't work on Windows (#477)
  • sum(CUDA.rand(3,3)) broken (#480)
  • copyto!() between cpu and gpu with subarrays (#491)

Merged pull requests:

v2.0.1

05 Oct 08:12

CUDA v2.0.1

Diff since v2.0.0

Closed issues:

  • Can't update (#462)

Merged pull requests:

  • Remove duplicate comment (#464) (@blegat)
  • Add functionality to precompile the runtime library. (#465) (@maleadt)
  • Update manifest (#470) (@github-actions[bot])

v2.0.0

02 Oct 07:12
70d93cc

CUDA v2.0.0

Diff since v1.3.3

Closed issues:

  • Test failure during threading tests (#15)
  • Bad allocations in memory pool after device_reset! (#16)
  • CuArrays can lose Blas on reshaped views (#78)
  • allowscalar performance (#87)
  • Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
  • 5-arg mul! for CUSPARSE (#98)
  • copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
  • Array primitives broken for CUSPARSE arrays (#113)
  • SplittingPool: CPU allocations (#117)
  • error while concatenating to an empty CuArray (#139)
  • Showing sparse arrays goes wrong (#146)
  • Improve test coverage (#147)
  • CuArrays allocates a lot of memory on the default GPU (#153)
  • [Feature Request] Indexing CuArray with CuArray (#155)
  • Reshaping CuArray throws error during backpropagation (#162)
  • Match syntax and APIs against Julia 1.0 standard libraries (#163)
  • CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
  • RFC: converts SparseMatrixCSC to CuSparseMatrixCSR via cu by default (#216)
  • Add a CuSparseMatrixCOO type (#220)
  • Test runner stumbles over path separators (#236)
  • Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
  • Atomic operations only work on global memory (#311)
  • Performance: cudnn algorithm selection (#318)
  • CUSPARSE is broken in CUDA.jl 1.2 (#322)
  • Device-side broadcast regression on 1.5 (#350)
  • API for fast math-like mode (#354)
  • CUDA 11.0 Update 1: cublasSetWorkspace (#365)
  • Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
  • CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
  • CUDA throws OOM error when initializing API on multiple devices (#398)
  • Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
  • Reverse performance regression (#410)
  • Tag for LLVM 3? (#412)
  • CUDA not working (#415)
  • StatsBase.transform fails on CuArray (#426)
  • Further unification of CUBLAS.axpy! and LinearAlgebra.BLAS.axpy! (#432)
  • size(range), length(range) and range[end] fail inside CUDA kernels (#434)
  • InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
  • Missing dispatch for matrix multiplication with views? (#448)
  • New version not available yet? (#452)
  • using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
  • Unable to upgrade to the latest version (#459)

Merged pull requests:

v1.3.3

25 Aug 11:08
be21077

CUDA v1.3.3

Diff since v1.3.2

Closed issues:

  • Type changing Array conversions give error when allowscalar(false) (#344)
  • getindex(::CuArray, ::Adjoint, ::Colon) fails (#345)
  • View with array indices causes memory copy before broadcast (#384)
  • Regression with Julia 1.5 (#390)

Merged pull requests:

v1.3.2

24 Aug 07:09

CUDA v1.3.2

Diff since v1.3.1

Closed issues:

  • LLVM WMMA errors (#380)

Merged pull requests:

  • Fix handling of tests to skip. (#386) (@maleadt)
  • Update manifest (#387) (@github-actions[bot])

v1.3.1

22 Aug 07:11

CUDA v1.3.1

Diff since v1.3.0

Closed issues:

  • Element-wise conversion fails (#378)
  • atomic_min fails for Int32 in global CuDeviceArrays (#379)
  • Segmentation fault from @cuprint on char (#381)
  • error in versioninfo(), name not defined (#385)

Merged pull requests:

v1.3.0

19 Aug 13:09
e48d0dc

CUDA v1.3.0

Diff since v1.2.1

Closed issues:

  • Trouble with the @. macro (#346)
  • NVMLError: Not Supported (code 3) (#348)
  • Nvidia Xavier devices: exception thrown during kernel execution on device Xavier (#349)
  • Could not load CUTENSOR artifact dll on Windows 10 (#355)
  • CuTextureArray for 3D array (#357)
  • Bug in julia 1.5.0 I have CUDA 11.0 installed in Ubuntu 18.04 (#360)
  • Callback-based logging (#366)
  • Artifact download timeout (#369)
  • sum! accumulates when called multiple times (#370)
  • nvprof does not detect kernel launches (#371)
  • KernelError: passing and using non-bitstype argument (#372)
  • CUDA.jl fails to find libcudadevrt.a due on a cluster install with multi-arch target (#376)

Merged pull requests:

v1.2.1

31 Jul 08:08
527d364

CUDA v1.2.1

Diff since v1.2.0

Closed issues:

  • CuArrays.zeros(T, 0) fails (#81)
  • CUDAnative.cos calls the base cos function in nested broadcast (#102)
  • CuSparseMatrixHYB * CuMatrix = nothing (#256)
  • Strange reordering of struct fields with dynamic parallelism (#263)
  • Performance: bias add (#298)
  • CUDA 11 libraries incorrectly looked up in artifact (#300)
  • CUTENSOR for windows (#301)
  • Performance: sum (#302)
  • Performance: getindex(a, i::Array{Int}) (#303)
  • Display for CuArray within Tuples does not respect :limit=>true (#305)
  • Performance: elementwise operations (#307)
  • Performance: perceptron (#312)
  • windows install error: isfile(__libcupti[]) (#324)
  • std with dims is not type stable (#336)

Merged pull requests:

  • Re-enable threading tests. (#25) (@maleadt)
  • Reorganize and simplify some includes (#296) (@maleadt)
  • Only run benchmarks on the master branch. (#297) (@maleadt)
  • Optimizations for broadcast (#299) (@maleadt)
  • Update manifest (#304) (@github-actions[bot])
  • Test runner improvements for multigpu mode (#309) (@maleadt)
  • Artifact improvements for CUDA 11 on Windows (#310) (@maleadt)
  • Optimize element-wise operations (#313) (@maleadt)
  • Check if reported GPU memory use is available. (#314) (@maleadt)
  • Update artifacts: include cusolverMg, and use Yggdrasil binaries. (#315) (@maleadt)
  • Specialization fixes for mapreducedim. (#316) (@maleadt)
  • Fix invalid conversion of pointer to signed integer. (#317) (@maleadt)
  • Work around (presumed) Windows driver bug in exception test. (#319) (@maleadt)
  • Update manifest (#323) (@github-actions[bot])
  • Bump CUDNN and CUTENSOR (#325) (@maleadt)
  • Simplify NVML discovery. (#326) (@maleadt)
  • Separate CURAND wrappers from Random impl. (#327) (@maleadt)
  • Simplify discovering binaries by using Sys.which. (#328) (@maleadt)
  • Add wrapper for NVML utilization rates. (#329) (@maleadt)
  • Attach CUSPARSE docstrings to bare methods, not empty functions. (#331) (@maleadt)
  • Eagerly reduce the amount of worker threads. (#332) (@maleadt)
  • Bump dependencies. (#333) (@maleadt)
  • Clean-up library wrappers [NFC] (#334) (@maleadt)
  • Fix CUDNN v8 discovery and loading on Windows (#335) (@maleadt)
  • Fix type stability of Statistics.var with dims. (#337) (@maleadt)
  • Fix parameter alignment for dynamic parallelism. (#338) (@maleadt)
  • Micro-optimize Base.fill. (#339) (@maleadt)

v1.2.0

15 Jul 11:07
1c44d7b

CUDA v1.2.0

Diff since v1.1.0

Closed issues:

  • Segmentation fault when creating CuArray of CuArray (#133)
  • CUDNN tests fail with CUDNN 6.0.20 (#134)
  • CURAND fail to initialize, code 203 (#255)
  • Deprecation warnings (#277)
  • Can we pleeeeeeeease make cu(x) eltype preserving? (#278)
  • On the use of @sync during benchmarking in the documentation (#279)
  • Example in Multiple GPUs doc fails (#282)
  • LLVM error: Cannot cast between two non-generic address spaces (#286)

Merged pull requests:

v1.1.0

07 Jul 09:07
1c399bf

CUDA v1.1.0

Diff since v1.0.2

Closed issues:

  • Fix NSight detection (#29)
  • versioninfo() (#34)
  • throw_... messages: invalid call to jl_alloc_string (#54)
  • INTERNAL_ERROR during CUDNN handle creation (#183)
  • Improve benchmarking suite (#222)
  • How to load CUDA.jl conditional on the computer having a CUDA-compatible GPU? (#237)
  • CUSOLVER.heevd! returning Float and not Complex (#238)
  • Broadcasting fails with Float64 -> Int conversion (#240)
  • Running ] test CUDA with OhMyREPL in startup.jl causes some tests to fail (#246)
  • ERROR: Your LLVM does not support the NVPTX back-end. in local project environment (#249)
  • CUDAnative: UndefVarError: AddrSpacePtr not defined on julia master (#250)
  • Error while freeing CUDA.CuPtr (#254)
  • Non-artifact initialization of CUDA.jl using CUDA 11 fails on Windows (#262)
  • Library handle creation close to OOM fails with ERROR_NOT_INITIALIZED (#264)
  • has(::TargetIterator, name::String) deprecation warning (#271)

Merged pull requests:

  • Add texture support from CuTextures.jl (#209) (@maleadt)
  • Memory pinning with interval trees (#233) (@maleadt)
  • Better nsys detection. (#234) (@maleadt)
  • CompatHelper: add new compat entry for "IntervalTrees" at version "1.0" (#235) (@github-actions[bot])
  • Update manifest (#239) (@github-actions[bot])
  • Replace slash by path separator to properly skip tests on Windows. (#241) (@maleadt)
  • Retry cudnnCreate on CUDNN_STATUS_INTERNAL_ERROR and CUDNN_STATUS_NOT_INITIALIZED (#244) (@maleadt)
  • Add issue templates (#245) (@maleadt)
  • Import wrapper tooling, wrap NVML (#248) (@maleadt)
  • Ignore some potentially unsupported NVML features. (#251) (@maleadt)
  • Assert NVPTX availability by just calling the initializer. (#252) (@maleadt)
  • Update manifest (#257) (@github-actions[bot])
  • Adapt to AddrSpacePtr rename. (#258) (@maleadt)
  • Typo in installation overview docs (#260) (@clintonTE)
  • Update GPUCompiler.jl (#266) (@maleadt)
  • Retry library initialization failure due to (badly reported) OOM. (#268) (@maleadt)
  • Upgrade CUTENSOR to v1.1.0. (#269) (@maleadt)
  • Use CUDNN from Yggdrasil. (#272) (@maleadt)
  • Update manifest (#273) (@github-actions[bot])
  • Improve local CUDA discovery for CUDA 11 (#274) (@maleadt)
  • Compatibility with latest LLVM and GPUCompiler (#275) (@maleadt)