Releases · JuliaGPU/CUDA.jl
v2.0.2
CUDA v2.0.2
Closed issues:
- cu() behavior for complex floating point numbers (#91)
- Error when following example on using multiple GPUs on multiple processes (#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
- Drop BinaryProvider? (#474)
- Latest version of master doesn't work on Windows (#477)
- `sum(CUDA.rand(3,3))` broken (#480)
- copyto!() between cpu and gpu with subarrays (#491)
Merged pull requests:
- Adapt to GPUCompiler changes. (#458) (@maleadt)
- Fix initialization of global state (#471) (@maleadt)
- Remove 'view' implementation. (#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (#473) (@ianshmean)
- Remove BinaryProvider dep. (#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (#476) (@Alexander-Barth)
- NFC array simplifications. (#481) (@maleadt)
- Update manifest (#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (#489) (@pabloferz) (see the sketch after this list)
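For illustration, a minimal sketch of the conversion behavior settled on in #91/#489, assuming the v2.0.2 defaults:

```julia
using CUDA

A = rand(ComplexF64, 2, 2)

# `cu` favors Float32 precision: ComplexF64 inputs come back as ComplexF32.
B = cu(A)
@assert eltype(B) == ComplexF32

# To preserve the element type, construct the CuArray explicitly.
C = CuArray(A)
@assert eltype(C) == ComplexF64
```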
v2.0.1
v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: convert `SparseMatrixCSC` to `CuSparseMatrixCSR` via `cu` by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- `StatsBase.transform` fails on `CuArray` (#426)
- Further unification of `CUBLAS.axpy!` and `LinearAlgebra.BLAS.axpy!` (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt) (see the sketch after this list)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)
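A minimal sketch of the global math mode added in #424, assuming the `CUDA.FAST_MATH`/`CUDA.DEFAULT_MATH` mode names:

```julia
using CUDA

# Opt in to lower-precision, higher-throughput library paths for
# subsequent CUBLAS/CUDNN operations.
CUDA.math_mode!(CUDA.FAST_MATH)

A = CUDA.rand(1024, 1024)
B = CUDA.rand(1024, 1024)
C = A * B   # may now use reduced-precision arithmetic internally

# Restore the default behavior.
CUDA.math_mode!(CUDA.DEFAULT_MATH)
```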
v1.3.3
v1.3.2
v1.3.1
CUDA v1.3.1
Closed issues:
- Element-wise conversion fails (#378)
- atomic_min fails for Int32 in global CuDeviceArrays (#379)
- Segmentation fault from @cuprint on char (#381)
- error in versioninfo(), name not defined (#385)
Merged pull requests:
- Fix docs (#330) (@maleadt)
- Wrap cusparseSpMV (#351) (@marius311)
- specify Cchar rather than char in the doc for @cuprint (#382) (@MasonProtter) (see the sketch after this list)
- Adapt to LLVM.jl changes for stateless codegen. (#383) (@maleadt)
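A minimal device-side printing sketch illustrating the `Cchar` convention documented in #382: characters must be passed as `Cchar`, since Julia's 4-byte `Char` is not supported on the device (cf. #381).

```julia
using CUDA

function hello()
    # Pass characters as Cchar, not Char, when printing from a kernel.
    @cuprintln("thread ", threadIdx().x, ": ", Cchar('x'))
    return
end

@cuda threads=2 hello()
synchronize()
```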
v1.3.0
CUDA v1.3.0
Closed issues:
- Trouble with the @. macro (#346)
- NVMLError: Not Supported (code 3) (#348)
- Nvidia Xavier devices: exception thrown during kernel execution on device Xavier (#349)
- Could not load CUTENSOR artifact dll on Windows 10 (#355)
- CuTextureArray for 3D array (#357)
- Bug in julia 1.5.0 I have CUDA 11.0 installed in Ubuntu 18.04 (#360)
- Callback-based logging (#366)
- Artifact download timeout (#369)
- `sum!` accumulates when called multiple times (#370)
- nvprof does not detect kernel launches (#371)
- KernelError: passing and using non-bitstype argument (#372)
- CUDA.jl fails to find libcudadevrt.a on a cluster install with multi-arch target (#376)
Merged pull requests:
- Make the memory allocator context-aware (#253) (@maleadt)
- Update manifest (#347) (@github-actions[bot])
- Guard against unsupported NVML usage in the test runner. (#352) (@maleadt)
- Bump CUDNN to v8.0.2 (#353) (@maleadt)
- Rework thread state management (#356) (@maleadt)
- Update manifest (#358) (@github-actions[bot])
- Memory allocator simplifications (#361) (@maleadt)
- Deduplicate code from memory pools (#362) (@maleadt)
- Fix show of ArrayBuffer. (#363) (@maleadt)
- Clean-up the Buffer interface. (#364) (@maleadt)
- Use callback APIs to get library debug logs. (#367) (@maleadt)
- Allow selecting the memcheck tool. (#368) (@maleadt)
- Update GPUArrays. (#373) (@maleadt)
- Update to CUDA 11.0 update 1 (#374) (@maleadt)
- Number and iterate devices in versioninfo() following CUDA. (#375) (@maleadt) (see the sketch after this list)
- Reinstate support for Julia 1.3 (#377) (@maleadt)
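A small sketch of the device enumeration that #375 aligns with CUDA's own numbering, assuming the `CUDA.name` accessor:

```julia
using CUDA

CUDA.versioninfo()   # toolkit/driver summary plus one entry per device

# Devices are numbered from 0, following CUDA.
for (i, dev) in enumerate(CUDA.devices())
    println("device ", i - 1, ": ", CUDA.name(dev))
end
```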
v1.2.1
CUDA v1.2.1
Closed issues:
- CuArrays.zeros(T, 0) fails (#81)
- CUDAnative.cos calls the base cos function in nested broadcast (#102)
- CuSparseMatrixHYB * CuMatrix = nothing (#256)
- Strange reordering of struct fields with dynamic parallelism (#263)
- Performance: bias add (#298)
- CUDA 11 libraries incorrectly looked up in artifact (#300)
- CUTENSOR for windows (#301)
- Performance: sum (#302)
- Performance: getindex(a, i::Array{Int}) (#303)
- Display for CuArray within Tuples does not respect :limit=>true (#305)
- Performance: elementwise operations (#307)
- Performance: perceptron (#312)
- windows install error: isfile(__libcupti[]) (#324)
- std with dims is not type stable (#336)
Merged pull requests:
- Re-enable threading tests. (#25) (@maleadt)
- Reorganize and simplify some includes (#296) (@maleadt)
- Only run benchmarks on the master branch. (#297) (@maleadt)
- Optimizations for broadcast (#299) (@maleadt)
- Update manifest (#304) (@github-actions[bot])
- Test runner improvements for multigpu mode (#309) (@maleadt)
- Artifact improvements for CUDA 11 on Windows (#310) (@maleadt)
- Optimize element-wise operations (#313) (@maleadt)
- Check if reported GPU memory use is available. (#314) (@maleadt)
- Update artifacts: include cusolverMg, and use Yggdrasil binaries. (#315) (@maleadt)
- Specialization fixes for mapreducedim. (#316) (@maleadt)
- Fix invalid conversion of pointer to signed integer. (#317) (@maleadt)
- Work around (presumed) Windows driver bug in exception test. (#319) (@maleadt)
- Update manifest (#323) (@github-actions[bot])
- Bump CUDNN and CUTENSOR (#325) (@maleadt)
- Simplify NVML discovery. (#326) (@maleadt)
- Separate CURAND wrappers from Random impl. (#327) (@maleadt)
- Simplify discovering binaries by using Sys.which. (#328) (@maleadt)
- Add wrapper for NVML utilization rates. (#329) (@maleadt)
- Attach CUSPARSE docstrings to bare methods, not empty functions. (#331) (@maleadt)
- Eagerly reduce the amount of worker threads. (#332) (@maleadt)
- Bump dependencies. (#333) (@maleadt)
- Clean-up library wrappers [NFC] (#334) (@maleadt)
- Fix CUDNN v8 discovery and loading on Windows (#335) (@maleadt)
- Fix type stability of Statistics.var with dims. (#337) (@maleadt) (see the sketch after this list)
- Fix parameter alignment for dynamic parallelism. (#338) (@maleadt)
- Micro-optimize Base.fill. (#339) (@maleadt)
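A minimal sketch of the dimension-wise statistics that #337 makes type-stable (cf. issue #336):

```julia
using CUDA, Statistics

A = CUDA.rand(Float32, 128, 64)

# Column-wise reductions; `var` with `dims` is now inferred correctly.
m = mean(A; dims=1)
v = var(A; dims=1)
```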
v1.2.0
CUDA v1.2.0
Closed issues:
- Segmentation fault when creating CuArray of CuArray (#133)
- CUDNN tests fail with CUDNN 6.0.20 (#134)
- CURAND fail to initialize, code 203 (#255)
- Deprecation warnings (#277)
- Can we pleeeeeeeease make cu(x) eltype preserving? (#278)
- On the use of @sync during benchmarking in the documentation (#279)
- Example in Multiple GPUs doc fails (#282)
- LLVM error: Cannot cast between two non-generic address spaces (#286)
Merged pull requests:
- Host-side CUTENSOR (#243) (@kshyatt)
- Add and document a non-blocking version of `@sync`. (#280) (@maleadt) (usage sketched after this list)
- Use a custom adaptor for cu so that adapt(CuArray) preserves element types. (#281) (@maleadt)
- Check and warn for library versions. (#284) (@maleadt)
- Add note about nvml dll missing (#288) (@kshyatt)
- Update your PR to have tests pass (#289) (@kshyatt)
- Update manifest (#290) (@github-actions[bot])
- Support CUDA 11 (#291) (@maleadt)
- do not open the file twice when reading the libdevice bitcode (#294) (@jakebolewski)
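A minimal sketch of the synchronization pattern documented in #279/#280: kernel launches are asynchronous, so benchmarks must wait for the GPU.

```julia
using CUDA

A = CUDA.rand(1024, 1024)

# Without synchronization you would only time the launch, not the work.
CUDA.@sync A * A

# For example, with BenchmarkTools:
# using BenchmarkTools
# @btime CUDA.@sync $A * $A
```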
v1.1.0
CUDA v1.1.0
Closed issues:
- Fix NSight detection (#29)
- versioninfo() (#34)
- throw_... messages: invalid call to `jl_alloc_string` (#54)
- INTERNAL_ERROR during CUDNN handle creation (#183)
- Improve benchmarking suite (#222)
- How to load CUDA.jl conditional on the computer having a CUDA-compatible GPU? (#237) (see the sketch after this list)
- CUSOLVER.heevd! returning Float and not Complex (#238)
- Broadcasting fails with Float64 -> Int conversion (#240)
- Running `] test CUDA` with `OhMyREPL` in `startup.jl` causes some tests to fail (#246)
- ERROR: Your LLVM does not support the NVPTX back-end. in local project environment (#249)
- CUDAnative: UndefVarError: AddrSpacePtr not defined on julia master (#250)
- Error while freeing CUDA.CuPtr (#254)
- Non-artifact initialization of CUDA.jl using CUDA 11 fails on Windows (#262)
- Library handle creation close to OOM fails with ERROR_NOT_INITIALIZED (#264)
- has(::TargetIterator, name::String) deprecation warning (#271)
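A minimal sketch of the conditional-use pattern asked about in #237, using the `CUDA.functional()` query:

```julia
using CUDA

# CUDA.jl can be loaded on machines without a usable GPU; query
# functionality at runtime instead of conditionally importing the package.
if CUDA.functional()
    x = CUDA.zeros(Float32, 1024)   # GPU path
else
    x = zeros(Float32, 1024)        # CPU fallback
end
```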
Merged pull requests:
- Add texture support from CuTextures.jl (#209) (@maleadt)
- Memory pinning with interval trees (#233) (@maleadt)
- Better nsys detection. (#234) (@maleadt)
- CompatHelper: add new compat entry for "IntervalTrees" at version "1.0" (#235) (@github-actions[bot])
- Update manifest (#239) (@github-actions[bot])
- Replace slash by path separator to properly skip tests on Windows. (#241) (@maleadt)
- Retry cudnnCreate on CUDNN_STATUS_INTERNAL_ERROR and CUDNN_STATUS_NOT_INITIALIZED (#244) (@maleadt)
- Add issue templates (#245) (@maleadt)
- Import wrapper tooling, wrap NVML (#248) (@maleadt)
- Ignore some potentially unsupported NVML features. (#251) (@maleadt)
- Assert NVPTX availability by just calling the initializer. (#252) (@maleadt)
- Update manifest (#257) (@github-actions[bot])
- Adapt to AddrSpacePtr rename. (#258) (@maleadt)
- Typo in installation overview docs (#260) (@clintonTE)
- Update GPUCompiler.jl (#266) (@maleadt)
- Retry library initialization failure due to (badly reported) OOM. (#268) (@maleadt)
- Upgrade CUTENSOR to v1.1.0. (#269) (@maleadt)
- Use CUDNN from Yggdrasil. (#272) (@maleadt)
- Update manifest (#273) (@github-actions[bot])
- Improve local CUDA discovery for CUDA 11 (#274) (@maleadt)
- Compatibility with latest LLVM and GPUCompiler (#275) (@maleadt)