Releases · JuliaGPU/CUDA.jl
v3.10.0
CUDA v3.10.0
Closed issues:
- `Error while freeing DeviceBuffer` warning when using multiple GPUs (#1454)
- CUDNN cache locking prevents finalizers, resulting in OOMs (#1461)
- EOFError from pool_cleanup when closing REPL (#1495)
- TypeError in compiler with custom kernel (#1496)
Merged pull requests:
- expose sparse mv/mm algo selection (#1201) (@Roger-luo)
- Always inspect the task-local context when verifying before freeing. (#1462) (@maleadt)
- support sparse opnorm (#1466) (@Roger-luo)
- Move CUSTATEVEC and CUTENSORNET into lib/ (#1478) (@vchuravy)
- Adapt to GPUCompiler 0.15 changes (#1488) (@maleadt)
- Limit time held by CUDNN locks. (#1491) (@maleadt)
- Docstring for `cu` (#1493) (@mcabbott)
- Update manifest (#1499) (@github-actions[bot])
- Silence EOFError in pool_cleanup (#1502) (@Octogonapus)
- Adapt to GPUCompiler changes (#1504) (@maleadt)
- Fixes for CUSPARSE 11.7.1. (#1505) (@maleadt)
- Update artifacts (#1507) (@maleadt)
- Update manifest (#1509) (@github-actions[bot])
- Add a new cache for HostKernel objects. (#1510) (@maleadt)
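Several items in this release touch `cu`, whose docstring was added in #1493. As a minimal sketch (assuming a CUDA-capable GPU is available), `cu` adapts host data to device arrays, preferring Float32 for floating-point elements:

```julia
using CUDA

# cu converts host arrays to CuArrays; Float64 elements become Float32 by default
x = cu([1.0, 2.0, 3.0])   # CuVector{Float32}
X = cu(rand(2, 2))        # also works on matrices (and nested structures, via Adapt.jl)
```

Use `CuArray{Float64}` directly when double precision must be preserved.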
v3.9.1
CUDA v3.9.1
Closed issues:
- Issue with copy_cublasfloat (#1476)
- Errors when broadcasting random number generators (#1480)
- CPU version of linear algebra routine is dispatched when using `Zygote.gradient` (#1481)
- `scan!` fails on vectors of structs (#1482)
- InexactError when getting CUDA version info (#1489)
Merged pull requests:
- Allow more integer argument types for byte_perm (#1420) (@eschnett)
- support CuSparseMatrix(::Diagonal) (#1470) (@Roger-luo)
- Don't emit debug info until the next CUDA version. (#1473) (@maleadt)
- Update manifest (#1474) (@github-actions[bot])
- Update manifest (#1479) (@github-actions[bot])
- fix unsafe_wrap docstring and widen signature (#1483) (@piever)
- Update manifest (#1484) (@github-actions[bot])
- Check whether cudaRuntimeGetVersion succeeded. (#1490) (@maleadt)
- Update manifest (#1494) (@github-actions[bot])
- Fix #1476: Allow any container in copy_cublasfloat (#1498) (@danielwe)
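This release also widened the `unsafe_wrap` signature (#1483). A minimal sketch of the wrapping pattern that API supports, assuming a CUDA-capable GPU; the reshape to `(2, 2)` is purely illustrative:

```julia
using CUDA

# Wrap an existing device pointer into a CuArray without copying.
a = CuArray{Float32}(undef, 4)
p = pointer(a)                        # CuPtr{Float32} into a's memory
b = unsafe_wrap(CuArray, p, (2, 2))   # view the same 4 elements as a 2x2 matrix
```

As with `Base.unsafe_wrap`, the caller is responsible for keeping the underlying allocation (`a` here) alive while the wrapper is in use.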
v3.9.0
CUDA v3.9.0
Closed issues:
- Tests for showing (#35)
- Support LU factorizations (#1193)
- Int8 WMMA not working in 3.8.4 and 3.8.5 despite merged PR. Add more unit tests? (#1442)
- Optional CPU kernel call with @cuda (#1443)
- Add library/artifact management for NCCL (#1446)
- permutedims returns a lowertriangular matrix (#1451)
- New broadcast corrupts memory? (#1457)
- norm does not dispatch on CuSparseMatrixCSC (#1460)
- scalar * sparse multiplication (#1468)
Merged pull requests:
- CUTENSOR: axpy! and axpby! not mutating fixed (#1416) (@yapanuwan)
- Initial wrap of cuquantum (#1437) (@kshyatt)
- CompatHelper: bump compat for "GPUCompiler" to "0.14" (#1441) (@github-actions[bot])
- Fix return type of nrm2 for ComplexF16 (#1444) (@danielwe)
- Use a build matrix. (#1445) (@maleadt)
- Update manifest (#1447) (@github-actions[bot])
- Rework factorizations (#1449) (@maleadt)
- Add NCCL binaries. (#1450) (@maleadt)
- Support general eltypes in matrix division and SVD (#1453) (@danielwe)
- Update manifest (#1456) (@github-actions[bot])
- Look at more environment variables to find nsys. (#1459) (@maleadt)
- Fixes for 1.8 (#1463) (@maleadt)
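The factorization rework (#1449) closed the long-standing LU request (#1193). As a hedged sketch of the resulting usage (assuming a CUDA-capable GPU; the shifted random matrix is just an illustrative well-conditioned input):

```julia
using CUDA, LinearAlgebra

A = cu(rand(Float32, 4, 4) + 4I)   # diagonally dominant, so LU is well-behaved
b = cu(rand(Float32, 4))

F = lu(A)    # LU factorization, dispatched to CUSOLVER
x = F \ b    # triangular solves stay on the GPU
```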
v3.8.5
v3.8.4
CUDA v3.8.4
Closed issues:
- sparse-sparse and sparse-constant multiplication lose sparsity (output dense matrix) (#1264)
- LLVMExtra fails to load on Julia 1.8 and PPC (#1387)
- compute-sanitizer CUDA_ERROR_INVALID_VALUE on CUDA.jl 3.0+ (#1415)
- `@cudnnDescriptor` is not threadsafe (#1421)
- Precompilation of CUDA 3.8.3 broken on 1.7.1 due to changes in Random123.jl (#1422)
- OOM error should include memory status (#1427)
- WMMA kernel works with Julia 1.7.2 but fails with `illegal memory access` for Julia 1.8.0-beta1 (#1431)
- Non-Int64 local memory size leads to dynamic function invocation (#1434)
- "initialization" test failing (#1435)
- cuda with julia 1.8 not working on windows (working fine(?) on wsl2) (#1436)
Merged pull requests:
- Add Int8 WMMA Support (#1119) (@max-Hawkins)
- Wrap generic sparse-sparse GEMM (#1285) (@kshyatt)
- Fix sparse COO to CSR conversion. (#1412) (@maleadt)
- Drop support for CUDA 10.1 and below (#1414) (@maleadt)
- Update manifest (#1417) (@github-actions[bot])
- Report the OOM memory status at the time of the error. (#1428) (@maleadt)
- Lock CUDNN descriptor cache lookups. (#1430) (@maleadt)
- Switch to new LLVM context management for 1.9 compatibility. (#1432) (@maleadt)
- Update manifest (#1433) (@github-actions[bot])
- Backports for 3.8.4 (#1438) (@maleadt)
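The generic sparse-sparse GEMM wrapper (#1285) addresses #1264, where multiplying sparse matrices produced dense output. A hedged sketch of the intended behavior, assuming a CUDA-capable GPU:

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

# Sparse-sparse multiplication now keeps the result sparse.
dA = CuSparseMatrixCSR(sprand(Float32, 10, 10, 0.2))
dB = CuSparseMatrixCSR(sprand(Float32, 10, 10, 0.2))
dC = dA * dB   # dispatched to the generic CUSPARSE SpGEMM routines
```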
v3.8.3
CUDA v3.8.3
Closed issues:
- Sparse matrix addition not working (#528)
- Native implementation of sparse arrays (#829)
- CUSPARSE: Adding a value to the diagonal (#1372)
- Conversion by `cu` casts Float64 to Float32 but not Int64 to Int32 (#1388)
- `CUDA.math_mode!(...; precision)` option not working (#1392)
- `cuIpcGetMemHandle` failure causing CUDA-aware MPI to fail (#1398)
- `axpby!` support for BFloat16 (#1399)
- CUSPARSE does not support integer matrices, breaks printing (#1402)
- `sparse(I, J, V)` doesn't support unsorted inputs (#1407)
Merged pull requests:
- General purpose broadcast for sparse CSR matrices. (#1380) (@maleadt)
- Update manifest (#1389) (@github-actions[bot])
- Implement sparse operations with UniformScaling using broadcast. (#1390) (@maleadt)
- Prevent toplevel compilation. (#1391) (@maleadt)
- Fix and test math precision. (#1394) (@maleadt)
- Bump artifacts (#1397) (@maleadt)
- support BFloat16 for atomic_cas (#1400) (@bjarthur)
- Implement sparse broadcasting with CSC matrices. (#1401) (@maleadt)
- Always report issues with discovering CUDA. (#1404) (@maleadt)
- Fix sparse 1-argument broadcast output type. (#1405) (@maleadt)
- CUSPARSE BSR improvements (#1409) (@maleadt)
- Support limited sparse integer arrays by bitcasting to floating point. (#1410) (@maleadt)
- Support using sparse with unsorted inputs. (#1411) (@maleadt)
- Backports for 3.8.3 (#1413) (@maleadt)
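This release's headline change is general-purpose broadcasting over sparse CSR and CSC matrices (#1380, #1401). A minimal sketch, assuming a CUDA-capable GPU:

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

dA = CuSparseMatrixCSR(sprand(Float32, 10, 10, 0.1))

dB = 2 .* dA      # scalar broadcast preserves the sparsity pattern
dC = dA .+ dA     # elementwise combination of matching patterns stays sparse
```

Operations whose scalar function does not map zero to zero would densify the result, which is why the broadcast machinery is described as limited in the earlier 3.8.1 notes.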
v3.8.2
v3.8.1
CUDA v3.8.1
Closed issues:
- `one(::CuMatrix)` result on CPU (#142)
- Broadcasted setindex! triggers scalar setindex! (#101)
- OutOfGPUMemoryError With Available Memory (#1346)
- Distributions.jl with CuArrays (#1347)
- Views of Flux OneHotArrays (#1349)
- synchronize(blocking = false) hangs in julia 1.7 eventually (#1350)
- unsupported call through a literal pointer (call to log1pf) on Julia 1.6.5 (#1352)
- SpecialFunctions ^1.8 compat entry? (#1354)
- Performance deprecation using `^` on Float32 (#1358)
- [PackageCompiler] Segmentation fault with CUDA.jl in multiversioning (#1365)
- Vectors in customary structs make julia stuck (#1366)
- sparseCSC-dense matrix multiplication yields unstable results (#1368)
- UndefVarError: parameters not defined on Windows10 (#1371)
Merged pull requests:
- Optimize memoization helpers. (#1345) (@maleadt)
- Update manifest (#1348) (@github-actions[bot])
- Update manifest (#1355) (@github-actions[bot])
- Fastmath improvements (#1356) (@maleadt)
- Make the default pool visible when doing P2P (#1357) (@maleadt)
- Fix resize of empty arrays. (#1359) (@maleadt)
- CUSPARSE: add COO ctors and similar with eltype. (#1360) (@maleadt)
- Add device_override for SpecialFunctions.gamma (#1361) (@vchuravy)
- Implement (limited) broadcast of sparse arrays (#1367) (@maleadt)
- Make nonblocking synchronization robust to errors. (#1369) (@maleadt)
- Update manifest (#1370) (@github-actions[bot])
- Backports for 3.8.1 (#1374) (@maleadt)
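Among the smaller fixes, #1359 repaired `resize!` on empty device arrays. A hedged sketch of the pattern that fix enables, assuming a CUDA-capable GPU:

```julia
using CUDA

v = CuVector{Float32}(undef, 0)   # start from an empty device vector
resize!(v, 4)                     # grow in place; new elements are uninitialized
fill!(v, 0f0)                     # so initialize them explicitly
```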
v3.8.0
v3.7.1
CUDA v3.7.1
Closed issues:
- Moving data between devices (#1136)
- Repeated has_cuda_gpu errors when CUDA_VISIBLE_DEVICES is empty (#1331)
- Error when env var CUDA_VISIBLE_DEVICES is set but empty (#1336)
Merged pull requests:
- Wrap and test peer to peer memory copies (#1284) (@kshyatt)
- Update manifest (#1332) (@github-actions[bot])
- Have libcuda() fail repeatedly if anything (e.g. init) failed. (#1333) (@maleadt)
- Simplify workarounds. (#1334) (@maleadt)
- Properly detect a missing driver. (#1335) (@maleadt)
- Various small fixes (#1337) (@maleadt)
- Move CUDA.jl global state into CUDAdrv wrapper "submodule" (#1338) (@maleadt)
- Add `CUDA.return_type` (#1339) (@tkf)
- Compute-sanitizer QOL improvements and docs (#1340) (@maleadt)
- Fix regression in backwards CUFFT plans. (#1341) (@maleadt)
- Don't assume host pointers are directly usable on the device. (#1342) (@maleadt)
- Backports for 3.7.1 (#1343) (@maleadt)
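The peer-to-peer copy support (#1284) relates to the data-movement request in #1136. As a hedged sketch (this requires at least two CUDA devices, and falls back to staging through the host where direct peer access is unsupported):

```julia
using CUDA

device!(0)
a = CUDA.rand(Float32, 1024)          # source array on device 0

device!(1)
b = CuArray{Float32}(undef, 1024)     # destination on device 1

copyto!(b, a)   # cross-device copy, peer-to-peer when available
```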