Audit uses of 32-bit indexing #1968

Open
maleadt opened this issue Jun 16, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@maleadt
Member

maleadt commented Jun 16, 2023

We're currently using Int32 indices in some kernels (via the i32 hack) because that often results in significantly better performance. However, GPUs are getting larger, and users are starting to use arrays with more than typemax(Int32) elements. This can result in bugs like #1963.
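
For context, the i32 hack is an exported literal suffix (1i32 behaves like Int32(1)) that keeps index arithmetic in Int32 rather than letting plain integer literals promote it to Int64. A typical kernel index computation looks along these lines:

# the index stays Int32 throughout, so it silently wraps around
# once the array has more than typemax(Int32) elements
i = (blockIdx().x - 1i32) * blockDim().x + threadIdx().x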

We should be more careful about using 32-bit indexing, and probably not use i32 until we have a better way of deciding which index type to use. Maybe we can add some kind of index_type trait, defaulting to Int but possibly using Int32 when the input arrays allow it, e.g., using #1895.
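
A minimal sketch of what such a trait could look like (hypothetical names, not an actual CUDA.jl API):

# pick an index type based on the kernel arguments: Int32 only when
# every array argument is small enough to be indexed with 32 bits
index_type() = Int32  # no (remaining) arguments constrain the index type
index_type(A::AbstractArray, rest...) =
    length(A) <= typemax(Int32) ? index_type(rest...) : Int
index_type(x, rest...) = index_type(rest...)  # non-arrays impose no constraint

Kernels would then be compiled for T = index_type(args...) and compute their indices in T, falling back to full-width Int as soon as any argument is too large for 32-bit indexing.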

@mtanneau

mtanneau commented Feb 22, 2025

Dear CUDA.jl team, I would like to bump this issue.
The last couple of generations of GPUs (e.g. L40S, H100, and H200) have enough memory to hold arrays with more than 2 billion elements, i.e., more than typemax(Int32).
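
For reference, that is right at the 32-bit boundary:

julia> typemax(Int32)
2147483647

julia> 2^32 > typemax(Int32)
true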

Error 1 (broadcasting)
julia> using CUDA
julia> A = CUDA.fill(1f0, 2^32); A .= 2f0
ERROR: InexactError: trunc(Int32, 4294967296)
Stacktrace:
  [1] throw_inexacterror(::Symbol, ::Vararg{Any})
    @ Core ./boot.jl:750
  [2] checked_trunc_sint
    @ ./boot.jl:764 [inlined]
  [3] toInt32
    @ ./boot.jl:801 [inlined]
  [4] Int32
    @ ./boot.jl:891 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:687 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:222 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:5139 [inlined]
  [9] #735
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:35 [inlined]
 [10] check
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:35 [inlined]
 [11] cuOccupancyMaxPotentialBlockSize
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:34 [inlined]
 [12] launch_configuration(fun::CuFunction; shmem::Int64, max_threads::Int64)
    @ CUDA ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:61
 [13] launch_configuration
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:56 [inlined]
 [14] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:107
Error 2 (filling a large array, no explicit broadcasting)
julia> A = CUDA.fill(true, 2^32);
ERROR: InexactError: trunc(Int32, 4294967296)
Stacktrace:
  [1] throw_inexacterror(::Symbol, ::Vararg{Any})
    @ Core ./boot.jl:750
  [2] checked_trunc_sint
    @ ./boot.jl:764 [inlined]
  [3] toInt32
    @ ./boot.jl:801 [inlined]
  [4] Int32
    @ ./boot.jl:891 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:687 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:222 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:5139 [inlined]
  [9] #735
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:35 [inlined]
 [10] check
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:35 [inlined]
 [11] cuOccupancyMaxPotentialBlockSize
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:34 [inlined]
 [12] launch_configuration(fun::CuFunction; shmem::Int64, max_threads::Int64)
    @ CUDA ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:61
 [13] launch_configuration
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:56 [inlined]
 [14] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:107
 [15] fill!(A::CuArray{Bool, 1, CUDA.DeviceMemory}, x::Bool)
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/construction.jl:22
 [16] fill
    @ ~/.julia/packages/CUDA/1kIOw/src/array.jl:777 [inlined]
 [17] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/src/utilities.jl:35 [inlined]
 [18] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/src/memory.jl:831 [inlined]
 [19] top-level scope
    @ ./REPL[114]:1
Some type information was truncated. Use `show(err)` to see complete types.
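
Both traces bottom out in the same place: while preparing the cuOccupancyMaxPotentialBlockSize call, an Int64 value of 4294967296 (the requested number of elements) is cconvert'ed to a 32-bit argument. The failing conversion reproduces in isolation:

julia> Int32(4294967296)  # i.e. Int32(2^32)
ERROR: InexactError: trunc(Int32, 4294967296)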

EDIT: I believe this was fixed a couple of days ago; I'll wait for the next release and re-run my code.

@maleadt
Member Author

maleadt commented Mar 3, 2025

As you noted, those issues are unrelated and have been fixed on the master branch.
