Audit uses of 32-bit indexing #1968

Open
maleadt opened this issue Jun 16, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@maleadt
Member

maleadt commented Jun 16, 2023

We're currently using Int32 indices in some kernels (via the i32 hack) because that often results in significantly better performance. However, GPUs are getting larger, and users are starting to use arrays with more than typemax(Int32) elements. This can result in bugs like #1963.
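
For context, the i32 hack is an exported literal suffix (1i32 behaves like Int32(1)) that keeps index arithmetic in Int32 rather than letting plain integer literals promote it to Int64. A typical kernel index computation looks along these lines:

# the index stays Int32 throughout, so it silently wraps around
# once the array has more than typemax(Int32) elements
i = (blockIdx().x - 1i32) * blockDim().x + threadIdx().x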

We should be more careful about using 32-bit indexing, and probably not use i32 until we have a better way of deciding which index type to use. Maybe we can add some kind of index_type trait, defaulting to Int but possibly using Int32 when the input arrays allow it, e.g., using #1895.
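
A minimal sketch of what such a trait could look like (hypothetical names, not an actual CUDA.jl API):

# pick an index type based on the kernel arguments: Int32 only when
# every array argument is small enough to be indexed with 32 bits
index_type() = Int32  # no (remaining) arguments constrain the index type
index_type(A::AbstractArray, rest...) =
    length(A) <= typemax(Int32) ? index_type(rest...) : Int
index_type(x, rest...) = index_type(rest...)  # non-arrays impose no constraint

Kernels would then be compiled for T = index_type(args...) and compute their indices in T, falling back to full-width Int as soon as any argument is too large for 32-bit indexing.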

@mtanneau

mtanneau commented Feb 22, 2025

Dear CUDA.jl team, I would like to bump this issue.
The last couple of generations of GPUs (e.g. L40S, H100, and H200) have enough memory to hold arrays with more than 2 billion elements, i.e., more than typemax(Int32).
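
For reference, that is right at the 32-bit boundary:

julia> typemax(Int32)
2147483647

julia> 2^32 > typemax(Int32)
true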

Error 1 (broadcasting)
julia> using CUDA
julia> A = CUDA.fill(1f0, 2^32); A .= 2f0
ERROR: InexactError: trunc(Int32, 4294967296)
Stacktrace:
  [1] throw_inexacterror(::Symbol, ::Vararg{Any})
    @ Core ./boot.jl:750
  [2] checked_trunc_sint
    @ ./boot.jl:764 [inlined]
  [3] toInt32
    @ ./boot.jl:801 [inlined]
  [4] Int32
    @ ./boot.jl:891 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:687 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:222 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:5139 [inlined]
  [9] #735
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:35 [inlined]
 [10] check
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:35 [inlined]
 [11] cuOccupancyMaxPotentialBlockSize
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:34 [inlined]
 [12] launch_configuration(fun::CuFunction; shmem::Int64, max_threads::Int64)
    @ CUDA ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:61
 [13] launch_configuration
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:56 [inlined]
 [14] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:107
Error 2 (filling a large array, no explicit broadcasting)
julia> A = CUDA.fill(true, 2^32);
ERROR: InexactError: trunc(Int32, 4294967296)
Stacktrace:
  [1] throw_inexacterror(::Symbol, ::Vararg{Any})
    @ Core ./boot.jl:750
  [2] checked_trunc_sint
    @ ./boot.jl:764 [inlined]
  [3] toInt32
    @ ./boot.jl:801 [inlined]
  [4] Int32
    @ ./boot.jl:891 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:687 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:222 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:5139 [inlined]
  [9] #735
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:35 [inlined]
 [10] check
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:35 [inlined]
 [11] cuOccupancyMaxPotentialBlockSize
    @ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:34 [inlined]
 [12] launch_configuration(fun::CuFunction; shmem::Int64, max_threads::Int64)
    @ CUDA ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:61
 [13] launch_configuration
    @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:56 [inlined]
 [14] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:107
 [15] fill!(A::CuArray{Bool, 1, CUDA.DeviceMemory}, x::Bool)
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/construction.jl:22
 [16] fill
    @ ~/.julia/packages/CUDA/1kIOw/src/array.jl:777 [inlined]
 [17] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/src/utilities.jl:35 [inlined]
 [18] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/src/memory.jl:831 [inlined]
 [19] top-level scope
    @ ./REPL[114]:1
Some type information was truncated. Use `show(err)` to see complete types.
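
Both traces bottom out in the same place: while preparing the cuOccupancyMaxPotentialBlockSize call, an Int64 value of 4294967296 (the requested number of elements) is cconvert'ed to a 32-bit argument. The failing conversion reproduces in isolation:

julia> Int32(4294967296)  # i.e. Int32(2^32)
ERROR: InexactError: trunc(Int32, 4294967296)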

EDIT: I believe this was fixed a couple of days ago; I'll wait for the next release and re-run my code.

@maleadt
Member Author

maleadt commented Mar 3, 2025

As you noted, those issues are unrelated and have been fixed on the master branch.
