This repository was archived by the owner on Mar 12, 2021. It is now read-only.

Commit fec501a

review
1 parent 2d0c06a commit fec501a

1 file changed: +2 -3 lines changed
src/mapreduce.jl

Lines changed: 2 additions & 3 deletions
@@ -110,7 +110,7 @@ function partial_mapreduce_grid(f, op, A, R, neutral, Rreduce, Rother, gridDim_r
 
 val = op(neutral, neutral)
 
-# get a value that should be reduced
+# reduce serially across chunks of input vector that don't fit in a block
 ireduce = threadIdx_reduce + (blockIdx_reduce - 1) * blockDim_reduce
 while ireduce <= length(Rreduce)
 Ireduce = Rreduce[ireduce]
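
The new comment describes a loop in which each thread serially folds every element it is responsible for before any cooperative reduction happens. The following is a minimal CPU-only sketch of that idea, not the code from `src/mapreduce.jl`: the function name `strided_partial_reduce`, its parameters, and in particular the stride of `nthreads * nblocks` are assumptions, since the hunk does not show how `ireduce` is advanced.

```julia
# CPU emulation of a strided serial reduction. All names are hypothetical;
# this only illustrates the loop shape visible in the hunk above.
function strided_partial_reduce(f, op, neutral, xs, nthreads, nblocks)
    partials = fill(neutral, nthreads, nblocks)
    for block in 1:nblocks, thread in 1:nthreads
        val = op(neutral, neutral)
        # first element handled by this emulated (thread, block) pair
        i = thread + (block - 1) * nthreads
        while i <= length(xs)
            val = op(val, f(xs[i]))
            # assumed stride: total number of emulated threads
            i += nthreads * nblocks
        end
        partials[thread, block] = val
    end
    return partials
end

xs = collect(1:10)
partials = strided_partial_reduce(identity, +, 0, xs, 2, 2)
total = reduce(+, partials)
```

With 2 emulated threads in each of 2 blocks, every input element is visited exactly once, so combining the per-thread partial values gives the same result as reducing `xs` directly.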
@@ -142,8 +142,7 @@ NVTX.@range function GPUArrays.mapreducedim!(f, op, R::CuArray{T}, A::AbstractAr
 # be conservative about using shuffle instructions
 shuffle = true
 shuffle &= capability(device()) >= v"3.0"
-shuffle &= T in (Int32, Int64, Float32, Float64, ComplexF32, ComplexF64)
-# TODO: add support for Bool (CUDAnative.jl#420)
+shuffle &= T in (Bool, Int32, Int64, Float32, Float64, ComplexF32, ComplexF64)
 
 # iteration domain, split in two: one part covers the dimensions that should
 # be reduced, and the other covers the rest. combining both covers all values.
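
The effect of adding `Bool` to the type list can be seen by isolating the eligibility check. This is a hedged sketch: `can_shuffle` and `SHUFFLE_TYPES` are hypothetical names introduced for illustration, and the compute capability is passed in as a plain `VersionNumber` instead of being queried with `capability(device())`, so the snippet runs without a GPU.

```julia
# Hypothetical helper mirroring the check in the diff; not part of CuArrays.
const SHUFFLE_TYPES = (Bool, Int32, Int64, Float32, Float64, ComplexF32, ComplexF64)

function can_shuffle(::Type{T}, cap::VersionNumber) where {T}
    # be conservative: start from true and clear the flag on any failed condition
    shuffle = true
    shuffle &= cap >= v"3.0"
    shuffle &= T in SHUFFLE_TYPES
    return shuffle
end

bool_ok = can_shuffle(Bool, v"7.5")        # element type is in the list
half_ok = can_shuffle(Float16, v"7.5")     # element type is not in the list
old_gpu_ok = can_shuffle(Float32, v"2.0")  # capability below 3.0
```

Under this sketch, a `Bool` reduction on a sufficiently recent device takes the shuffle path, while unlisted element types and devices below capability 3.0 do not.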
