Commit ab6a58d

upd
1 parent 2e00301 commit ab6a58d

1 file changed: +1 -1 lines changed

_posts/2025-03-10-sampling.md

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ Implementation side, the 2. and 3. parts are orchestrated for better parallelism
 2. If not, we add $\texttt{a\_local}$ to $\texttt{a}$ and move on to the next block.
 3. Once we know the correct block, we perform a prefix sum over its tokens to pinpoint the exact token index.
 
-The per-block partial sum rrand prefix sums are computed leveraging CUB collective primitives like `BlockReduce` and `BlockScan` to maximize efficiency.
+The per-block partial sum and prefix sums are computed leveraging [CUB collective primitives](https://docs.nvidia.com/cuda/cub/index.html) (now part of [CCCL](https://github.com/NVIDIA/cccl)) like [BlockReduce](https://nvidia.github.io/cccl/cub/api/classcub_1_1BlockReduce.html#_CPPv4I0_i_20BlockReduceAlgorithm_i_iEN3cub11BlockReduceE) and [BlockScan](https://nvidia.github.io/cccl/cub/api/classcub_1_1BlockScan.html#_CPPv4I0_i_18BlockScanAlgorithm_i_iEN3cub9BlockScanE) to maximize efficiency.
 
 ### Rejection Sampling
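Note on the hunk above: steps 2 and 3 can be illustrated with a sequential CPU sketch. This is not the post's kernel. The function name `sample_blockwise`, the `block_size` parameter, and the `a + a_local > u` test for "the token is in this block" (step 1 is outside this hunk) are assumptions made for illustration. On the GPU, the per-block sum and the in-block prefix sum would be cooperative operations, which is where `BlockReduce` and `BlockScan` come in.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential sketch of the block-wise search (hypothetical helper, not the post's code).
// Assumes `probs` is non-empty and `u` is a uniform draw in [0, 1).
inline std::size_t sample_blockwise(const std::vector<float>& probs, float u,
                                    std::size_t block_size) {
    float a = 0.0f;  // running sum of all blocks already skipped
    for (std::size_t start = 0; start < probs.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, probs.size());
        // Per-block partial sum: the role a block-wide reduction plays on the GPU.
        const float a_local =
            std::accumulate(probs.begin() + start, probs.begin() + end, 0.0f);
        if (a + a_local > u) {
            // Correct block found: prefix sum over its tokens (the role of a block-wide scan).
            float prefix = a;
            for (std::size_t i = start; i < end; ++i) {
                prefix += probs[i];
                if (prefix > u) return i;
            }
            return end - 1;  // guard against floating-point reordering effects
        }
        a += a_local;  // step 2: not in this block, accumulate and move on
    }
    return probs.size() - 1;  // u >= total mass (e.g. rounding): fall back to the last token
}
```

With `probs = {0.125, 0.125, 0.25, 0.25, 0.125, 0.125}` and `block_size = 2`, a draw of `u = 0.3` skips the first block (`a` becomes 0.25) and resolves to index 2 inside the second block.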
