@bernhardmgruber bernhardmgruber commented Nov 28, 2025

WIP

cub.bench.scan.exclusive.sum.base on B200:

## [4] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |      %Diff |  Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|-------------|------------|----------|
|   I8    |      I32      |     72576      |   9.773 us |       9.58% |  10.913 us |       6.25% |    1.140 us |     11.67% |   SLOW   |
|   I8    |      I32      |    1056384     |  11.379 us |       3.45% |  13.224 us |       4.89% |    1.846 us |     16.22% |   SLOW   |
|   I8    |      I32      |    16781184    |  38.191 us |       2.09% |  31.356 us |       2.86% |   -6.836 us |    -17.90% |   FAST   |
|   I8    |      I32      |   268442496    | 474.799 us |       0.58% | 269.826 us |       0.32% | -204.973 us |    -43.17% |   FAST   |
|   I8    |      I32      |   1073745792   |   1.875 ms |       0.31% |   1.047 ms |       0.09% | -827.709 us |    -44.15% |   FAST   |
|   I8    |      I64      |     72576      |   9.244 us |       1.43% |  11.317 us |       0.60% |    2.073 us |     22.43% |   SLOW   |
|   I8    |      I64      |    1056384     |  11.371 us |       5.03% |  13.240 us |       3.75% |    1.869 us |     16.44% |   SLOW   |
|   I8    |      I64      |    16781184    |  37.873 us |       3.25% |  31.085 us |       3.35% |   -6.789 us |    -17.92% |   FAST   |
|   I8    |      I64      |   268442496    | 467.858 us |       0.98% | 269.914 us |       0.35% | -197.944 us |    -42.31% |   FAST   |
|   I8    |      I64      |   1073745792   |   1.844 ms |       0.51% |   1.047 ms |       0.10% | -796.871 us |    -43.22% |   FAST   |
|   I8    |      I64      |   4294975104   |   9.164 us |       3.36% |   4.157 ms |       0.03% |    4.148 ms |  45260.45% |   SLOW   |
|   I16   |      I32      |     72576      |   9.494 us |       6.54% |  10.592 us |       9.04% |    1.098 us |     11.57% |   SLOW   |
|   I16   |      I32      |    1056384     |  11.342 us |       2.55% |  14.790 us |       6.61% |    3.447 us |     30.39% |   SLOW   |
|   I16   |      I32      |    16781184    |  38.848 us |       3.12% |  35.034 us |       4.87% |   -3.814 us |     -9.82% |   FAST   |
|   I16   |      I32      |   268442496    | 473.811 us |       1.07% | 297.185 us |       0.61% | -176.627 us |    -37.28% |   FAST   |
|   I16   |      I32      |   1073745792   |   1.868 ms |       0.51% |   1.134 ms |       0.14% | -733.934 us |    -39.28% |   FAST   |
|   I16   |      I64      |     72576      |   9.236 us |       2.18% |  10.746 us |       8.32% |    1.510 us |     16.35% |   SLOW   |
|   I16   |      I64      |    1056384     |  15.152 us |       5.82% |  15.536 us |       6.00% |    0.384 us |      2.54% |   SAME   |
|   I16   |      I64      |    16781184    |  41.622 us |       3.14% |  42.883 us |       2.92% |    1.262 us |      3.03% |   SLOW   |
|   I16   |      I64      |   268442496    | 492.626 us |       0.80% | 495.246 us |       0.78% |    2.619 us |      0.53% |   SAME   |
|   I16   |      I64      |   1073745792   |   1.936 ms |       0.37% |   1.935 ms |       0.37% |   -0.888 us |     -0.05% |   SAME   |
|   I16   |      I64      |   4294975104   |   9.170 us |       1.16% |   7.716 ms |       0.20% |    7.707 ms |  84049.47% |   SLOW   |
|   I32   |      I32      |     72576      |  10.913 us |       7.31% |  11.229 us |       2.33% |    0.316 us |      2.90% |   SLOW   |
|   I32   |      I32      |    1056384     |  11.954 us |       8.11% |  14.491 us |       7.97% |    2.537 us |     21.22% |   SLOW   |
|   I32   |      I32      |    16781184    |  43.783 us |       4.42% |  36.619 us |       3.40% |   -7.164 us |    -16.36% |   FAST   |
|   I32   |      I32      |   268442496    | 545.111 us |       1.59% | 318.979 us |       0.53% | -226.133 us |    -41.48% |   FAST   |
|   I32   |      I32      |   1073745792   |   2.139 ms |       0.77% |   1.228 ms |       0.41% | -911.623 us |    -42.61% |   FAST   |
|   I32   |      I64      |     72576      |   9.612 us |       7.77% |  11.204 us |       6.07% |    1.592 us |     16.56% |   SLOW   |
|   I32   |      I64      |    1056384     |  12.740 us |       7.42% |  14.981 us |       6.69% |    2.241 us |     17.59% |   SLOW   |
|   I32   |      I64      |    16781184    |  43.804 us |       4.05% |  36.602 us |       3.58% |   -7.202 us |    -16.44% |   FAST   |
|   I32   |      I64      |   268442496    | 545.338 us |       1.55% | 319.094 us |       0.49% | -226.244 us |    -41.49% |   FAST   |
|   I32   |      I64      |   1073745792   |   2.147 ms |       0.81% |   1.228 ms |       0.42% | -918.556 us |    -42.79% |   FAST   |
|   I32   |      I64      |   4294975104   |   9.160 us |       1.02% |   4.918 ms |       0.67% |    4.909 ms |  53593.72% |   SLOW   |
|   I64   |      I32      |     72576      |  11.297 us |       1.06% |  12.638 us |       7.65% |    1.341 us |     11.87% |   SLOW   |
|   I64   |      I32      |    1056384     |  13.370 us |       2.10% |  15.335 us |       3.58% |    1.966 us |     14.70% |   SLOW   |
|   I64   |      I32      |    16781184    |  68.362 us |       2.36% |  64.131 us |       7.05% |   -4.232 us |     -6.19% |   FAST   |
|   I64   |      I32      |   268442496    | 902.997 us |       0.57% | 800.659 us |       2.13% | -102.338 us |    -11.33% |   FAST   |
|   I64   |      I32      |   1073745792   |   3.581 ms |       0.31% |   3.149 ms |       0.99% | -432.325 us |    -12.07% |   FAST   |
|   I64   |      I64      |     72576      |  10.059 us |       9.98% |  11.392 us |       4.45% |    1.333 us |     13.25% |   SLOW   |
|   I64   |      I64      |    1056384     |  13.472 us |       5.16% |  15.292 us |       3.70% |    1.821 us |     13.51% |   SLOW   |
|   I64   |      I64      |    16781184    |  67.528 us |       2.69% |  63.810 us |       6.10% |   -3.718 us |     -5.51% |   FAST   |
|   I64   |      I64      |   268442496    | 910.227 us |       0.66% | 801.402 us |       2.09% | -108.824 us |    -11.96% |   FAST   |
|   I64   |      I64      |   1073745792   |   3.608 ms |       0.34% |   3.149 ms |       1.05% | -458.895 us |    -12.72% |   FAST   |
|   I64   |      I64      |   4294975104   |   9.207 us |       0.87% |  12.554 ms |       0.51% |   12.545 ms | 136260.72% |   SLOW   |
|  I128   |      I32      |     72576      |  13.342 us |       1.81% |  13.665 us |       5.61% |    0.323 us |      2.42% |   SLOW   |
|  I128   |      I32      |    1056384     |  25.970 us |       3.47% |  28.582 us |       3.80% |    2.612 us |     10.06% |   SLOW   |
|  I128   |      I32      |    16781184    | 213.839 us |       0.62% | 213.656 us |       0.63% |   -0.184 us |     -0.09% |   SAME   |
|  I128   |      I32      |   268442496    |   3.189 ms |       0.14% |   3.193 ms |       0.14% |    3.284 us |      0.10% |   SAME   |
|  I128   |      I32      |   1073745792   |  12.726 ms |       0.07% |  12.726 ms |       0.07% |   -0.059 us |     -0.00% |   SAME   |
|  I128   |      I64      |     72576      |  13.537 us |       4.44% |  14.471 us |       7.02% |    0.934 us |      6.90% |   SLOW   |
|  I128   |      I64      |    1056384     |  25.862 us |       2.87% |  29.387 us |       3.68% |    3.525 us |     13.63% |   SLOW   |
|  I128   |      I64      |    16781184    | 215.459 us |       0.59% | 215.206 us |       0.63% |   -0.253 us |     -0.12% |   SAME   |
|  I128   |      I64      |   268442496    |   3.215 ms |       0.14% |   3.218 ms |       0.14% |    2.091 us |      0.07% |   SAME   |
|  I128   |      I64      |   1073745792   |  12.824 ms |       0.07% |  12.826 ms |       0.07% |    2.265 us |      0.02% |   SAME   |
|  I128   |      I64      |   4294975104   |  11.380 us |       5.31% |  51.255 ms |       0.04% |   51.244 ms | 450311.83% |   SLOW   |
|   F32   |      I32      |     72576      |  10.230 us |       9.90% |  11.530 us |       5.89% |    1.300 us |     12.71% |   SLOW   |
|   F32   |      I32      |    1056384     |  11.834 us |       7.43% |  14.250 us |       7.23% |    2.416 us |     20.41% |   SLOW   |
|   F32   |      I32      |    16781184    |  42.262 us |       3.87% |  40.673 us |       3.47% |   -1.589 us |     -3.76% |   FAST   |
|   F32   |      I32      |   268442496    | 545.043 us |       1.51% | 349.034 us |       0.44% | -196.009 us |    -35.96% |   FAST   |
|   F32   |      I32      |   1073745792   |   2.154 ms |       0.72% |   1.335 ms |       0.13% | -818.394 us |    -38.00% |   FAST   |
|   F32   |      I64      |     72576      |  10.116 us |       9.78% |  11.324 us |       0.55% |    1.208 us |     11.94% |   SLOW   |
|   F32   |      I64      |    1056384     |  11.802 us |       8.00% |  14.787 us |       7.01% |    2.985 us |     25.29% |   SLOW   |
|   F32   |      I64      |    16781184    |  43.856 us |       4.39% |  40.624 us |       3.19% |   -3.232 us |     -7.37% |   FAST   |
|   F32   |      I64      |   268442496    | 545.610 us |       1.54% | 348.663 us |       0.47% | -196.947 us |    -36.10% |   FAST   |
|   F32   |      I64      |   1073745792   |   2.155 ms |       0.77% |   1.335 ms |       0.15% | -819.335 us |    -38.03% |   FAST   |
|   F32   |      I64      |   4294975104   |   9.166 us |       1.42% |   5.285 ms |       0.05% |    5.276 ms |  57556.92% |   SLOW   |
|   F64   |      I32      |     72576      |  11.296 us |       0.64% |  12.127 us |       8.36% |    0.831 us |      7.35% |   SLOW   |
|   F64   |      I32      |    1056384     |  14.302 us |       7.16% |  15.707 us |       5.40% |    1.405 us |      9.82% |   SLOW   |
|   F64   |      I32      |    16781184    |  68.580 us |       2.06% |  66.780 us |       7.10% |   -1.800 us |     -2.62% |   FAST   |
|   F64   |      I32      |   268442496    | 919.827 us |       0.67% | 886.553 us |       2.98% |  -33.274 us |     -3.62% |   FAST   |
|   F64   |      I32      |   1073745792   |   3.633 ms |       0.32% |   3.526 ms |       1.42% | -106.914 us |     -2.94% |   FAST   |
|   F64   |      I64      |     72576      |  10.522 us |       9.59% |  10.818 us |       7.73% |    0.296 us |      2.81% |   SAME   |
|   F64   |      I64      |    1056384     |  15.418 us |       2.59% |  16.327 us |       6.31% |    0.909 us |      5.89% |   SLOW   |
|   F64   |      I64      |    16781184    |  71.646 us |       1.68% |  71.886 us |       1.75% |    0.241 us |      0.34% |   SAME   |
|   F64   |      I64      |   268442496    | 964.252 us |       0.31% | 964.371 us |       0.33% |    0.118 us |      0.01% |   SAME   |
|   F64   |      I64      |   1073745792   |   3.822 ms |       0.16% |   3.822 ms |       0.16% |   -0.300 us |     -0.01% |   SAME   |
|   F64   |      I64      |   4294975104   |   9.161 us |       1.63% |  15.256 ms |       0.08% |   15.247 ms | 166425.68% |   SLOW   |
|   C32   |      I32      |     72576      |  13.492 us |       4.55% |  13.981 us |       7.88% |    0.490 us |      3.63% |   SAME   |
|   C32   |      I32      |    1056384     |  25.215 us |       3.81% |  25.473 us |       3.04% |    0.258 us |      1.02% |   SAME   |
|   C32   |      I32      |    16781184    | 203.001 us |       1.15% | 203.035 us |       1.04% |    0.034 us |      0.02% |   SAME   |
|   C32   |      I32      |   268442496    |   3.086 ms |       0.26% |   3.068 ms |       0.26% |  -18.705 us |     -0.61% |   FAST   |
|   C32   |      I32      |   1073745792   |  12.282 ms |       0.13% |  12.268 ms |       0.12% |  -14.091 us |     -0.11% |   SAME   |
|   C32   |      I64      |     72576      |  14.572 us |       6.96% |  15.502 us |       4.26% |    0.930 us |      6.38% |   SLOW   |
|   C32   |      I64      |    1056384     |  25.819 us |       3.19% |  25.333 us |       4.06% |   -0.486 us |     -1.88% |   SAME   |
|   C32   |      I64      |    16781184    | 203.783 us |       1.15% | 204.356 us |       1.13% |    0.574 us |      0.28% |   SAME   |
|   C32   |      I64      |   268442496    |   3.088 ms |       0.27% |   3.096 ms |       0.26% |    8.165 us |      0.26% |   SLOW   |
|   C32   |      I64      |   1073745792   |  12.323 ms |       0.14% |  12.323 ms |       0.13% |    0.379 us |      0.00% |   SAME   |
|   C32   |      I64      |   4294975104   |   9.225 us |       2.16% |  49.232 ms |       0.06% |   49.222 ms | 533570.16% |   SLOW   |
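
For reading the table, here is a minimal sketch of how the `Diff` and `%Diff` columns appear to follow from the two time columns. The `Comparison` struct and the `compare` and `roundTo2` helpers are hypothetical names for illustration and are not part of nvbench.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type, not part of nvbench.
struct Comparison
{
  double diff;        // Cmp Time - Ref Time, in the same unit as the inputs
  double percentDiff; // diff relative to Ref Time, in percent
};

// Recomputes the two derived columns from a reference and a compared time.
constexpr Comparison compare(double refTime, double cmpTime)
{
  const double diff = cmpTime - refTime;
  return {diff, 100.0 * diff / refTime};
}

// Rounds to two decimals, the precision printed in the %Diff column.
inline double roundTo2(double x)
{
  return std::round(x * 100.0) / 100.0;
}
```

For the I8 / I32 / 268442496 row, `compare(474.799, 269.826)` gives a difference of about -204.973 us and, after rounding, -43.17%, which matches the printed values.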

Fixes: #6644

copy-pr-bot bot commented Nov 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Nov 28, 2025
@bernhardmgruber
Contributor Author

/ok to test 621720f

@bernhardmgruber
Contributor Author

/ok to test 96a492f

// For 64-bit types, we still use __shfl_sync
[[nodiscard]] _CCCL_DEVICE_API inline int makeWarpUniform(int x)
{
NV_IF_ELSE_TARGET(NV_PROVIDES_SM_90, (return __reduce_min_sync(~0, x);), (return x;));
Contributor

@ahendriksen should this fall back to __shfl_sync for targets below SM90?

Contributor

Yes, that would work.

Contributor

I believe we should actually use WarpReduce here, because it already has an optimization for this case

miscco commented Nov 28, 2025

/ok to test

.set_name("base")
.set_type_axes_names({"T{ct}", "OffsetT{ct}"})
.add_int64_power_of_two_axis("Elements{io}", nvbench::range(16, 28, 4));
//.add_int64_power_of_two_axis("Elements{io}", nvbench::range(16, 28, 4))
Contributor

Critical: We need to make sure we can handle partial tiles

Contributor Author

I have the changes working locally. Will upstream soon.

: SquadDesc(squadStatic)
, mSpecialRegisters(specialRegisters)
{
mIsWarpLeader = ::cuda::ptx::elect_sync(~0);
Contributor

We should make this available in earlier architectures

Contributor

Yes, we can do this using mIsWarpLeader = (threadIdx.x % 32) == 0;

Contributor

or sr.laneIdx == 0
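
A minimal host-side sketch of the lane-0 fallback discussed above, assuming a 1D thread block and 32 lanes per warp. `kWarpSize`, `laneIdxOf` and `isWarpLeader` are hypothetical names; on the device the argument would be `threadIdx.x` or the cached `sr.laneIdx`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32 lanes per warp.
constexpr std::uint32_t kWarpSize = 32;

// Lane index of a thread within its warp for a 1D thread block.
constexpr std::uint32_t laneIdxOf(std::uint32_t threadIdxX)
{
  return threadIdxX % kWarpSize;
}

// Fallback leader choice: the thread in lane 0 of each warp is the leader.
constexpr bool isWarpLeader(std::uint32_t threadIdxX)
{
  return laneIdxOf(threadIdxX) == 0;
}

static_assert(isWarpLeader(0), "first thread of warp 0 is a leader");
static_assert(!isWarpLeader(1), "lane 1 is not a leader");
static_assert(isWarpLeader(64), "first thread of warp 2 is a leader");
```

One caveat, as far as I understand the instruction: `elect_sync` picks a leader among the threads in the given mask, whereas a fixed lane-0 choice only works when lane 0 takes part in the code path. With a full mask (`~0`) at warp-converged points the two should agree on having exactly one leader per warp.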

squadDispatch(SpecialRegisters sr, const SquadDesc (&squads)[numSquads], F f, int warpIdxStart = 0)
{
static_assert(numSquads > 0);
if (numSquads == 1)
Contributor

Can this be

Suggested change
- if (numSquads == 1)
+ if constexpr (numSquads == 1)

Contributor

Yes it can. Not sure if there is any benefit, but it is possible.
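
On the question of benefit: the general case where `if constexpr` matters is when the branch not taken would fail to instantiate. The sketch below is a hypothetical recursive helper (`dispatchDepth` is not from the PR) that halves a squad count until one squad is left. Whether the same applies to `squadDispatch` depends on what its branches instantiate.

```cpp
#include <cassert>

// Hypothetical helper: number of halving steps until a single squad remains.
template <int numSquads>
constexpr int dispatchDepth()
{
  static_assert(numSquads > 0, "need at least one squad");
  if constexpr (numSquads == 1)
  {
    return 0; // base case; the recursive branch below is discarded
  }
  else
  {
    // With a plain `if`, this call would also be instantiated for
    // numSquads == 1, i.e. dispatchDepth<0>, and trip the static_assert.
    return 1 + dispatchDepth<numSquads / 2>();
  }
}

static_assert(dispatchDepth<1>() == 0, "one squad needs no split");
static_assert(dispatchDepth<5>() == 2, "5 -> 2 -> 1");
```

When both branches compile for every instantiation, the compiler will usually fold a plain `if` on a constant condition anyway, so the change would be mostly about stating intent.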

}
if (sr.warpIdx < warpIdxStartMid)
{
if constexpr (0 < mid)
Contributor

I believe it would be clearer to compare against 0

Suggested change
- if constexpr (0 < mid)
+ if constexpr (mid != 0)

Comment on lines 465 to 471
template <int numLookbackTiles,
int tile_size,
Contributor

Style: Use CamelCase

Comment on lines 370 to 404
NV_IF_ELSE_TARGET(
NV_IS_HOST,
({
int curr_device{};
if (const auto error = CubDebug(cudaGetDevice(&curr_device)))
{
return error;
}

int max_smem_size_optin{};
if (const auto error = CubDebug(
cudaDeviceGetAttribute(&max_smem_size_optin, cudaDevAttrMaxSharedMemoryPerBlockOptin, curr_device)))
{
return error;
}

int reserved_smem_size{};
if (const auto error = CubDebug(
cudaDeviceGetAttribute(&reserved_smem_size, cudaDevAttrReservedSharedMemoryPerBlock, curr_device)))
{
return error;
}
max_dynamic_smem_size = max_smem_size_optin - reserved_smem_size;
}),
({
cudaFuncAttributes func_attrs{};
if (const auto error = CubDebug(cudaFuncGetAttributes(&func_attrs, func)))
{
return error;
}
max_dynamic_smem_size = func_attrs.maxDynamicSharedSizeBytes;
}))
return cudaSuccess;
}
Contributor

Nitpick: I believe we should move this into a utility function

Contributor Author

@davebayer did this in #6818

Comment on lines +409 to +389
auto* d_in_unwrapped = THRUST_NS_QUALIFIER::unwrap_contiguous_iterator(d_in);
auto* d_out_unwrapped = THRUST_NS_QUALIFIER::unwrap_contiguous_iterator(d_out);
Contributor

Note to self, no change requested, we should really move this to to_address

REQUIRE(all_results_correct == true);

// Copy over the results and expected results to host and compare
#if false
Contributor

Question: Should this be enabled?

Contributor

It's just a debug print utility in case of failing tests. I'm leaning towards dropping this.

miscco commented Nov 29, 2025

/ok to test

Comment on lines +311 to +430
int warpIsPrivSum = 0;
NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsPrivSum = __reduce_or_sync(~0, laneIsPrivSum);))
Contributor

@ahendriksen this is unused, did we accidentally drop something?

Contributor

Some code is/was left behind to support decoupled lookback, which has cumSum states in tmp_states in addition to just privSum. See the commented out lines starting with // We are not storing CUM_SUM states, because it makes updating idxTileCur below.

Since we are fairly confident that we will only need the privSum states, we can drop warpIsCumSum and I think we can also drop warpIsPrivSum (as we are using warpIsEmpty below which gives all necessary information).

Comment on lines +307 to +428
int warpIsEmpty = 0;
NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsEmpty = __reduce_or_sync(~0, laneIsEmpty);))
int warpIsCumSum = 0;
NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsCumSum = __reduce_or_sync(~0, laneIsCumSum);))
Contributor

Important: This is technically UB, because the bitwise reduce functions take an unsigned input

Suggested change
- int warpIsEmpty = 0;
- NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsEmpty = __reduce_or_sync(~0, laneIsEmpty);))
- int warpIsCumSum = 0;
- NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsCumSum = __reduce_or_sync(~0, laneIsCumSum);))
+ unsigned warpIsEmpty = 0;
+ NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsEmpty = __reduce_or_sync(~0, laneIsEmpty);))
+ unsigned warpIsCumSum = 0;
+ NV_IF_TARGET(NV_PROVIDES_SM_80, (warpIsCumSum = __reduce_or_sync(~0, laneIsCumSum);))
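
For context, `__reduce_or_sync` takes and returns `unsigned`. A host-side stand-in for what the flags compute is sketched below: a bitwise OR over the 32 lane values, so the result is nonzero exactly when at least one lane set its flag. `reduceOr` is a hypothetical emulation for illustration, not the device intrinsic.

```cpp
#include <array>
#include <cassert>

// Host-side emulation of a warp-wide bitwise OR over 32 lane values.
// Uses unsigned throughout, mirroring the type of the device intrinsic.
inline unsigned reduceOr(const std::array<unsigned, 32>& laneValues)
{
  unsigned result = 0;
  for (unsigned v : laneValues)
  {
    result |= v;
  }
  return result;
}
```

With all lanes at 0 the result is 0. Setting a single lane to 1 makes it 1, which is how `warpIsEmpty` can be read as "some lane saw an empty state".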

Comment on lines 379 to 514
_CCCL_GLOBAL_CONSTANT SquadDesc squadReduce{/*squadIdx=*/0, /*numWarps=*/4};
_CCCL_GLOBAL_CONSTANT SquadDesc squadScanStore{/*squadIdx=*/1, /*numWarps=*/4};
_CCCL_GLOBAL_CONSTANT SquadDesc squadLoad{/*squadIdx=*/2, /*numWarps=*/1};
_CCCL_GLOBAL_CONSTANT SquadDesc squadSched{/*squadIdx=*/3, /*numWarps=*/1};
_CCCL_GLOBAL_CONSTANT SquadDesc squadLookback{/*squadIdx=*/4, /*numWarps=*/1};

_CCCL_GLOBAL_CONSTANT SquadDesc scanSquads[] = {
squadReduce,
squadScanStore,
squadLoad,
squadSched,
squadLookback,
};
Contributor

I believe we should have a make_squads(int...) that returns effectively scanSquads

We then should be able to name the individual array members via a reference
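
A minimal sketch of what such a `make_squads` could look like, assuming C++17 and a `SquadDesc` with only the two fields visible in the snippet. The struct below is a hypothetical stand-in and the real type in the PR may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the PR's SquadDesc.
struct SquadDesc
{
  int squadIdx;
  int numWarps;
};

// Builds the squad array from warp counts; squad indices are assigned in
// argument order (braced-init-lists are evaluated left to right).
template <typename... Ints>
constexpr std::array<SquadDesc, sizeof...(Ints)> make_squads(Ints... numWarps)
{
  static_assert(sizeof...(Ints) > 0, "need at least one squad");
  int nextIdx = 0;
  return {{SquadDesc{nextIdx++, static_cast<int>(numWarps)}...}};
}

constexpr auto scanSquads = make_squads(4, 4, 1, 1, 1);

// Individual squads are named via references into the array.
constexpr const SquadDesc& squadReduce    = scanSquads[0];
constexpr const SquadDesc& squadScanStore = scanSquads[1];
constexpr const SquadDesc& squadLookback  = scanSquads[4];

static_assert(squadLookback.squadIdx == 4 && squadLookback.numWarps == 1, "");
```

This keeps the indices and the array in sync by construction. Whether the references work the same way with `_CCCL_GLOBAL_CONSTANT` in device code is something this host-only sketch does not show.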

const uint32_t laneIdx;
};

[[nodiscard]] _CCCL_DEVICE_API inline SpecialRegisters getSpecialRegisters()
Contributor

Sorry, I actually meant cudax, so that we can have something that can evolve

@bernhardmgruber
Contributor Author

/ok to test 7c44978

copy-pr-bot bot commented Dec 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@bernhardmgruber
Contributor Author

/ok to test 2ec0602

github-actions bot commented Dec 2, 2025

😬 CI Workflow Results

🟥 Finished in 2h 46m: Pass: 54%/267 | Total: 5d 07h | Max: 2h 21m | Hits: 73%/210590

See results here.

@bernhardmgruber bernhardmgruber changed the title from "Integrate warpspeed scan" to "Integrate decoupled lookahead warpspeed scan" Dec 3, 2025

Development

Successfully merging this pull request may close these issues.

Decoupled lookahead scan MVP

5 participants