Update range of gpu arch #23309

yf711 · 2025-01-09T21:08:02Z

Description

Remove deprecated GPU architectures to control nuget/python package size (the latest TRT supports sm75 Turing and newer architectures)
Add 90 to support the Blackwell series in the next release (86;89 not considered, as adding them would rapidly increase package size)

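| arch_range | Python-cuda12 | Nuget-cuda12 |
| -------------- | ----------------------- | ----------------------- |
| 60;61;70;75;80 | Linux: 279MB Win: 267MB | Linux: 247MB Win: 235MB |
| 75;80 | Linux: 174MB Win: 162MB | Linux: 168MB Win: 156MB |
| **75;80;90** | **Linux: 299MB Win: 277MB** | **Linux: 294MB Win: 271MB** |
| 75;80;86;89 | [Linux: MB Win: 390MB](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=647457&view=results) | Linux: 416MB Win: 383MB |
| 75;80;86;89;90 | [Linux: MB Win: 505MB](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=646536&view=results) | Linux: 541MB Win: 498MB |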

Motivation and Context

Callout: While adding sm90 support, the cuda11.8+cudnn8 build will be dropped in the coming ORT release,
as that build has issues with Blackwell (mentioned in the comments) and demand for cuda 11 is minor, according to the internal ort-cuda11 repo.

tianleiwu · 2025-01-09T21:41:34Z

If we drop older arch, shall we also drop ort package for cuda 11.8 in next release?

snnn · 2025-01-09T21:50:42Z

> If we drop older arch, shall we also drop ort package for cuda 11.8 in next release?

I highly recommend doing so. Now we only have two people working on build pipelines. We should focus more on the main targets.

tools/ci_build/github/linux/build_cuda_c_api_package.sh

snnn · 2025-01-10T21:20:33Z

/azp run Win_TRT_Minimal_CUDA_Test_CI

azure-pipelines · 2025-01-10T21:20:45Z

Azure Pipelines successfully started running 1 pipeline(s).

yf711 · 2025-01-11T00:11:03Z

After testing, adding sm90 to the build arch list causes issues for the cuda 11.8+cudnn8 alt pkg build on Windows,
likely because cudnn8 is deprecated for Blackwell. The cuda 12 pkg build is not affected.

To support sm90, we can choose to support cuda12 only, or we might need to update the current cuda 11.8 env to cudnn9.

snnn · 2025-01-11T00:39:54Z

CUDA 11.8 with cudnn9 doesn't work. I tried.

I hit the following compilation error when compiling cudnn_flash_attention.cu

/build/Release/_deps/cudnn_frontend-src/include/cudnn_frontend/graph_interface.h:519:27:   required from here
/build/Release/_deps/cudnn_frontend-src/include/cudnn_frontend/thirdparty/nlohmann/json.hpp:9132:68: error: static assertion failed: Missing/invalid function: bool boolean(bool)
 9132 |     static_assert(is_detected_exact<bool, boolean_function_t, SAX>::value,

Therefore, I suggest giving up on that.

tianleiwu · 2025-01-24T00:13:55Z

BTW, Blackwell GPUs have compute-capabilities 10.0 (B100 and B200) and 12.0 (B40 and RTX 5090 etc). To support RTX 5090 etc, we can add 120 to CMAKE_CUDA_ARCHITECTURES, and upgrade cuda to 12.8 for cuda EP.
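
As an illustration of the naming used here, a compute capability such as 12.0 maps to the `CMAKE_CUDA_ARCHITECTURES` entry `120` by dropping the dot. The sketch below is only a convenience for building that value; the helper names are hypothetical and nothing here reflects how the ORT build scripts actually pass the setting.

```python
from typing import Iterable

def cc_to_cmake_arch(compute_capability: str) -> str:
    """Turn a compute capability like "12.0" into a CMake arch entry like "120"."""
    major, minor = compute_capability.split(".")
    return f"{int(major)}{int(minor)}"

def cmake_arch_value(compute_capabilities: Iterable[str]) -> str:
    """Join several compute capabilities into a semicolon-separated CMake list."""
    return ";".join(cc_to_cmake_arch(cc) for cc in compute_capabilities)

# Hypothetical example: Turing, Ampere, Hopper plus the two Blackwell
# capabilities mentioned in the comment above.
value = cmake_arch_value(["7.5", "8.0", "9.0", "10.0", "12.0"])
print(f"-DCMAKE_CUDA_ARCHITECTURES={value}")
```

When passing such a value on a shell command line, the semicolons need quoting.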

jywu-msft · 2025-01-24T00:19:06Z

> BTW, Blackwell GPUs have compute-capabilities 10.0 (B100 and B200) and 12.0 (B40 and RTX 5090 etc). To support RTX 5090 etc, we can add 120 to CMAKE_CUDA_ARCHITECTURES, and upgrade cuda to 12.8 for cuda EP.

CUDA 12.8 would require a new driver right? not sure how easy that would be to update soon? @snnn ?
And if we update CUDA , we would need to update TensorRT as well for Blackwell support.

gedoensmax · 2025-01-24T09:45:41Z

cuDNN (9.7), TensorRT (10.8) and CUDA (12.8) are the first versions to support Blackwell. All of these went live this week and are now publicly available. Driver requirements are the same as with any other CUDA 12 release due to minor version compatibility which is a feature since CUDA 11.

I would say targets older than Turing (75) can certainly be dropped, or we support this by e.g. shipping PTX for an old architecture, but not sure how long JIT compilation in the driver will take for all the ORT kernels. Besides that, would it make sense to differentiate between Windows and Linux? Are there known sm80 (A100) and sm90 (H100) customers on Windows? Otherwise ORT could trim the Windows package to the consumer archs (75, 86, 89, 120 and maybe PTX for 120 as forward compatibility).
More details on blackwell+CUDA are in the migration docs

snnn · 2025-01-24T18:27:09Z

@jywu-msft , how about we do it for Linux first? 80% of the onnxruntime-gpu downloads are from Linux.

I have a draft change that moves Linux GPU tests from A10 to A100 (which uses the official CUDA driver): https://github.com/microsoft/onnxruntime/tree/snnn/replace_pool . But it hit some test failures. I just asked @tianleiwu for help.

Then, we may upgrade all our pipelines to CUDA 12.4 (not 12.8 yet). Our Windows machines use Nvidia driver 550, which should be good for CUDA 12.4. We also have a cudnn frontend issue that needs to be addressed when upgrading cudnn: #23244 (comment)

Then, upgrade Visual Studio to the latest.

Then, upgrade cudnn frontend to the latest, which needs the latest Visual Studio.

Then, continue upgrading to CUDA 12.8 for the Linux build.

yf711 · 2025-01-30T20:04:01Z

Hi @gedoensmax do you have any suggestion reducing the ort package size?

| arch_range | Python-cuda12 | Nuget-cuda12 |
| -------------- | ----------------------- | ----------------------- |
| 75;80 | Linux: 174MB Win: 162MB | Linux: 168MB Win: 156MB |
| 75;80;90 | Linux: 299MB Win: 277MB | Linux: 294MB Win: 271MB |
| 75;80;86;89 | Linux: MB Win: 390MB | Linux: 416MB Win: 383MB |
| 75;80;86;89;90 | Linux: MB Win: 505MB | Linux: 541MB Win: 498MB |

Comparing 1st & 2nd row, simply adding sm90 increases package size by 70%, and we are hitting the size limit set by nuget.io
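
The roughly 70% figure can be reproduced from the first two rows of the table. This is a minimal sketch using only the sizes quoted there; the dictionary layout and helper name are illustrative.

```python
# Package sizes in MB, copied from the 75;80 and 75;80;90 rows of the table.
SIZES_MB = {
    "75;80":    {"py-linux": 174, "py-win": 162, "nuget-linux": 168, "nuget-win": 156},
    "75;80;90": {"py-linux": 299, "py-win": 277, "nuget-linux": 294, "nuget-win": 271},
}

def percent_increase(before: float, after: float) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Growth of each package when sm90 is added to the arch list.
increase = {
    pkg: round(percent_increase(SIZES_MB["75;80"][pkg], SIZES_MB["75;80;90"][pkg]), 1)
    for pkg in SIZES_MB["75;80"]
}

for pkg, pct in increase.items():
    print(f"{pkg}: +{pct}%")
```

All four packages grow by about 71% to 75%, which is consistent with the "70%" quoted in the comment.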

gedoensmax · 2025-01-30T20:35:31Z

@yf711 CUDA binary size is a general problem that is not easily solved. There is a reason that cuDNN now generates/compiles kernels dynamically. This allows for optimizations that you usually only get at compile time and in addition reduces binary size.

Training kernels are not included in these packages ? Did you conduct any analysis into which kernel takes how much space ? There is nothing that immediately comes to mind to easily reduce binary size. I can take a closer look tomorrow or early next week if needed.

gedoensmax · 2025-01-31T11:55:14Z

Quick update after reading up a little more.
You can expect SASS code from 8.6 to run on 8.9. See this doc. That even goes for 80: it will run on 86 and 89, but since 86 and 89 have higher fp32 throughput you will not be able to utilize this functionality.
Because of that, 75;80;90 will work on all architectures from Turing onward. Does 90 or another entry include PTX? Without PTX, neither Blackwell datacenter nor consumer GPUs will work.
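
The compatibility reasoning in this comment can be sketched as a small model. It assumes the documented CMake meaning of `CMAKE_CUDA_ARCHITECTURES` entries (`NN-real` emits SASS only, `NN-virtual` emits PTX only, a bare `NN` emits both) and the general CUDA rules that SASS runs on devices of the same major version with an equal or higher minor version, while PTX can be JIT-compiled for any equal or higher compute capability. It is a simplification: the function names are hypothetical, arch-specific targets such as `sm_90a` are ignored, and the real selection is made by the CUDA runtime.

```python
from typing import List, Optional, Tuple

def parse_arch_list(arch_list: str) -> Tuple[List[int], List[int]]:
    """Split a CMAKE_CUDA_ARCHITECTURES value into (SASS archs, PTX archs)."""
    sass, ptx = [], []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        number, _, kind = entry.partition("-")
        arch = int(number)
        if kind in ("", "real"):      # bare "NN" or "NN-real": SASS is emitted
            sass.append(arch)
        if kind in ("", "virtual"):   # bare "NN" or "NN-virtual": PTX is emitted
            ptx.append(arch)
    return sass, ptx

def how_it_runs(device: int, arch_list: str) -> Optional[str]:
    """Describe how a device (e.g. 89 for sm89) could run a build, or None."""
    sass, ptx = parse_arch_list(arch_list)
    # SASS: same major version, compiled minor not newer than the device.
    native = [a for a in sass if a // 10 == device // 10 and a <= device]
    if native:
        return f"SASS sm_{max(native)}"
    # PTX: any virtual arch not newer than the device can be JIT-compiled.
    jit = [a for a in ptx if a <= device]
    if jit:
        return f"PTX compute_{max(jit)} (JIT)"
    return None

for device in (61, 75, 86, 89, 120):
    print(device, how_it_runs(device, "75;80;90"))
```

Under this model, `75;80;90` serves sm86 and sm89 through the sm80 SASS and sm120 through JIT of the sm90 PTX, while sm61 is not covered. With `90-real` instead of `90`, no PTX is shipped and sm120 has nothing to run, which is the concern raised above.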

### Description

Action item:

* ~~Add LTO support when cuda 12.8 & Relocatable Device Code (RDC)/separate_compilation are enabled, to reduce potential perf regression~~ LTO needs further testing
* Reduce nuget/whl package size by selecting devices & their cuda binary/PTX assembly during ORT build;
* make sure ORT nuget package < 250 MB, python wheel < 300 MB
* Suggest creating internal repo to publish pre-built package with Blackwell sm100/120 SASS and sm120 PTX to repo like [onnxruntime-blackwell](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-blackwell), since the package size will be much larger than nuget/pypi repo limit
* Considering the most popular datacenter/consumer GPUs, here's the cuda_arch list for linux/windows:
* With this change, perf on next release ORT is optimal on Linux with Tesla P100 (sm60), V100 (sm70), T4 (sm75), A100 (sm80), A10 (sm86, py whl), H100 (sm90); on Windows with GTX 980 (sm52), GTX 1080 (sm61), RTX 2080 (sm75), RTX 3090 (sm86), RTX 4090 (sm89). Other newer architecture GPUs are compatible.

| OS | cmake_cuda_architecture | package size |
| ------------- | ------------------------------------------ | ------------ |
| Linux nupkg | 60-real;70-real;75-real;80-real;90 | 215 MB |
| Linux whl | 60-real;70-real;75-real;80-real;86-real;90 | 268 MB |
| Windows nupkg | 52-real;61-real;75-real;86-real;89-real;90-virtual | 197 MB |
| Windows whl | 52-real;61-real;75-real;86-real;89-real;90-virtual | 204 MB |

* [TODO] Validate on Windows CUDA CI pipeline with cu128

### Motivation and Context

Address discussed topics in #23562 and #23309

#### Stats

| libonnxruntime_providers_cuda lib size | Main 75;80;90 | 75-real;80-real;90-virtual | 75-real;80;90-virtual | 75-real;80-real;86-virtual;89-virtual;90-virtual | 75-real;86-real;89 | 75-real;80;90 | 75-real;80-real;90 | 61-real;75-real;86-real;89 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linux | 446 MB | 241 MB | 362 MB | 482 MB | N/A | 422 MB | 301 MB | |
| Windows | 417 MB | 224 MB | 338 MB | 450 MB | 279 MB | N/A | | 292 MB |

| nupkg size | Main 75;80;90 | 75-real;80-real;90-virtual | 75-real;80;90-virtual | 75-real;80-real;86-virtual;89-virtual;90-virtual | 75-real;86-real;89 | 75-real;80;90 | 75-real;80-real;90 | 61-real;75-real;86-real;89 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linux | 287 MB | TBD | 224 MB | 299 MB | | | 197 MB | N/A |
| Windows | 264 MB | TBD | 205 MB | 274 MB | | | N/A | 188 MB |

| whl size | Main 75;80;90 | 75-real;80-real;90-virtual | 75-real;80;90-virtual | 75-real;80-real;86-virtual;89-virtual;90-virtual | 75-real;86-real;89 | 75-real;80;90 | 75-real;80-real;90 | 61-real;75-real;86-real;89 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linux | 294 MB | 154 MB | TBD | TBD | N/A | 278 MB | 203 MB | N/A |
| Windows | 271 MB | 142 MB | TBD | 280 MB | 184 MB | N/A | N/A | 194 MB |

### Reference

* https://developer.nvidia.com/cuda-gpus
* [Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization](https://developer.nvidia.com/blog/improving-gpu-app-performance-with-cuda-11-2-device-lto/)
* [PTX Compatibility](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ptx-compatibility)
* [Application Compatibility on the NVIDIA Ada GPU Architecture](https://docs.nvidia.com/cuda/ada-compatibility-guide/#application-compatibility-on-the-nvidia-ada-gpu-architecture)
* [Software Migration Guide for NVIDIA Blackwell RTX GPUs: A Guide to CUDA 12.8, PyTorch, TensorRT, and Llama.cpp](https://forums.developer.nvidia.com/t/software-migration-guide-for-nvidia-blackwell-rtx-gpus-a-guide-to-cuda-12-8-pytorch-tensorrt-and-llama-cpp/321330)

### Track some failed/unfinished experiments to control package size:

1. Building ORT with `CUDNN_FRONTEND_SKIP_JSON_LIB=ON` doesn't help much with package size;
2. ORT packaging uses 7z to pack the package, which can only use zip's deflate compression. In that format, setting the compression level to ultra (`-mx=9`) doesn't help much to control size (7z's LZMA compression is much better but not supported by nuget/pypi)
3. Simply replacing `sm_xx` with `lto_xx` would increase cudaep library size by ~50% (perf not tested yet). This needs further validation.
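
The size targets stated in the description above (nuget package < 250 MB, python wheel < 300 MB) can be checked against the four package sizes in its OS table. This is a minimal sketch that uses only those numbers; the data layout and names are illustrative.

```python
# Size targets from the action items, in MB.
LIMIT_MB = {"nupkg": 250, "whl": 300}

# (OS, package kind, size in MB) from the cmake_cuda_architecture table.
PACKAGES = [
    ("Linux", "nupkg", 215),
    ("Linux", "whl", 268),
    ("Windows", "nupkg", 197),
    ("Windows", "whl", 204),
]

# Remaining headroom under each target; a negative value would mean the
# package exceeds its target.
headroom = {
    f"{os_name} {kind}": LIMIT_MB[kind] - size
    for os_name, kind, size in PACKAGES
}

for name, spare in headroom.items():
    status = "ok" if spare > 0 else "over limit"
    print(f"{name}: {spare} MB of headroom ({status})")
```

All four proposed configurations stay under their targets, with the Linux wheel the tightest at 32 MB of headroom.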


update range of gpu arch

30bee96

snnn previously approved these changes Jan 9, 2025

tianleiwu reviewed Jan 10, 2025

tools/ci_build/github/linux/build_cuda_c_api_package.sh (outdated)

yf711 requested a review from jywu-msft January 10, 2025 00:32

append latest arch

3718bab

yf711 dismissed snnn’s stale review via 3718bab January 10, 2025 00:50

snnn previously approved these changes Jan 10, 2025

revert

bec0f68

yf711 dismissed snnn’s stale review via bec0f68 January 10, 2025 21:33

Set 75;80;90 and limit pkg size

2b0f866

tianleiwu approved these changes Jan 14, 2025

snnn approved these changes Jan 14, 2025

yf711 merged commit 5c3c764 into main Jan 14, 2025
130 of 133 checks passed

yf711 deleted the yifanl/update_arch branch January 14, 2025 22:27

yf711 mentioned this pull request Feb 12, 2025

Update cmake_cuda_architecture to control package size #23671

Merged
