cuda: remove linking to cublasLt #14790

yeahdongcn · 2025-07-21T01:28:14Z

The new MUSA SDK now includes mublasLt (the equivalent of cublasLt). However, I found that llama.cpp doesn't use any of the cublasLt* APIs. Therefore, this PR removes cublasLt from the build link dependencies.

# ldd /usr/local/lib/python3.12/dist-packages/nvidia/cublas/lib/libcublas.so.12
        linux-vdso.so.1 (0x00007ffc7b9d8000)
        libcublasLt.so.12 => /usr/local/lib/python3.12/dist-packages/nvidia/cublas/lib/libcublasLt.so.12 (0x00007fdf7cc00000)
        librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007fdfa1857000)
        libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdfa1852000)
        libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdfa184d000)
        libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007fdfa1764000)
        libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fdfa1744000)
        libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf7c9d7000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdfa1865000)
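
The `ldd` listing above can also be checked mechanically. Below is a minimal sketch, assuming a POSIX shell with `awk`. The helper name `has_dependency` is hypothetical and not part of llama.cpp or the CUDA toolkit. It reads `ldd` output on standard input and reports whether any listed library name starts with a given prefix.

```shell
#!/bin/sh
# has_dependency PREFIX
# Hypothetical helper: reads `ldd` output on stdin and exits 0 if the first
# field of any line (the library name) starts with PREFIX, else exits 1.
has_dependency() {
    awk -v name="$1" 'index($1, name) == 1 { found = 1 } END { exit !found }'
}
```

For example, `ldd /path/to/libcublas.so.12 | has_dependency libcublasLt && echo "cublas pulls in cublasLt"`, where the library path is a placeholder for wherever cuBLAS is installed on the machine being checked.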

slaren · 2025-07-21T10:17:07Z

I am pretty sure that cublas depends on cublasLt, and building without it with the Makefile resulted in link errors. cmake adds the dependency automatically when linking to cublas, so removing it there is probably correct: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindCUDAToolkit.cmake#L1282. The Makefile is broken at this point, so I could not test it.
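
One way to tell a direct link dependency from one that only arrives transitively through cublas is to list the `NEEDED` entries of the built library, since `ldd` reports both kinds together. This is a rough sketch, assuming GNU `readelf` and `sed` are available. The helper name `direct_needed` and the library name `libggml-cuda.so` used in the usage note are assumptions for illustration, not names confirmed in this thread.

```shell
#!/bin/sh
# direct_needed
# Hypothetical helper: reads `readelf -d` output on stdin and prints the
# name of each library recorded as a direct (NEEDED) dependency, one per line.
direct_needed() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}
```

A possible use is `readelf -d build/bin/libggml-cuda.so | direct_needed`, with the path adjusted to the actual build output. Which libraries end up listed depends on the toolchain and on how the build system resolves its link targets, so this is a way to inspect the result rather than a prediction of it.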

yeahdongcn · 2025-07-21T10:25:00Z

> I am pretty sure that cublas depends on cublasLt, and building without it with the Makefile resulted in link errors. cmake adds the dependency automatically when linking to cublas, so removing it there is probably correct: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindCUDAToolkit.cmake#L1282. The Makefile is broken at this point, so I could not test it.

Let me revert the change to Makefile.

Signed-off-by: Xiaodong Ye <[email protected]>

JohannesGaessler

From the CUDA documentation:

The cuBLASLt library is a new lightweight library dedicated to GEneral Matrix-to-matrix Multiply (GEMM) operations with a new flexible API.

We don't use cuBLASLt directly, but I guess they may be mapping the old cuBLAS API to cuBLASLt?

yeahdongcn · 2025-07-21T14:31:55Z

> We don't use cuBLASLt directly, but I guess they may be mapping the old cuBLAS API to cuBLASLt?

In my understanding (though I’m not certain), cuBLAS might have a mechanism to automatically dispatch or fall back to cuBLASLt depending on the data types, memory layouts, or hardware.

* origin/master: (49 commits)
  ci : correct label refactor->refactoring (ggml-org#14832)
  CUDA: fix quantized KV cache + multiple sequences (ggml-org#14822)
  tests : add non-cont K,V FA tests
  memory : handle saving/loading null layers in recurrent memory (ggml-org#14675)
  ggml: fix loongarch quantize_row_q8_1 error (ggml-org#14827)
  CANN: weight format to NZ for Ascend310P3 (ggml-org#14407)
  CUDA: add fused rms norm (ggml-org#14800)
  ggml : model card yaml tab->2xspace (ggml-org#14819)
  vulkan: fix rms_norm_mul to handle broadcasting dim0 (ggml-org#14817)
  llama : add model type detection for rwkv7 7B&14B (ggml-org#14816)
  imatrix: add option to display importance score statistics for a given imatrix file (ggml-org#12718)
  Mtmd: add a way to select device for vision encoder (ggml-org#14236)
  cuda : implement bf16 cpy ops and enable bf16 cont (ggml-org#14763)
  opencl: remove unreachable `return` (ggml-org#14806)
  server : allow setting `--reverse-prompt` arg (ggml-org#14799)
  cuda: remove linking to cublasLt (ggml-org#14790)
  opencl: fix `im2col` when `KW!=KH` (ggml-org#14803)
  opencl: add conv2d kernel (ggml-org#14403)
  sycl: Fix im2col (ggml-org#14797)
  kleidiai: add support for get_rows (ggml-org#14676)
  ...

Signed-off-by: Xiaodong Ye <[email protected]>

github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Jul 21, 2025

yeahdongcn marked this pull request as ready for review July 21, 2025 06:29

yeahdongcn requested review from slaren and JohannesGaessler July 21, 2025 06:29

cuda: remove linking to cublasLt

02373a3

Signed-off-by: Xiaodong Ye <[email protected]>

yeahdongcn force-pushed the cuda/remove_cublasLt branch from 28d7b3c to 02373a3 Compare July 21, 2025 10:25

slaren approved these changes Jul 21, 2025

JohannesGaessler approved these changes Jul 21, 2025

yeahdongcn merged commit 48b86c4 into ggml-org:master Jul 21, 2025
47 checks passed

taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025

cuda: remove linking to cublasLt (ggml-org#14790)

9e500e2

Signed-off-by: Xiaodong Ye <[email protected]>
