musa: upgrade musa sdk to rc4.2.0 #14498


Merged: 7 commits into ggml-org:master on Jul 24, 2025

Conversation

@yeahdongcn (Collaborator) commented on Jul 2, 2025


This PR upgrades the MUSA SDK from rc4.0.1 to rc4.2.0.

Key updates

  1. MUSA Docker tags bumped to rc4.2.0, switching back to the non-muDNN variant
  2. New CMake options (default OFF): GGML_MUSA_GRAPHS enables MUSA graphs and GGML_MUSA_MUDNN_COPY enables muDNN copy acceleration
  3. cuBLAS API alignment, applying the mublas API changes (see the sketch below)
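
For context on item 3: ggml compiles its CUDA backend for MUSA by aliasing CUDA symbols to their MUSA counterparts in a vendor header, so "cuBLAS API alignment" amounts to updating those mappings for the changed mublas API. Below is a minimal sketch of that pattern; the specific aliases shown are illustrative assumptions, not the PR's actual diff.

```cpp
// Sketch of ggml's vendor-mapping pattern for MUSA (in the spirit of
// ggml/src/ggml-cuda/vendors/musa.h). The exact set of aliases this PR
// touches is an assumption; consult the diff for the real list.
#include <mublas.h>        // MUSA's cuBLAS-compatible BLAS library
#include <musa_runtime.h>

#define cublasHandle_t   mublasHandle_t
#define cublasStatus_t   mublasStatus_t
#define cublasCreate     mublasCreate
#define cublasSetStream  mublasSetStream
#define cublasGemmEx     mublasGemmEx
```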

Why disable muDNN by default

Although enabling muDNN accelerates contiguous device memory copies, MUSA SDK rc4.2.0 no longer ships muDNN as a fat binary and instead provides per-architecture binaries. Additionally, the performance of musaMemcpyAsync has improved in this release, making the remaining gap minimal.
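
In practice the default boils down to a compile-time dispatch along these lines. This is a minimal sketch assuming the GGML_MUSA_MUDNN_COPY CMake option is forwarded as a compile definition of the same name; the muDNN wrapper declaration is a hypothetical stand-in for the mudnnMemcpyAsync plumbing named in the commit list below.

```cpp
#include <musa_runtime.h>  // musaMemcpyAsync, musaStream_t, musaError_t

#ifdef GGML_MUSA_MUDNN_COPY
// Hypothetical stand-in for the muDNN-backed copy (mudnnMemcpyAsync in the
// commits); the real wrapper and its signature live in the PR's diff.
musaError_t mudnn_memcpy_async(void * dst, const void * src, size_t nbytes,
                               musaStream_t stream);
#endif

// Copy a contiguous device buffer: prefer muDNN when it was compiled in,
// otherwise fall back to musaMemcpyAsync, which rc4.2.0 made fast enough
// that the remaining gap is minimal.
static musaError_t copy_contiguous(void * dst, const void * src, size_t nbytes,
                                   musaStream_t stream) {
#ifdef GGML_MUSA_MUDNN_COPY
    return mudnn_memcpy_async(dst, src, nbytes, stream);
#else
    return musaMemcpyAsync(dst, src, nbytes, musaMemcpyDeviceToDevice, stream);
#endif
}
```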

Testing Done

test-backend-ops passed.

root@659744416c9a:/ws# ./build/bin/test-backend-ops 
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 MUSA devices:
  Device 0: MTT S80, compute capability 2.1, VMM: yes
Testing 2 devices

Backend 1/2: MUSA0
  Device description: MTT S80
  Device memory: 16297 MB (15731 MB free)
  ...
  7343/7343 tests passed
  Backend MUSA0: OK
Backend 2/2: CPU
  Skipping CPU backend
2/2 backends passed
OK

All Docker builds passed.

docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
docker build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
docker build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .

Verified the server image works as expected.

docker run -p 8080:8080 -it -v ~/models:/models local/llama.cpp:server-musa -m /models/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -ngl 999

Local build with -DGGML_MUSA_GRAPHS=ON succeeded.

root@7bd6a1e5dcc0:/ws# cmake -B build -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=21 -DGGML_MUSA_GRAPHS=ON
root@7bd6a1e5dcc0:/ws# cmake --build build -j $(nproc) --config Release

Local build with -DGGML_MUSA_MUDNN_COPY=ON also succeeded when using the muDNN Docker image (as expected, it would fail with the non-muDNN variant).

root@7bd6a1e5dcc0:/ws# cmake -B build -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=21 -DGGML_MUSA_MUDNN_COPY=ON
root@7bd6a1e5dcc0:/ws# cmake --build build -j $(nproc) --config Release

@github-actions bot added the "Nvidia GPU" and "ggml" labels on Jul 2, 2025
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 39146cc to 314e40d on July 15, 2025 00:28
@github-actions bot added the "documentation" and "devops" labels on Jul 15, 2025
@yeahdongcn changed the title from "MUSA: upgrade musa sdk to <<TBD>>" to "MUSA: upgrade musa sdk to 4.2.0" on Jul 15, 2025
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 314e40d to 1bf073d on July 15, 2025 03:42
@yeahdongcn changed the title from "MUSA: upgrade musa sdk to 4.2.0" to "musa: upgrade musa sdk to 4.2.0" on Jul 15, 2025
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 1bf073d to 9e7ccf3 on July 17, 2025 02:57
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 1945505 to 01b1163 on July 22, 2025 04:31
@yeahdongcn changed the title from "musa: upgrade musa sdk to 4.2.0" to "musa: upgrade musa sdk to rc4.2.0" on Jul 24, 2025
@ggerganov (Member) left a comment:
ci change is OK

@ericcurtin ericcurtin merged commit 3f4fc97 into ggml-org:master Jul 24, 2025
47 checks passed
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025
* musa: apply mublas API changes

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: update musa version to 4.2.0

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: restore MUSA graph settings in CMakeLists.txt

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: disable mudnnMemcpyAsync by default

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: switch back to non-mudnn images

Signed-off-by: Xiaodong Ye <[email protected]>

* minor changes

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 25, 2025
* origin/master:
docs : update HOWTO-add-model.md for ModelBase and new model classes (ggml-org#14874)
ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
sync : ggml
cmake : fix usage issues (ggml/1257)
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
context : perform output reorder lazily upon access after sync (ggml-org#14853)
chat : fix kimi-k2 chat template (ggml-org#14852)
sycl: fixed semantics of block offset calculation (ggml-org#14814)
llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
metal : fix fusion across different encoders (ggml-org#14849)
sycl: fix undefined variable in work group size check (ggml-org#14843)
convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
CUDA: fix overflow in FA, tune performance (ggml-org#14840)
CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)