Request: ARM SME support (for Apple M4).. #4715

Open
oscarbg opened this issue May 23, 2024 · 20 comments

Comments

@oscarbg

oscarbg commented May 23, 2024

No need for the unofficial Apple AMX instruction set on M4.
2 TFLOPS possible.

@martin-frbg
Collaborator

PRs welcome... do you have the hardware to test?

@oscarbg
Author

oscarbg commented May 24, 2024

Not yet, waiting for a Mac mini M4.

@brada4

This comment was marked as outdated.

@martin-frbg
Collaborator

try reading that again, it's about SME...

@brada4

This comment was marked as off-topic.

@Mousius
Contributor

Mousius commented May 25, 2024

You can develop and test using the Fixed Virtual Platform (FVP):
apache/tvm#16755
apache/tvm#16749

@brada4

This comment was marked as off-topic.

@martin-frbg
Collaborator

some implementation hints there: https://scalable.uni-jena.de/opt/sme/index.html

@oscarbg
Author

oscarbg commented Nov 22, 2024

Hi,
any news?
I now have a Mac mini M4 to test with.

@martin-frbg
Collaborator

Bought an M4 mini myself recently but have not gotten around to doing much with it yet.

@martin-frbg
Collaborator

#5084 added SME for the "small matrix" SGEMM pathway but needs some small tweaks to connect the M4 cpu target to it

#5011 has a more general SME GEMM kernel but needs fixes for proper SYMM/TRMM support before it can be merged

@ITCJ

ITCJ commented Mar 27, 2025

Based on "Hello SME" from an SC24 workshop, llvm/llvm-project#114987 and llvm/llvm-project#95478, Apple M4 does not support SVE outside of streaming mode.
However, the concurrent [WIP] #5011 is built on top of KERNEL.ARMV8SVE, which results in an illegal instruction. Any good ideas for solving that? Would creating a new KERNEL.M4SME2 based on KERNEL.ARMV8 be a good idea?
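For anyone reproducing the illegal instruction, it can help to first confirm what the OS reports about SME before choosing a kernel set. The following is only a sketch and not anything taken from OpenBLAS: `cpu_has_sme` and `pick_kernel_family` are hypothetical names, and the macOS sysctl key `hw.optional.arm.FEAT_SME` and the Linux `HWCAP2_SME` bit are assumed to be available on the system in question.

```c
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Hypothetical helper: returns 1 if the OS reports SME,
   0 if it does not or if this sketch cannot tell. */
int cpu_has_sme(void) {
#if defined(__APPLE__)
    int value = 0;
    size_t size = sizeof(value);
    /* Assumption: the running macOS version exposes this key. */
    if (sysctlbyname("hw.optional.arm.FEAT_SME", &value, &size, NULL, 0) != 0)
        return 0;
    return value != 0;
#elif defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SME)
    return (getauxval(AT_HWCAP2) & HWCAP2_SME) != 0;
#else
    return 0; /* unknown platform: report "no SME" */
#endif
}

/* Hypothetical helper: which kernel family this sketch would select. */
const char *pick_kernel_family(void) {
    return cpu_has_sme() ? "sme" : "generic";
}
```

Note that a positive result only says SME is present. It says nothing about SVE being usable outside streaming mode, which is exactly the M4 situation described above.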

@ITCJ

ITCJ commented Mar 27, 2025

Furthermore, I have recently run some tests on the differences between SME1 and SME2. Achieving the best performance requires quite different approaches on each.
I don't know whether ACLE can fully utilize these resources.

@martin-frbg
Collaborator

Yes, M4 only does streaming SVE so you'd need at least some setup code to enter streaming mode and perhaps save some dual-use registers beforehand, or even work in a totally different set of registers than what the existing SVE code uses.

Both #5011 and #5084 introduced an ARMV9SME target for differentiation; it would also be possible to select kernel implementations (either at the KERNEL file level or within individual implementations) based on HAVE_SME or a similar define. As #5011 is a WIP only concerned with GEMM and related functions, it does not work outside its narrow scope.

The way forward - at least short-term - should be to split out M4 from the general "VORTEX" target into its own designation and enable the SME-based "small gemm" pathway for it. I hope to complete this very soon.
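A minimal sketch of what selection "within individual implementations" based on a HAVE_SME-style define could look like. This is an illustration only, not the actual OpenBLAS code: `dot_generic`, `dot_sme`, `sketch_dot` and the `DOT_KERNEL` macros are hypothetical names, and the SME branch is a placeholder that simply forwards to the generic loop.

```c
#include <stddef.h>

/* Portable reference dot product, used when no SME build is requested. */
static float dot_generic(const float *x, const float *y, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

#if defined(HAVE_SME)
/* Placeholder: a real build would supply a streaming-mode kernel here. */
static float dot_sme(const float *x, const float *y, size_t n) {
    return dot_generic(x, y, n);
}
#define DOT_KERNEL dot_sme
#define DOT_KERNEL_NAME "sme"
#else
#define DOT_KERNEL dot_generic
#define DOT_KERNEL_NAME "generic"
#endif

/* Single entry point; the kernel behind it is fixed at compile time. */
float sketch_dot(const float *x, const float *y, size_t n) {
    return DOT_KERNEL(x, y, n);
}

/* Reports which branch was compiled in. */
const char *sketch_dot_kernel_name(void) {
    return DOT_KERNEL_NAME;
}
```

Compiling with `-DHAVE_SME` switches the entry point to the SME branch; without it, the generic loop is used and nothing SME-related is emitted.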

@violet73

Hi, I would like to ask why I encountered the following error on an M4 Pro:

[Image: screenshot of the error]

Is it possible that my compiler does not recognize streaming flags?

Compilation:
clang -g -O0 -march=armv9.2-a+sme+sme2 ./test_sme_acle.cc -o ./test_sme_acle

Clang version:
Homebrew clang version 20.1.2
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /opt/homebrew/Cellar/llvm/20.1.2/bin

@ITCJ

ITCJ commented Apr 10, 2025

Looks like the same problem mentioned above. Try using disassemble --mixed to show the illegal instruction.

@violet73

Thank you for your kind reply. I disassembled it in lldb and the illegal instruction turns out to be cntd!

That means the streaming flags make the compiler emit SVE instructions that are illegal when they execute outside streaming mode.

This is somewhat weird, because I can't manually set streaming mode before main is called.

So I removed the streaming flags from main and moved the SVE code into another function, foo, with the local streaming flags.

I also manually placed the call to foo between smstart and smstop.

After these changes, the code finally ran normally!
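One way to express this kind of workaround with ACLE is sketched below. It assumes a compiler that defines `__ARM_FEATURE_SME` when built with `+sme` and supports the `__arm_locally_streaming` keyword. `foo_streaming_words` is a hypothetical name, and the non-SME branch exists only so the sketch builds elsewhere. With `__arm_locally_streaming`, the compiler is expected to enter streaming mode at function entry and leave it on return, so the caller stays a normal non-streaming function.

```c
#include <stddef.h>

#if defined(__ARM_FEATURE_SME)
#include <arm_sme.h>

/* The function body runs in streaming mode, so the cntw generated for
   svcntw() reads the streaming vector length (in 32-bit words). */
__arm_locally_streaming size_t foo_streaming_words(void) {
    return (size_t)svcntw();
}
#else
/* Fallback when SME is not enabled at compile time: report 0 words. */
size_t foo_streaming_words(void) {
    return 0;
}
#endif
```

Called from an ordinary main, this keeps every SVE instruction inside the streaming region, which is the property the illegal cntd in main was violating.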

@ITCJ

ITCJ commented Apr 10, 2025

Congrats, I have also been trying to resolve similar issues. I encountered ADVL during unit tests.
Shall we connect on WeChat? I have sent the code to your email.

@martin-frbg
Collaborator

the fast path for small matrix sgemm should be working on M4 with #5222 - but I'm now stuck on an illegal instruction error involving cntd/cntw myself, trying to get dot_kernel_sve working in streaming mode with the __arm_streaming attribute

@ITCJ

ITCJ commented Apr 14, 2025

Is it a bug in the LLVM compiler?
