Building NumPy from source for Windows on ARM using Clang-cl compiler #28106

Open
Mugundanmcw opened this issue Jan 6, 2025 · 15 comments
Labels
component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@Mugundanmcw

Hello Developers,

  • I am facing an issue while trying to build NumPy for Windows on ARM (WoA) using the Clang-cl compiler. Building NumPy from source requires C and C++ compilers with proper intrinsic support.
  • Previously, I was able to compile NumPy for WoA successfully using the MSVC optimizing C/C++ compiler (cl), enabling the CPU baseline features that support ARM.
  • However, I encountered limitations with the MSVC cl compiler, as it does not support certain CPU dispatcher features such as ASIMDHP, ASIMDFHM, and SVE. Is there any specific reason why these CPU dispatch features are not supported for WoA in MSVC?
  • Meanwhile, I attempted to compile NumPy for WoA using the clang-cl compiler (from both the MSVC and LLVM toolchains) to check whether the CPU dispatcher features would be enabled. Apart from SVE, all other test features, including baseline features, were supported, but I ran into compilation errors due to unidentified instructions.

Steps to Reproduce

  1. Clone the NumPy source code and check out the latest branch
  2. Install the LLVM toolchain or the MSVC clang toolset
  3. Remove clang and clang++ from the bin directory to avoid conflicts
  4. Add the bin path at the top of the PATH environment variable

Compilers used for compilation:
[screenshot]

Error and Workaround:

  1. While building the meson_cpu target, I got an invalid operand error for "fstcw" in the _multiarray_tests.c.src source file. Going through the source code, fstcw is an x86 floating-point control instruction, so as a workaround I added one more condition that checks whether the build targets ARM64. The build then proceeded:
    [screenshot 2025-01-06 123245]
    Workaround:
    Before:
    [screenshot]
    After:
    [screenshot]
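
A minimal sketch of this kind of guard, assuming the x87 control-word read only makes sense on x86/x86-64 targets. The helper names `x87_precision_control` and `x87_precision_name` and the exact macro combination are illustrative assumptions, not NumPy's actual code or patch.

```c
#include <assert.h>

/* Hypothetical helper: returns the x87 precision-control field (bits 8-9 of
 * the FPU control word), or -1 when the target is not x86/x86-64. */
int x87_precision_control(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__) || \
     defined(_M_X64) || defined(_M_IX86))
    unsigned short cw;
    /* fstcw stores the x87 control word to memory; it exists on x86 only. */
    __asm__ volatile("fstcw %0" : "=m"(cw));
    return (cw >> 8) & 3;
#else
    /* ARM64 builds (for example clang-cl targeting Windows on ARM) take
     * this branch, so the x86-only instruction is never emitted. */
    return -1;
#endif
}

/* Maps the value above to a readable description. */
const char *x87_precision_name(int pc)
{
    assert(pc >= -1 && pc <= 3);
    switch (pc) {
    case 0:  return "single (24-bit)";
    case 1:  return "reserved";
    case 2:  return "double (53-bit)";
    case 3:  return "extended (64-bit)";
    default: return "not an x86 target";
    }
}
```

The point of the sketch is that the compiler check alone (clang versus MSVC) is not enough: the architecture has to be tested as well, because clang-cl can target both x86-64 and ARM64.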

Issue:

  1. Currently the build fails at 240+ targets while compiling meson_cpu due to unidentified assembly instructions:
    [screenshot]

Can anyone give some suggestions to overcome this issue? I need to enable CPU dispatch support for NumPy on WoA to get a better-optimised build of NumPy.

Thanks!

@seberg seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jan 6, 2025
@seberg
Member

seberg commented Jan 6, 2025

Ping @Mousius, might be interesting to you (or you have a quick idea).

@Mugundanmcw
Author

@Mousius Any suggestions on this issue?

@DavidSpickett

These look like MSVC intrinsics - https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170. We do not have support for all of these in clang-cl at the moment.

Just 2 days ago someone asked about this in fact - llvm/llvm-project#121689.

Without knowing the NumPy source code I can't suggest how to work around this, but the table on the Microsoft page tells you what each intrinsic does. For example, one of them produces the LDARB instruction (https://developer.arm.com/documentation/100069/0606/Data-Transfer-Instructions/LDARB). So in theory you could use alternative APIs to do the same thing.
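
To illustrate the alternative-API route: a minimal sketch, assuming a C11 compiler that provides <stdatomic.h>. A one-byte load with acquire ordering written this way is typically compiled to an LDARB-family instruction on AArch64, without any MSVC-specific intrinsic. The names `ready_flag`, `publish`, `consume`, and `smoke_test` are hypothetical and not taken from NumPy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical one-byte flag shared between threads. */
static _Atomic uint8_t ready_flag;

/* Store with release ordering. */
void publish(uint8_t value)
{
    atomic_store_explicit(&ready_flag, value, memory_order_release);
}

/* Load with acquire ordering; this is the portable counterpart of a
 * load-acquire byte intrinsic. */
uint8_t consume(void)
{
    return atomic_load_explicit(&ready_flag, memory_order_acquire);
}

/* Single-threaded smoke test of the two helpers. */
void smoke_test(void)
{
    publish(1);
    assert(consume() == 1);
}
```

Inspecting the generated assembly for `consume` on an ARM64 target is a quick way to confirm which instruction the compiler actually chose.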

If anyone needs help figuring out the details of what the instructions do, I can help with that.

As for adding these intrinsics to clang-cl, I'll expand on that in the LLVM issue.

@seberg
Member

seberg commented Jan 7, 2025

Ah, sorry, I misdiagnosed thinking it was related to SIMD (the second screenshot is so small... text would be easier).

This seems to be related to the atomics definitions, ping @ngoldbaum, I thought these are borrowed from Python, so it seems a bit surprising.

@DavidSpickett

DavidSpickett commented Jan 7, 2025

As a temporary workaround, it might work to edit the source so that the #ifdef STDC_ATOMICS branch is used instead.

"I thought these are borrowed from Python"

I recall Linaro doing work for Windows on Arm Python but it may not have been using clang-cl.

Edit: It was all done with msvc/Visual Studio not clang-cl.

@Mugundanmcw
Author

Are there any other workarounds I could use for compiling NumPy on WoA?

@seberg
Member

seberg commented Jan 7, 2025

Unless you want to dig into it yourself, please be patient for at least a few days. This will be fixed, but don't expect it to be fixed within hours.

@ngoldbaum
Member

I can install Windows in a VM on my ARM Macbook and hopefully reproduce this. Sorry for the trouble...

By the way, what command are you using to build NumPy? IIRC you need to go a little out of your way to build with clang-cl properly.

@ngoldbaum
Member

"As a temporary workaround, it might work to edit the source so that the #ifdef STDC_ATOMICS branch is used instead"

Is there a reason why STDC_ATOMICS isn't defined on the reporter's system?

@ngoldbaum
Member

ngoldbaum commented Jan 7, 2025

I just successfully built NumPy after making the patch to _multiarray_tests.c.src suggested by OP. I do not see the same error about missing intrinsics as it seems clang-cl on my system is going into the STDC_ATOMICS branch as I expected it to do originally.

I suspect that there is something subtly wrong about the OP's compilation environment. Here's how I built NumPy, doing all this in a checkout of the NumPy repo:

"[binaries]","c = 'clang-cl'","cpp = 'clang-cl'","ar = 'llvm-lib'","c_ld = 'lld-link'","cpp_ld = 'lld-link'" | Out-File $PWD/clang-cl-build.ini -Encoding ascii
pip install -r requirements/build_requirements.txt
spin build -- --vsenv --native-file=$PWD/clang-cl-build.ini

Or alternatively via pip to actually install the numpy build:

python -m pip install -v . --no-build-isolation -C'setup-args=--vsenv' -C'setup-args=--native-file='$PWD'\clang-cl-build.ini'

I did this following our CI setup on GitHub Actions for clang-cl.

@ngoldbaum
Member

OP is short for original post, I was referring to the patch you found for the multiarray tests file.

You should be able to build NumPy using one of the commands I shared in my last comment after applying the patch you suggested for the tests file.

At least right now with clang-cl it is not sufficient to just use spin build; you need to do something a little more involved to ensure the clang toolchain is being used.

Please feel free to send in a pull request for the fix you found for the multiarray tests file.

@Mugundanmcw
Author

@ngoldbaum the following is my workflow to build NumPy natively on WoA:

  1. Installed the LLVM toolchain 19.1.0 for WoA from releases.
  2. Added the LLVM\bin path to the environment variables.
  3. Applied the patch from the OP so that multiarray_umath_Test compiles without any errors.
  4. Created the build configuration file clang-cl-build.ini that you mentioned, using the command below:
    "[binaries]","c = 'clang-cl'","cpp = 'clang-cl'","ar = 'llvm-lib'","c_ld = 'lld-link'","cpp_ld = 'lld-link'" | Out-File $PWD/clang-cl-build.ini -Encoding ascii
  5. Before starting the build, I made sure that it uses LLVM's clang-cl rather than the MSVC build of clang-cl:
    [screenshot]
  6. Proceeded with the build using the following commands:
    pip install -r requirements/build_requirements.txt
    spin build -- --vsenv --native-file=$PWD/clang-cl-build.ini

But the error still points to the same issue:
[screenshot]

Per the logic you described, the code should take the stdatomic branch, but the build still fails to enter it.

@ngoldbaum
Member

I used MSVC's build of clang-cl. I don't know if it's possible to use clang's. Ping @rgommers who knows more about this than me.

@rgommers
Member

rgommers commented Jan 8, 2025

It should be possible in principle; we use clang-cl from the Clang feedstock in conda-forge to build SciPy for example.

I have no knowledge specific to WoA + Clang-cl though.

@ngoldbaum
Member

I'm confused why STDC_ATOMICS isn't defined on your setup; it definitely should be with clang 19. I would try to debug why that's happening.
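
One possible way to debug this: a minimal sketch, assuming the header picks its implementation from the usual C11 feature-test macros (`__STDC_VERSION__` and `__STDC_NO_ATOMICS__`) and falls back to `_MSC_VER`. That condition is an assumption and should be checked against the actual NumPy header. Compiling the file with the same compiler and flags as the failing build shows which branch would be taken. The function names `atomics_branch` and `print_atomics_macros` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Reports which atomics implementation the preprocessor would select,
 * under the assumed selection order described above. */
const char *atomics_branch(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
    return "stdc";    /* C11 <stdatomic.h> */
#elif defined(_MSC_VER)
    return "msvc";    /* MSVC-style intrinsics from <intrin.h> */
#elif defined(__GNUC__)
    return "gcc";     /* GCC/Clang __atomic builtins */
#else
    return "unknown";
#endif
}

/* Prints the macros that drive the choice above. */
void print_atomics_macros(void)
{
    const char *branch = atomics_branch();
    assert(branch != NULL); /* every preprocessor path returns a literal */
#ifdef __STDC_VERSION__
    printf("__STDC_VERSION__ = %ldL\n", (long)__STDC_VERSION__);
#else
    printf("__STDC_VERSION__ is not defined\n");
#endif
#ifdef __STDC_NO_ATOMICS__
    printf("__STDC_NO_ATOMICS__ is defined\n");
#endif
#ifdef _MSC_VER
    printf("_MSC_VER = %d\n", (int)_MSC_VER);
#endif
    printf("selected branch: %s\n", branch);
}
```

Calling `print_atomics_macros()` from a small `main` and building it once with each clang-cl (the MSVC-bundled one and the LLVM release) would show whether the two toolchains disagree on these macros.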

5 participants