Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for `bf16(xN)` and `i1xN` #140763

sayantn · 2025-05-07T19:08:04Z

This PR changes how codegen handles LLVM intrinsics.

Explanation of the changes

Current procedure

This is the same for all functions; LLVM intrinsics are not treated specially.

We derive the LLVM type of a function directly from its argument and return types. For example, the following function
```
#[link_name = "llvm.sqrt.f32"]
fn sqrtf32(a: f32) -> f32;
```
will have the LLVM type f32 (f32), taken directly from the Rust signature.

Pros

Simpler to implement, no extra complexity involved due to LLVM intrinsics

Cons

LLVM intrinsics have a well-defined signature, completely determined by their name (and, if the intrinsic is overloaded, by the type parameters). So this process of converting Rust signatures to LLVM signatures may not work. For example, the following code generates LLVM IR without any problem
```
#[link_name = "llvm.sqrt.f32"]
fn sqrtf32(a: i32) -> f32;
```
but the generated LLVM IR is invalid, because it has the wrong signature for the intrinsic (Godbolt; adding -Zverify-llvm-ir makes compilation fail). I would expect this code to not compile at all instead of generating invalid IR.
LLVM intrinsics that have types in their signature that can't be accessed from Rust (notable examples are the AMX intrinsics that have the x86amx type, and (almost) all intrinsics that have vectors of i1 types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.
If code uses a non-existent LLVM intrinsic, even -Zverify-llvm-ir won't complain. Eventually it will error out at link time, because the function does not exist. I don't think this is a behavior we want.

What this PR does

When linking to non-overloaded intrinsics, we use the function LLVMIntrinsicGetType to directly get the function type of the intrinsic from LLVM.
We then use this LLVM definition to verify the Rust signature, and emit a proper error if it doesn't match, instead of silently emitting invalid IR.
Lint if linking to deprecated or invalid LLVM intrinsics
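A minimal sketch of the verification step described above, not the PR's actual code. The toy `LlType` and `FnSig` types and the hand-written `intrinsic_sig` table are hypothetical stand-ins for what `LLVMIntrinsicGetType` would return; only `llvm.x86.sse42.crc32.32.32` (signature i32 (i32, i32)) is modelled.

```rust
/// Toy model of LLVM scalar types (an assumption, not rustc's representation).
#[derive(Debug, Clone, PartialEq, Eq)]
enum LlType {
    Int(u32),
    Float(u32),
}

/// A function signature: parameter types and a return type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FnSig {
    params: Vec<LlType>,
    ret: LlType,
}

/// Hypothetical stand-in for asking LLVM for the type of a
/// non-overloaded intrinsic. Returns `None` for unknown names.
fn intrinsic_sig(name: &str) -> Option<FnSig> {
    match name {
        "llvm.x86.sse42.crc32.32.32" => Some(FnSig {
            params: vec![LlType::Int(32), LlType::Int(32)],
            ret: LlType::Int(32),
        }),
        _ => None,
    }
}

/// Compare the signature written in Rust with the one the intrinsic
/// actually has, and report a readable error on mismatch.
fn check_declaration(name: &str, rust_sig: &FnSig) -> Result<(), String> {
    let expected =
        intrinsic_sig(name).ok_or_else(|| format!("unknown intrinsic `{name}`"))?;
    if &expected == rust_sig {
        Ok(())
    } else {
        Err(format!(
            "intrinsic `{name}` has signature {expected:?}, but was declared as {rust_sig:?}"
        ))
    }
}
```

The point of the sketch is only the control flow: look the signature up by name first, then compare, so a mismatch becomes a compile-time error rather than invalid IR.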

Note

This PR focuses only on non-overloaded intrinsics; overloaded intrinsics can be handled in a future PR.

Regardless, the following checks work for all intrinsics:

If we can't find the intrinsic, we check if it has been AutoUpgraded by LLVM. If not, that means it is an invalid intrinsic, and we error out.
Don't allow intrinsics from other archs to be declared, e.g. error out if an AArch64 intrinsic is declared when we are compiling for x86
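The target-arch check can be pictured roughly as follows. This is a sketch under assumptions: the `TARGET_PREFIXES` list is hand-written and incomplete, the function names are hypothetical, and the caller is assumed to pass the LLVM-style prefix (`x86`, `aarch64`, ...) for the current target rather than a rustc target name.

```rust
/// Hand-written, incomplete list of target prefixes (an assumption for
/// this sketch; the real list would come from LLVM).
const TARGET_PREFIXES: &[&str] = &["x86", "aarch64", "arm", "riscv", "wasm", "nvvm", "amdgcn"];

/// Return the target prefix of `name` (the component right after `llvm.`),
/// or `None` if the intrinsic is target-independent or not an intrinsic name.
fn intrinsic_target_prefix(name: &str) -> Option<&str> {
    let rest = name.strip_prefix("llvm.")?;
    let first = rest.split('.').next()?;
    TARGET_PREFIXES.iter().copied().find(|p| *p == first)
}

/// An intrinsic may be declared if it is target-independent or if its
/// prefix matches the prefix of the target being compiled for.
fn allowed_on(name: &str, target_prefix: &str) -> bool {
    match intrinsic_target_prefix(name) {
        Some(prefix) => prefix == target_prefix,
        None => true,
    }
}
```

With this, `allowed_on("llvm.aarch64.crc32b", "x86")` is rejected, while a generic intrinsic such as `llvm.sqrt.f32` passes on every target.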

Pros

It is no longer possible (or at least, it would take significantly more effort) to introduce invalid IR using non-overloaded LLVM intrinsics.
As we are now matching Rust signatures to LLVM intrinsics ourselves, we can add bypasses to enable linking to such non-Rust types (e.g. matching 8192-bit vectors to x86amx and injecting llvm.x86.cast.vector.to.tile and llvm.x86.cast.tile.to.vector at the call site).

Note

I don't intend for these bypasses to be permanent. A better approach would be to introduce a bf16 type in Rust, and to allow repr(simd) with bools to get Rust-native i1xNs. The bypasses are meant to be short-term, as the word "bypass" suggests. They shouldn't cause any major breakage even if removed, as link_llvm_intrinsics is perma-unstable.

This PR adds bypasses for bf16 (via i16), bf16xN (via i16xN) and i1xN (via iM, where M is the smallest power of 2 such that M >= N, unless N <= 4, in which case we use M = 8). This will unblock AVX512-VP2INTERSECT and a lot of bf16 intrinsics in stdarch. This PR also automatically destructures structs if the types don't exactly match (this is required for us to start emitting hard errors on mismatches).
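The width rule for i1xN reduces to a one-liner. A small sketch, with a hypothetical function name, assuming N is at least 1:

```rust
/// Bit width M of the integer used to stand in for an `i1xN` mask:
/// the smallest power of two with M >= N, but never less than 8.
fn mask_int_width(lanes: u32) -> u32 {
    assert!(lanes > 0, "a vector needs at least one lane");
    lanes.next_power_of_two().max(8)
}
```

So i1x2 and i1x4 are carried in an 8-bit integer, i1x16 in a 16-bit one, and i1x64 in a 64-bit one.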

Cons

This only works for non-overloaded intrinsics (at least for now). Improving this to work with overloaded intrinsics too will involve significantly more work.

Possible ways to extend this to overloaded intrinsics (future)

Parse the mangled intrinsic name to get the type parameters

LLVM has a stable mangling of intrinsic names with type parameters (in LLVMIntrinsicCopyOverloadedName2), so we can parse the name to get the type parameters, and then just do the same thing.

Pros

For most intrinsics, this will work perfectly, and it is an easy way to do this.

Cons

The LLVM mangling is not perfectly reversible. When we have TargetExt types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs, as LLVM adds more mangling to the names.
@nikic's work on LLVM intrinsics will remove the name mangling, making this approach impossible.
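To make the name-parsing idea concrete, here is a rough sketch that handles only the easy cases: integer, floating-point and bf16 scalars, plus fixed-length vectors of those. The `Ty` enum and both functions are hypothetical, the base name is assumed to be known up front, and the cases listed under Cons (TargetExt types, identified structs) are deliberately not handled.

```rust
/// Toy type-parameter model (an assumption for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum Ty {
    Int(u32),
    Float(u32),
    BFloat16,
    Vector(u32, Box<Ty>),
}

/// Parse one mangled component such as `i32`, `f32`, `bf16` or `v4f32`.
fn parse_type_suffix(s: &str) -> Option<Ty> {
    if s == "bf16" {
        return Some(Ty::BFloat16);
    }
    if let Some(rest) = s.strip_prefix('v') {
        // Fixed-length vector: `v<lanes><element>`.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        let lanes: u32 = rest[..digits].parse().ok()?;
        let elem = parse_type_suffix(&rest[digits..])?;
        return Some(Ty::Vector(lanes, Box::new(elem)));
    }
    if let Some(bits) = s.strip_prefix('i') {
        return bits.parse().ok().map(Ty::Int);
    }
    if let Some(bits) = s.strip_prefix('f') {
        return match bits.parse::<u32>().ok()? {
            b @ (16 | 32 | 64 | 128) => Some(Ty::Float(b)),
            _ => None,
        };
    }
    None
}

/// Strip the known base name and parse every remaining `.`-separated
/// component as a type parameter. Fails if any component is not understood.
fn type_params(name: &str, base: &str) -> Option<Vec<Ty>> {
    let suffix = name.strip_prefix(base)?.strip_prefix('.')?;
    suffix.split('.').map(parse_type_suffix).collect()
}
```

Any component the parser does not recognise makes the whole lookup return `None`, which is where a real implementation would have to either give up or fall back to another strategy.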

Use the `IITDescriptor` table and the Rust function signature

We can use the base name to get the IITDescriptors of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.

Pros

Doesn't have the above mentioned limitation of the parsing approach, has correct behavior even when there are identified structs and TargetExt types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names)

Cons

Doesn't actually use the type parameters in the name; it only uses the base name and the Rust signature to get the LLVM signature (although we can check that the name is correct). This means there would be no way to (for example) link against llvm.sqrt.bf16 until we have bf16 types in Rust: if we are using u16s (or any other type) as bf16s, the matcher will deduce that the signature is u16 (u16), not bf16 (bf16), which leads to an error because u16 is not a valid type parameter for llvm.sqrt, even though the intended type parameter is specified in the name.
Much more complex, and hard to maintain as LLVM gets new IITDescriptorKinds

These two approaches might give different results for the same function. Take

#[link_name = "llvm.is.constant.bf16"]
fn foo(a: u16) -> bool

The name-based approach will decide that the type parameter is bf16, and the LLVM signature is i1 (bf16) and will inject some bitcasts at callsite.
The IITDescriptor-based approach will decide that the LLVM signature is i1 (u16), and will see that the name given doesn't match the expected name (llvm.is.constant.u16), and will error out.

Reviews are welcome, as this is my first time actually contributing to rustc

After CI is green, we would need a try build.

@rustbot label T-compiler A-codegen A-LLVM
r? codegen

rustbot · 2025-05-08T07:46:55Z

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

rustbot · 2025-05-08T15:50:36Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

dianqk · 2025-05-09T07:15:47Z

I think you can use LLVMGetIntrinsicDeclaration or some of the functions in Intrinsic.h in declare_raw_fn; for reference, see https://github.com/llvm/llvm-project/blob/d35ad58859c97521edab7b2eddfa9fe6838b9a5e/llvm/lib/AsmParser/LLParser.cpp#L330-L335.

sayantn · 2025-05-09T07:29:37Z

That can be used to improve performance, but I am not really focusing on performance in this PR. For now I want to emphasize the correctness of the codegen.

sayantn · 2025-05-09T07:30:43Z

Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance

dianqk · 2025-05-15T22:55:34Z

> Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance

I think you can just focus on non-overloaded functions for this PR. Overloaded functions, and type checking of Rust function signatures against the LLVM definitions, can be subsequent PRs.

@rustbot author

rustbot · 2025-05-15T22:55:39Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

nikic · 2025-05-19T19:21:03Z

@sayantn Taking the address of an intrinsic is invalid LLVM IR.

sayantn · 2025-11-12T00:23:03Z

Seems like @dianqk self-assigned this a few days ago

dianqk · 2025-11-12T01:35:50Z

> Seems like @dianqk self-assigned this a few days ago

I'm looking for an opportunity to review the PR.

sayantn · 2025-11-12T01:47:20Z

@dianqk it would probably be better if you don't review this now - I would change the implementation a bit after #148533 lands (to make it less intrusive to the codegen of normal functions), but the behavior should remain the same

@rustbot author

sayantn · 2025-12-16T19:36:22Z

This is currently blocked on #148533, but I have the new version of the PR ready. The diff with a rebased version of #148533 can be seen here

@rustbot label +S-blocked

bors · 2025-12-28T02:24:21Z

☔ The latest upstream changes (presumably #150448) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot · 2025-12-28T05:08:07Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

sayantn · 2025-12-28T07:17:28Z

This just corrects usage of one LLVM intrinsic in tests/run-make/simd-ffi/simd.rs, and moves the declaration of LLVMGetReturnType from enzyme_ffi.rs to ffi.rs as it is used in the logic.

@rustbot label -S-blocked -A-run-make -F-autodiff
@rustbot ready

edit: somehow triagebot got tripped by this command - it removed the S-blocked label, and reported that it failed to remove it. cc @rust-lang/triagebot

rustbot · 2025-12-28T07:17:32Z

Error: shortcut handler unexpectedly failed in this comment: failed to remove Label { name: "S-blocked" }

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #triagebot on Zulip.

Urgau · 2025-12-28T10:01:52Z

> edit: somehow triagebot got tripped by this command - it removed the S-blocked label, and reported that it failed to remove it.

This is unfortunate, but expected, as both commands try to remove the S-blocked label¹, which they both saw as set on this issue. There is not much we can do on the triagebot side; both commands run concurrently with the same state.

Next time, try not to mix manual status label removal with a shortcut command.

¹ https://github.com/rust-lang/triagebot/blob/a967c85caea92ac9b8d8af53b0867f6760736cc4/src/handlers/shortcut.rs#L32

sayantn · 2025-12-28T10:49:33Z

Thanks ❤️, I didn't know @rustbot ready removes blocked labels

bjorn3 · 2025-12-28T12:01:41Z

compiler/rustc_codegen_llvm/src/intrinsic.rs

-                    llvm::Visibility::Default,
-                    fn_ty,
-                );
-                fn_abi.apply_attrs_llfn(self, llfn, Some(instance));


Why was this removed?

IIUC we don't need to apply any attributes on intrinsics - LLVM AutoUpgrade applies them automatically. In fact, as we are autocasting some arguments here, adding attributes would be erroneous - e.g. if i32 was cast to i1x32, it would try to apply the sext attribute on i1x32, which LLVM doesn't understand. The same problem arises if we cast i16 to bf16. Even if e.g. we are autocasting i1x4 to i8 (in return position, ofc), LLVM autoupgrade specifies that it should do a zero-extension, no matter whether the Rust signature uses i8 or u8
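The zero-extension point can be illustrated with a small model of an i1x4 value returned as an 8-bit integer. This is only a sketch: the function name is hypothetical, and placing lane 0 in the least significant bit is an assumption that matches little-endian targets such as x86.

```rust
/// Pack four mask lanes into the low four bits of a byte, leaving the
/// upper four bits zero (zero-extension), regardless of lane 3's value.
fn pack_mask_zext(lanes: [bool; 4]) -> u8 {
    let mut out = 0u8;
    for (i, &lane) in lanes.iter().enumerate() {
        if lane {
            out |= 1 << i;
        }
    }
    out
}
```

An all-true mask therefore comes back as 0x0F. A sign-extension would instead give 0xFF, which is why a signedness attribute derived from the Rust type would be wrong here.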

rustbot assigned dianqk May 7, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. O-x86_64 Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64) labels May 7, 2025

sayantn force-pushed the test-amx branch from d2aee69 to 4d29196 Compare May 8, 2025 07:58

sayantn force-pushed the test-amx branch from 4d29196 to 4fe6c09 Compare May 8, 2025 15:50

sayantn force-pushed the test-amx branch from 4fe6c09 to 9eddf13 Compare May 8, 2025 15:53

sayantn force-pushed the test-amx branch from 9eddf13 to 1fd0c2d Compare May 8, 2025 16:17

sayantn changed the title ~~Add auto-bitcasts from/to x86amx and i32x256 for AMX intrinsics~~ Add auto-bitcasts from/to x86amx for i32x256 for AMX intrinsics May 8, 2025

sayantn force-pushed the test-amx branch from 1fd0c2d to 9218afe Compare May 8, 2025 18:14

sayantn changed the title ~~Add auto-bitcasts from/to x86amx for i32x256 for AMX intrinsics~~ Add auto-bitcasts between x86amx and i32x256 for AMX intrinsics May 8, 2025

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 15, 2025

sayantn force-pushed the test-amx branch from 9218afe to 685adca Compare May 18, 2025 15:27

sayantn marked this pull request as draft May 19, 2025 07:23

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 12, 2025

ZuseZ4 removed the F-autodiff `#![feature(autodiff)]` label Nov 22, 2025

rustbot added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Dec 16, 2025

Codegen non-overloaded LLVM intrinsics using their name

d915232

sayantn force-pushed the test-amx branch from 5d71f5f to 6b4a313 Compare December 28, 2025 05:08

rustbot added the F-autodiff `#![feature(autodiff)]` label Dec 28, 2025

sayantn added 5 commits December 28, 2025 12:42

Check for AutoUpgraded intrinsics, and lint on uses of deprecated/unknown intrinsics

26e11b8

Add target arch verification for LLVM intrinsics

18c4505

Add autocasts for structs

26ba178

Add autocast for i1 vectors

a454e49

Add autocast for bf16 and bf16xN

ddaf2cf

sayantn force-pushed the test-amx branch from 6b4a313 to ddaf2cf Compare December 28, 2025 07:12

rustbot removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. A-run-make Area: port run-make Makefiles to rmake.rs F-autodiff `#![feature(autodiff)]` labels Dec 28, 2025

rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 28, 2025

sayantn added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 28, 2025

bjorn3 reviewed Dec 28, 2025
