Lang proposal: extern "unspecified" for naked functions with arbitrary ABI #140566

Open
tgross35 opened this issue May 1, 2025 · 18 comments
Labels
A-inline-assembly Area: Inline assembly (`asm!(…)`) C-feature-request Category: A feature request, i.e: not implemented / a PR. I-lang-nominated Nominated for discussion during a lang team meeting. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@tgross35
Contributor

tgross35 commented May 1, 2025

Background

One of the uses of naked functions is to implement custom calling conventions. We have some code in compiler-builtins like this:

// NOTE This function and the ones below are implemented using assembly because they are using a
// custom calling convention which can't be implemented using a normal Rust function.
#[unsafe(naked)]
pub unsafe extern "C" fn __aeabi_uidivmod() {
    core::arch::naked_asm!(
        "push {{lr}}",
        "sub sp, sp, #4",
        "mov r2, sp",
        "bl {trampoline}",
        "ldr r1, [sp]",
        "add sp, sp, #4",
        "pop {{pc}}",
        trampoline = sym crate::arm::__udivmodsi4
    );
}

The ABI needs to be specified, so extern "C" is used. However, this is misleading as the function does not actually use the C calling convention.

Correct ABI would be considered part of the preconditions for this function and it would only be callable inside an unsafe block, but Rust has no way to call the function correctly so it seems like we should prevent this.

Proposal

Add a new "unspecified" ABI that may be used with naked functions. Rust will error on attempts to call them.

/// # Safety
///
/// This function implements a custom calling convention that requires the
/// following inputs:
/// * `r8` contains a pointer
/// * `r9` contains a length
/// The pointer in `r8` must be valid for reads up to `r9` bytes.
///
/// `r8` and `r9` are clobbered but no other registers are.
#[unsafe(naked)]
pub unsafe extern "unspecified" fn foo() {
    core::arch::naked_asm!(
        // ...
    );
}

// SAFETY: `bar` is provided by `libbar.a` which we link.
unsafe extern "unspecified" {
    fn bar();
}

fn call_foo(buf: &[u8]) {
    // SAFETY: I didn't read the docs
    unsafe {
        foo();
        //~^ ERROR: `foo` has an unspecified ABI and cannot be called directly
        bar();
        //~^ ERROR: `bar` has an unspecified ABI and cannot be called directly
    }

    // SAFETY: call `foo` with its specified ABI, account for r8 & r9 clobbers
    unsafe {
        core::arch::asm!(
            "mov r8, {ptr}",
            "mov r9, {len}",
            "call {foo}",
            ptr = in(reg) buf.as_ptr(),
            len = in(reg) buf.len(),
            out("r8") _,
            out("r9") _,
            foo = sym foo,
        )
    }
}

Proposed rules:

  1. extern "unspecified" can only be used with naked functions or extern blocks
  2. The compiler will reject calling any functions marked extern "unspecified". It can still be passed as a function pointer, and it can be a sym in an asm block.
  3. extern "unspecified" functions must be marked unsafe, and cannot be safe fn within an extern block. This is a hard error. (I'm less certain about this rule since unsafety doesn't mean much if you can't call it. Proposed because it seems consistent with how it must be used, given the function still has preconditions, and it probably makes sense to treat them as unsafe in the compiler.)

Questions:

  1. What should it be named? "unspecified", "none", "any", and "unknown" all seem workable. Also suggested in this thread: "custom", "uncallable".
  2. Should parameters also be rejected? If the function is not callable, they don't serve much purpose other than documentation.

cc @rust-lang/lang, @folkertdev, @Amanieu

(currently empty) thread for discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/channel/216763-project-inline-asm/topic/.60extern.20.22unspecified.22.60.20for.20naked.20functions/with/515596073

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 1, 2025
@tgross35 tgross35 added A-inline-assembly Area: Inline assembly (`asm!(…)`) T-lang Relevant to the language team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-lang-nominated Nominated for discussion during a lang team meeting. C-feature-request Category: A feature request, i.e: not implemented / a PR. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels May 1, 2025
@folkertdev
Contributor

I really like this idea. I suspect there is also some embedded code that would make use of this, e.g.

https://github.com/rust-embedded/cortex-m/blob/c3d664bba1148cc2d0f963ebeb788aa347ba81f7/cortex-m-rt/src/lib.rs#L529

(this example came up here #140279 (comment))

extern "unspecified" functions must be marked unsafe, and cannot be safe fn with an extern block.

I think this is the right call, because of documentation and tooling. extern "unspecified" should still have a # Safety section with details on how C/assembly could call this function.

@traviscross
Contributor

Reading through it, this all sounds reasonable and right to me.

Maybe another name to consider for this ABI would be custom.

@traviscross traviscross self-assigned this May 2, 2025
@traviscross
Contributor

traviscross commented May 2, 2025

I'll take assignment and "champion" this, as I seem to have been taking the asm ones recently. Under our process, with a champion, this can be implemented experimentally (and a tracking issue should be filed, etc.).

Of course, I'd first like to hear confirmation from @Amanieu that this seems reasonable. Also cc @RalfJung. And I'd like to hear from my fellow @rust-lang/lang members if anyone thinks this needs an RFC. My estimate is that it probably does not -- that this is a straightforward extension -- and that having a good stabilization report and the updated Reference documentation for this will be sufficient.

@workingjubilee
Member

There are other uses of an "uncallable-by-Rust-code" ABI that suggest considering a broader perspective.

@workingjubilee
Member

So, if the primary reason is to prevent Rust code from calling functions with that ABI, then there are already ~2 ABIs which need to have that implemented for them, and currently do not:

  • extern "interrupt"
  • extern "gpu-kernel"

extern "interrupt"

I say "two" and say extern "interrupt" instead of the half-dozen or so actual extern "???-interrupt" ABIs because those are not meaningfully different in terms of the actual semantics. All of the mentioned "ABIs" have an identically shared trait: they should not be callable by Rust code, and they restrict the arguments in their signature. In particular, almost all interrupt ABIs require the signature extern "interrupt" fn(), with the sole exception being extern "x86-interrupt" (see footnote 1).

extern "gpu-kernel"

AKA extern "ptx-kernel", this has the same deal: it appears more target-specific than it actually is, as it interacts with a shared problem, and has the same detail of being a function that serves as a kind of entry point for a jump/branch/call, but one that cannot be Rust code compiled for that host. This is because it actually exists to represent an entry-point from a CPU to a GPU. Thus it can only be "valid to call" from the internals of the device driver, which can remotely move the GPU's program counter.

Lump? Split? Neither?

This is another instance of a classic "lumping vs. splitting" concern. I am not sure I strongly feel that the decision must cut one way or another. I do think that they have enough of a relationship that we will want to explicitly decide at some point in one direction or another. Call it an "unanswered question" or a "consideration for future directions" if you like?

  • The case for lumping is that they are really all extern "uncallable" fn, in effect. Even that name is slightly off as they are all also "callable" in some sense, but only by a "magic" feat that is not available within the language per se.
  • The case for splitting is that there may be enough other details that differentiate how certain things must be handled, and we do not expect them to be reasonable to handle by some other means like attributes on the functions in question.

Footnotes

  1. and extern "x86-interrupt" has other severe drawbacks that recommend against ever actually using it, to the point it puts into question why we implement it.

@asquared31415
Contributor

Personally I am in favor of the name unknown, because it's a calling convention not known to Rust and therefore Rust cannot directly call it. But bikeshedding aside, I really like this idea, and have had a few cases where it would have been useful, particularly in embedded and similar spaces when interacting with assembly or the system. I think that such an ABI is also suitable for the cases that Jubilee mentioned. Even if they may not be perfect fits, it at least has the major component, which is that it cannot be called by Rust and must be called externally ("externally" may be assembly or system interrupts or GPU or other means).

However, I think that a more refined solution for "this is a well defined ABI but must not be called by rust, it must be called by the GPU" would improve upon this.

Regarding processes, I think this is really only relevant to t-compiler, and maybe lang (it's technically syntax i guess?) and doesn't have any semantics, so I would expect this to just need an FCP to stabilize eventually.

@tgross35
Contributor Author

tgross35 commented May 2, 2025

So, if the primary reason is to prevent Rust code from calling functions with that ABI, then there are already ~2 ABIs which need to have that implemented for them, and currently do not:

  • extern "interrupt"
  • extern "gpu-kernel"

Thanks for pointing these out, agreed that it makes sense to share the checking/error mechanisms.

It seems like ABIs could be grouped something like the following:

  1. Both: rustc can make a function with this ABI and call it. (Applies to most ABIs)
  2. Callee: rustc can define a function with this ABI but can't call it.
    • ABI can be used on "pure rust" functions (i.e. not only naked)
    • Calling is an error
  3. Caller: rustc can call functions with this ABI but can't define it.
    • Can only be declared by #[unsafe(naked)] or extern
    • Calling works correctly
    • I can't think of any examples
  4. Neither: rustc knows nothing about this ABI.
    • Can only be declared by #[unsafe(naked)] or extern
    • Calling is an error

"unspecified" (or whatever name we land on) would be for the fourth category where rustc has no idea about either side, so has to treat it more or less as an opaque symbol. I think that any of the first three categories require a known ABI name because rustc/LLVM has to be made aware of how to call or create it, depending on category.

Based on the description it seems extern "interrupt" falls into the second group, meaning rustc can create this function but never directly call it. So assuming we keep this feature, you can write an extern "x86-interrupt" fn foo function in Rust, but trying foo() will error. (From a quick test it looks like calling results in an ICE or an LLVM crash depending on arguments, so we should probably be doing that anyway).

The GPU case I am less certain about. If (1) there isn't anything unique about these functions that Rust needs to know about, (2) these functions can't be written in Rust (excluding asm), and (3) rustc only needs to know the symbols exist somewhere, then I think grouping it into "unspecified" seems reasonable.
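
The four groups above can be sketched as a small data model. This is only an illustration in plain Rust, not how rustc represents ABIs: the names `AbiSupport` and `classify` are hypothetical, "unspecified" is the placeholder ABI name from this proposal, and the placement of "x86-interrupt" follows the tentative reading in the comment above.

```rust
/// Which side of a call rustc can generate for an ABI (the four groups above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AbiSupport {
    /// rustc can both define and call functions with this ABI.
    Both,
    /// rustc can define such a function but cannot call it.
    Callee,
    /// rustc can call such a function but cannot define it.
    Caller,
    /// rustc knows nothing about the ABI.
    Neither,
}

impl AbiSupport {
    /// Can an ordinary (non-naked) Rust function body be compiled for this ABI?
    pub fn can_define_in_rust(self) -> bool {
        matches!(self, AbiSupport::Both | AbiSupport::Callee)
    }

    /// May Rust code contain a direct call `f()` to such a function?
    pub fn can_call_from_rust(self) -> bool {
        matches!(self, AbiSupport::Both | AbiSupport::Caller)
    }

    /// Are declarations limited to naked functions and `extern` blocks?
    pub fn naked_or_extern_only(self) -> bool {
        !self.can_define_in_rust()
    }
}

/// Illustrative mapping only (an assumption, not compiler behavior).
pub fn classify(abi: &str) -> Option<AbiSupport> {
    match abi {
        "C" | "Rust" => Some(AbiSupport::Both),
        "x86-interrupt" => Some(AbiSupport::Callee),
        "unspecified" => Some(AbiSupport::Neither),
        _ => None,
    }
}
```

Under this model, "calling is an error" is simply `!can_call_from_rust()`, which holds for both the Callee and Neither groups, so the two could share one check.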

@tgross35
Contributor Author

tgross35 commented May 2, 2025

For naming, I suggested “unspecified” because it is true to both rustc (which can’t do much with the ABI because it is not provided) and to the author (who didn’t specify a specific ABI). I think “custom” works here as well. “unknown”, “any”, and “uncallable” on the other hand are true to rustc, but presumably the author knows what the specific ABI is and how to call it.

Just clarifying rationale, I don’t have a strong preference or think that’s the most solid reasoning.

@RalfJung
Member

RalfJung commented May 2, 2025

Should parameters also be rejected? If the function is not callable, they don't serve much purpose other than documentation.

Do we allow arguments/return types on naked functions? What do they do?

@RalfJung
Member

RalfJung commented May 2, 2025

extern "unspecified" can only be used with naked functions or extern blocks

What is the point of having an extern "unspecified" block, if such functions cannot be called?

@asquared31415
Contributor

asquared31415 commented May 2, 2025

Presumably one could use a sym asm operand with a function in an extern "unspecified" block

@bjorn3
Member

bjorn3 commented May 2, 2025

Do we allow arguments/return types on naked functions? What do they do?

Yes, they are both documentation and for naked functions that can be called from Rust they define how you can call it.

@folkertdev
Contributor

Exactly,

  • the function can still be called from within asm!
  • the function address can still be used (e.g. stored in a vector table)
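
A minimal sketch of the second point, using only stable Rust. The handlers are declared `extern "C"` purely as a stand-in, since the proposed ABI does not exist yet, and `VectorTable`, `reset_handler`, `fallback_handler`, and `entry_addresses` are hypothetical names. The handlers are never called from Rust; only their addresses are stored.

```rust
/// Handler type; `extern "C"` is a stand-in for the proposed uncallable ABI.
pub type Handler = unsafe extern "C" fn();

// Placeholder bodies. In real firmware these would be naked functions or
// symbols provided by assembly.
unsafe extern "C" fn reset_handler() {}
unsafe extern "C" fn fallback_handler() {}

/// Hypothetical two-entry vector table. Real tables are target-specific and
/// are usually placed in a dedicated linker section.
#[repr(C)]
pub struct VectorTable {
    pub reset: Handler,
    pub fallback: Handler,
}

/// The table only records addresses; nothing here calls the handlers.
pub static VECTOR_TABLE: VectorTable = VectorTable {
    reset: reset_handler,
    fallback: fallback_handler,
};

/// Returns the raw addresses stored in the table, e.g. for inspection.
pub fn entry_addresses() -> [usize; 2] {
    [
        VECTOR_TABLE.reset as usize,
        VECTOR_TABLE.fallback as usize,
    ]
}
```

Taking an address this way needs no knowledge of the calling convention, which is why the proposal can keep function pointers and `sym` operands legal while rejecting direct calls.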

Do we allow arguments/return types on naked functions? What do they do?

We do, because you can totally write something like this

#[unsafe(naked)]
extern "C" fn add(a: u64, b: u64) -> u64 {
    core::arch::naked_asm!(
        "lea rax, [rdi + rsi]",
        "ret",
    )
}

So here, the unsafe part of the attribute is a promise that the function implements the stated ABI. This function can then be called as normal; it's not even unsafe to call.

@chorman0773
Contributor

What is the point of having an extern "unspecified" block, if such functions cannot be called?

Adding to the above, also it's the only way to export a function with a potentially funny calling convention (other than to use an uncallable signature, such as one that accepts an uninhabited type) from a cdylib. Functions written in C or assembly and linked into Rust are hidden when building a cdylib due to how rustc hides Rust symbols.

@workingjubilee
Member

The GPU case I am less certain about. If (1) there isn't anything unique about these functions that Rust needs to know about, (2) these functions can't be written in Rust (excluding asm), and (3) rustc only needs to know the symbols exist somewhere, then I think grouping it into "unspecified" seems reasonable.

extern "gpu-kernel" is in the second case of "Callee: rustc can define a function with this ABI but can't call it." To grossly oversimplify, it is reasonable to think of every extern "gpu-kernel" function as being its own entry-point like fn main() is, with the caveats of

  • closer to _start in that you should really, really not reenter it from the program
  • more like a libcall in terms of what purpose it serves

@Amanieu
Member

Amanieu commented May 3, 2025

Of course, I'd first like to hear confirmation from @Amanieu that this seems reasonable.

Yes, this is a feature that I would very much like to see. I don't have a specific preference for the exact name of the ABI though.

@moxian
Contributor

moxian commented May 3, 2025

I would like to throw extern "manual" into the naming bikeshed.
The idea being that rustc cannot generate calls to those functions automatically, and needs some "manual" asm work to do so. Although this ABI is only applicable to types (2) (yes define, no call) and (4) (no define, no call), but not (3) (no define, yes call) from the above classification, so it might not be generic enough.

For naming, I suggested “unspecified” because it is true to both rustc (which can’t do much with the ABI because it is not provided) and to the author (who didn’t specify a specific ABI).

I like the rationale, but I feel that the "unspecified" ABI is actually defined to the author. The author did actually specify the calling convention - in the comments. The ABI is only just not formalized, but it's still "specified" to humans.

That said, even though I feel that "manual" captures the intent more precisely, it is also a very common and loaded word, so it's less obvious to first-time readers what it means exactly. "unspecified" is much more pedagogic in this regard.

(FTR, I don't have a strong opinion on this)

@traviscross
Contributor

traviscross commented May 3, 2025

I like the rationale, but I feel that the "unspecified" ABI is actually defined to the author. The author did actually specify the calling convention - in the comments. The ABI is only just not formalized, but it's still "specified" to humans.

Yes, this is why I think I prefer "custom".

Another possibility, that just came to mind, is calling the ABI "asm".
