Make safe most intrinsics that neither access memory nor impact processor state

# Proposal

## Problem statement



Currently, any time you want to use vector intrinsics directly, you have to resort to `unsafe` blocks. As that's still very much unstable, it's not exactly accessible for most currently. Also, there's features that doesn't cover, like AVX-512's result masking (which saves a *lot* of instructions in some niche cases).

## Motivating examples or use cases



I've got this code laying around in an experiment:

```rs
use std::arch::x86_64::*;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Color(__m128);

impl Color {
  #[inline]
  pub fn new(a: f32, r: f32, g: f32, b: f32) -> Color {
    unsafe { Color(_mm_setr_ps(a, r, g, b)) }
  }
  
  #[inline]
  pub fn a(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<0>(self.0) as u32 })
  }

  #[inline]
  pub fn r(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<1>(self.0) as u32 })
  }

  #[inline]
  pub fn g(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<2>(self.0) as u32 })
  }

  #[inline]
  pub fn b(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<3>(self.0) as u32 })
  }
}

#[inline]
fn with_alpha(d: __m128, s: __m128) -> __m128 {
  unsafe { _mm_blend_ps::<0b0001>(d, s) }
}

#[derive(Clone, Copy)]
pub struct Composite {
  a1: f32,
  a2: f32,
  b1: f32,
  b2: f32,
}

fn color_clamp(c: Color) -> Color {
  Color(unsafe { _mm_max_ps(_mm_set1_ps(0.0), _mm_min_ps(_mm_set1_ps(1.0), c.0)) })
}

impl Composite {
  pub fn compile(data: u8) -> Option<Composite> {
    fn extract(data: u8, offset_right: i8) -> Option<f32> {
      let value = ((data as i8) << offset_right) >> 6;
      (value != -2).then_some(value as f32)
    }

    Some(Composite {
      a1: extract(data, 6)?,
      a2: extract(data, 4)?,
      b1: extract(data, 2)?,
      b2: extract(data, 0)?,
    })
  }

  pub fn execute(&self, d: &mut Color, s: Color) {
    unsafe {
      let fr = s.a() * d.a() * (self.a2 as f32) + (self.a1 as f32);
      let fr = with_alpha(_mm_set1_ps(fr), _mm_set1_ps(1.0));
      let fd = d.a() * s.a() * (self.b2 as f32) + (self.b1 as f32);
      let fd = with_alpha(_mm_set1_ps(fd), _mm_set1_ps(1.0 - s.a()));
      *d = color_clamp(Color(_mm_add_ps(_mm_mul_ps(d.0, fd), _mm_mul_ps(fr, s.0))));
    }
  }
}
```

None of those `unsafe` blocks are truly dealing with anything unsafe [as per the book](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html), [the unsafe code guidelines](https://rust-lang.github.io/unsafe-code-guidelines/layout/packed-simd-vectors.html), or the [reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) [manual](https://doc.rust-lang.org/reference/behavior-not-considered-unsafe.html).

- All of these are already guarded with `#[target_feature]` as appropriate, thus avoiding undefined behavior on that front.
- The behavior for these are almost always as well-defined by the processor as integer addition. There are a few exceptions like `_mm256_castpd128_pd256` in x86.

## Solution sketch



For each architecture intrinsics that neither read nor modify memory or persistent processor state, make it safe and wrap the inner contents with an `unsafe` block as needed.

For x86-64, this equates to roughly the following:

- Leave all memory-accessing instructions `unsafe`
- Leave CPUID-related instructions `unsafe` as it reads persistent processor state
- Leave `__rdtscp` and `_rdtsc` `unsafe` as it reads persistent processor state
- Leave FXSR-related intrinsics `unsafe` as it interacts with persistent processor state
- Leave RDRAND-related intrinsics `unsafe` as it interacts with persistent processor state
- Leave XSAVE-related intrinsics `unsafe` as it interacts with persistent processor state
- Leave `_mm256_cast*128_*256` `unsafe` due to its upper half being undefined
- Leave `_mm512_cast*128_*512` `unsafe` due to its upper 3/4 being undefined
- Leave `_mm512_cast*256_*512` `unsafe` due to its upper half being undefined
- Leave `_mm*_undefined*` `unsafe` since they're intentionally supposed to be undefined
- Make safe everything else not listed here (intrinsics for the `bsf`/`bsr` instructions don't exist, so they don't need covered)

> I may have missed a thing or two - I didn't scan `core::arch::x86_64` too closely.

## Alternatives



Do nothing and just focus on `std::simd`. This is workable, but see my note on AVX-512's result masking for why this isn't helpful in of itself. (I didn't include an example for *that* here, but it wouldn't be hard for me to provide one.)

## Links and related work



## What happens now?

This issue is part of the libs-api team [API change proposal process]. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

[API change proposal process]: https://std-dev-guide.rust-lang.org/feature-lifecycle/api-change-proposals.html

## Possible responses

The libs team may respond in various different ways. First, the team will consider the *problem* (this doesn't require any concrete solution or alternatives to have been proposed):

- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make safe most intrinsics that neither access memory nor impact processor state #243

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make safe most intrinsics that neither access memory nor impact processor state #243

Description

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions