Attempt to improve codegen for arrays of repeated enums #104384

Closed
wants to merge 2 commits into from
34 changes: 34 additions & 0 deletions compiler/rustc_codegen_llvm/src/builder.rs
@@ -561,6 +561,19 @@ impl<'a, 'll, 'tcx> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
         count: u64,
         dest: PlaceRef<'tcx, &'ll Value>,
     ) -> Self {
+        if let OperandValue::Pair(mut v1, mut v2) = cg_elem.val {
+            v1 = self.from_immediate(v1);
+            v2 = self.from_immediate(v2);
+            let ty = self.cx().val_ty(v1);
+            // Create a vector of size 2*count and store it in one instruction
+            if ty == self.cx().val_ty(v2) {
+                let vec = self.vector_repeat_two(v1, v2, count as usize);
+                let vec = OperandRef::from_immediate_or_packed_pair(&mut self, vec, dest.layout);
+                vec.val.store(&mut self, dest);
+                return self;
+            }
+        }
+
         let zero = self.const_usize(0);
         let count = self.const_usize(count);
         let start = dest.project_index(&mut self, zero).llval;
@@ -1328,6 +1341,27 @@ impl<'a, 'll, 'tcx> Builder<'a, 'll, 'tcx> {
         unsafe { llvm::LLVMRustBuildVectorReduceMax(self.llbuilder, src, is_signed) }
     }

+    // (v1, v2, 3) -> [v1, v2, v1, v2, v1, v2]
+    pub fn vector_repeat_two(
+        &mut self,
+        v1: &'ll Value,
+        v2: &'ll Value,
+        times: usize,
+    ) -> &'ll Value {
+        let ty = self.cx().val_ty(v1);
+        debug_assert!(ty == self.cx().val_ty(v2));
+        // shufflevector <2 x i8> <v1, v2>, <2 x i8> undef, <(timesx2) x i32> <i32 0, i32 1, i32 0...>
+        let undef = unsafe { llvm::LLVMGetUndef(self.type_vector(ty, 2)) };
+        let vec1 = self.insert_element(undef, v1, self.const_i32(0));
+        let vec1 = self.insert_element(vec1, v2, self.const_i32(1));
+        let mask = std::iter::repeat([self.const_i32(0), self.const_i32(1)])
+            .take(times)
+            .flatten()
+            .collect::<Vec<_>>();
+        let mask = self.const_vector(&mask);
+        self.shuffle_vector(vec1, undef, mask)
+    }
+
     pub fn add_clause(&mut self, landing_pad: &'ll Value, clause: &'ll Value) {
         unsafe {
             llvm::LLVMAddClause(landing_pad, clause);
10 changes: 1 addition & 9 deletions compiler/rustc_codegen_ssa/src/mir/operand.rs
@@ -12,8 +12,6 @@ use rustc_middle::ty::layout::{LayoutOf, TyAndLayout};
 use rustc_middle::ty::Ty;
 use rustc_target::abi::{Abi, Align, Size};

-use std::fmt;
-
 /// The representation of a Rust value. The enum variant is in fact
 /// uniquely determined by the value's type, but is kept as a
 /// safety check.
@@ -38,7 +36,7 @@ pub enum OperandValue<V> {
 /// to avoid nasty edge cases. In particular, using `Builder::store`
 /// directly is sure to cause problems -- use `OperandRef::store`
 /// instead.
-#[derive(Copy, Clone)]
+#[derive(Copy, Clone, Debug)]
 pub struct OperandRef<'tcx, V> {
     // The value.
     pub val: OperandValue<V>,
@@ -47,12 +45,6 @@ pub struct OperandRef<'tcx, V> {
     pub layout: TyAndLayout<'tcx>,
 }

-impl<V: CodegenObject> fmt::Debug for OperandRef<'_, V> {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-        write!(f, "OperandRef({:?} @ {:?})", self.val, self.layout)
-    }
-}
-
 impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
     pub fn new_zst<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
         bx: &mut Bx,
19 changes: 19 additions & 0 deletions src/test/codegen/enum-repeat.rs
@@ -0,0 +1,19 @@
+// compile-flags: -O
+
+#![crate_type = "lib"]
+
+// CHECK-LABEL: @none_repeat
+#[no_mangle]
+pub fn none_repeat() -> [Option<u8>; 64] {
+    // CHECK: store <128 x i8>
+    // CHECK-NEXT: ret void
+    [None; 64]
+}
+
+// CHECK-LABEL: @some_repeat
+#[no_mangle]
+pub fn some_repeat() -> [Option<u8>; 64] {
+    // CHECK: store <128 x i8>
+    // CHECK-NEXT: ret void
+    [Some(0); 64]
Member

It's interesting to me that `some_repeat` seems to store 16 bytes at a time on x64, but `none_repeat` doesn't. It might be worth looking at why LLVM treats them differently.

Also, out of curiosity, what's the assembly difference before/after this PR?

Contributor Author

godbolt: this is `none_repeat` before and after the LLVM optimisations. It looks like LLVM optimises out the undefined writes and then forgets about them. This happened even when I explicitly added a `store undef`.

The current assembly output is:

  • `[None; 64]`: 64 single-byte movs to the tag
  • `[Some(0); 64]`: a 16-byte constant vector (alternating 0/1) is stored into the array with movups
  • `[Some(1); 64]`: the same as `Some(0)`, but with an all-1s pattern

With this patch:

  • `[None; 64]`: a 16-byte vector is created with xorps, then stored with movups
  • `[Some(0); 64]` / `[Some(1); 64]`: same as before
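
For context on the `<128 x i8>` in the CHECK lines and the "tag" bytes mentioned above, here is a small sketch of the size arithmetic. The function names are made up for illustration, and the exact layout of `Option<u8>` is a compiler detail rather than a documented guarantee, so treat the numbers as what rustc does in practice.

```rust
use std::mem::size_of;

/// Size in bytes of one `Option<u8>` element (hypothetical helper).
/// `u8` uses every bit pattern, so the `Option` needs a separate tag byte
/// next to the payload byte.
pub fn option_u8_size() -> usize {
    size_of::<Option<u8>>()
}

/// Size in bytes of the array returned by the test functions
/// (hypothetical helper): 64 elements of tag + payload.
pub fn repeated_array_size() -> usize {
    size_of::<[Option<u8>; 64]>()
}
```

With two bytes per element, 64 elements come to 128 bytes, which is why a single vector store covering the whole array has type `<128 x i8>`.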

+}
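
As a rough model of what `vector_repeat_two` builds, the sketch below does the same mask construction on plain Rust values instead of LLVM values. `repeat_two_mask` and `repeat_two` are hypothetical helpers, not part of rustc, and indexing a two-element array stands in for `shufflevector`.

```rust
/// Index mask for the shuffle: `[0, 1]` repeated `times` times.
/// Hypothetical helper mirroring the mask built in `vector_repeat_two`.
pub fn repeat_two_mask(times: usize) -> Vec<usize> {
    std::iter::repeat([0usize, 1])
        .take(times)
        .flatten()
        .collect()
}

/// Plain-Rust stand-in for the shuffle: each mask index selects a lane
/// of the two-element source `[v1, v2]`.
pub fn repeat_two<T: Copy>(v1: T, v2: T, times: usize) -> Vec<T> {
    let pair = [v1, v2];
    repeat_two_mask(times)
        .into_iter()
        .map(|lane| pair[lane])
        .collect()
}
```

Calling `repeat_two(v1, v2, 3)` yields `[v1, v2, v1, v2, v1, v2]`, matching the comment on `vector_repeat_two`. With `times = 64` the result has 128 lanes, the width checked in the codegen test.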