Remove my `scalar_copy_backend_type` optimization attempt #123185
@@ -1,4 +1,11 @@
-//@ compile-flags: -O
+//@ revisions: OPT2 OPT3WINX64 OPT3LINX64
+//@ [OPT2] compile-flags: -O
+//@ [OPT3LINX64] compile-flags: -C opt-level=3
+//@ [OPT3WINX64] compile-flags: -C opt-level=3
+//@ [OPT3LINX64] only-linux
+//@ [OPT3WINX64] only-windows
+//@ [OPT3LINX64] only-x86_64
+//@ [OPT3WINX64] only-x86_64
 //@ min-llvm-version: 18.1.3

 #![crate_type = "lib"]

@@ -9,15 +16,27 @@
 // to avoid complicating the code.
 // CHECK-LABEL: define{{.*}}void @convert(
 // CHECK-NOT: shufflevector
-// CHECK: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: insertelement <8 x i16>
-// CHECK-NEXT: store <8 x i16>
+// OPT2: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 2
+// OPT2-NEXT: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 4
+// OPT2-NEXT: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 6
+// OPT2-NEXT: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 8
+// OPT2-NEXT: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 10
+// OPT2-NEXT: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 12
+// OPT2-NEXT: store i16
+// OPT2-NEXT: getelementptr inbounds i8, {{.+}} 14
+// OPT2-NEXT: store i16
+// OPT3LINX64: load <8 x i16>
+// OPT3LINX64-NEXT: call <8 x i16> @llvm.bswap
+// OPT3LINX64-NEXT: store <8 x i16>
+// OPT3WINX64: load <8 x i16>
+// OPT3WINX64-NEXT: call <8 x i16> @llvm.bswap
+// OPT3WINX64-NEXT: store <8 x i16>
 // CHECK-NEXT: ret void

FYI @dianqk that I updated this test. There's no more vector operation generated in O2, but I think separate stores are probably fine and similar enough to the …

Ah, thanks! I hadn't noticed that different optimization levels yield different optimization effects. The changes to O2 seem fine. I'll check again on this later.

I confirm that it is fine. I was just curious why the results were the same for O2 and O3 in LLVM's opt before.
rust/compiler/rustc_codegen_ssa/src/back/write.rs, lines 248 to 249 in f96442b


 #[no_mangle]
 #[cfg(target_endian = "little")]

Note that while we now generate an `alloca` and `memcpy`s for this (as seen in the `array-codegen` file), LLVM is able to remove them.

If LLVM picks `i16` for this (and not `<2 x i8>`), then great, let's do that.
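
For readers without the test file open, here is a rough sketch of the kind of function the `convert` CHECK lines above seem to describe: eight `u16` values are byte-swapped and reinterpreted as sixteen bytes. The function body and the `convert_reference` helper are assumptions made for illustration, not the test's actual source. The point is only that eight separate `store i16` instructions (the OPT2 checks) and one `<8 x i16>` `llvm.bswap` (the OPT3 checks) would both produce the same bytes.

```rust
/// Hypothetical stand-in for the function under test (an assumption, not the
/// real test source): byte-swaps eight `u16` values and reinterprets them as
/// sixteen bytes.
pub fn convert(value: [u16; 8]) -> [u8; 16] {
    // On a little-endian target `to_be` swaps the two bytes of each element;
    // on a big-endian target it leaves the value unchanged.
    let swapped = value.map(u16::to_be);
    // SAFETY: `[u16; 8]` and `[u8; 16]` have the same size, and every bit
    // pattern is a valid `[u8; 16]`.
    unsafe { core::mem::transmute::<[u16; 8], [u8; 16]>(swapped) }
}

/// Illustrative reference without `unsafe`: writes each element's big-endian
/// bytes in order, one two-byte chunk at a time.
pub fn convert_reference(value: [u16; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (chunk, v) in out.chunks_exact_mut(2).zip(value) {
        chunk.copy_from_slice(&v.to_be_bytes());
    }
    out
}
```

Both functions should agree on any input regardless of target endianness, so the choice between scalar stores and a vector byte swap is purely a codegen detail.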