Skip to content

[AMDGPU] Handle MachineOperandType global address in SIFoldOperands. #135424

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 5, 2025
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1161,6 +1161,10 @@ void SIFoldOperandsImpl::foldOperand(

if (OpToFold.isImm())
UseMI->getOperand(1).ChangeToImmediate(OpToFold.getImm());
else if (OpToFold.isGlobal())
UseMI->getOperand(1).ChangeToGA(OpToFold.getGlobal(),
OpToFold.getOffset(),
OpToFold.getTargetFlags());
else
UseMI->getOperand(1).ChangeToFrameIndex(OpToFold.getIndex());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe swap these? Handle isFI in the else if. We could also include explicit check for isGlobal, and give up on other operand types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe swap these? Handle isFI in the else if. We could also include explicit check for isGlobal, and give up on other operand types

Swapping isImm() and isGlobal() seems fine, if that's what you meant.

There is a UseMI->setDesc() before this conditional block, so it was not clear whether it was too late to give up because a change may have been made already?

It seemed minimum changes compared to the existing code if the existing "else" was kept untouched and a new ifGlobal() handling was added since that was the asserting case to start with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the setDesc would need to be sunk if the other case were handled

Copy link
Contributor Author

@isakhilesh isakhilesh Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arsenm, I had initially considered handling isFI() in else if, but as @dhruvachak mentioned, UseMI->setDesc() has potentially made changes already, so I am keeping it in the else block since there might be modifications required with respect to setDesc() as well in case we swap them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I meant do if (isImm) {} else if (isFI()) {} { else global

But you could also just move the setDesc to avoid the side effect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can modify the patch to the first approach you mention here. Since we enter this block only in case FoldingImmLike is true and this variable is a bool for either immediate, frame index or global values. The if-else if-else conditions focus on only these 3 operands so there won't be any edge case for the else condition as we enter the block when it is one of the 3 operands.

But I'm not clear on how/where should I move the setDesc() to, since it is almost at the top of the block and the if-else if-else is immediately after it.

The structure is ->

if(FoldingImmLike){
if(execMaybeModifiedBeforeUse)
return;
UseMI->setDesc() // at the top of the block, in case the exec flag immediately above is false
if-else if-else block
UseMI->removeOperand(2)
return;
}

Where should I place setDesc() to avoid a side effect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can modify the patch to the first approach you mention here. Since we enter this block only in case FoldingImmLike is true and this variable is a bool for either immediate, frame index or global values. The if-else if-else conditions focus on only these 3 operands so there won't be any edge case for the else condition as we enter the block when it is one of the 3 operands.

But I'm not clear on how/where should I move the setDesc() to, since it is almost at the top of the block and the if-else if-else is immediately after it.

The structure is ->

if(FoldingImmLike){ if(execMaybeModifiedBeforeUse) return; UseMI->setDesc() // at the top of the block, in case the exec flag immediately above is false if-else if-else block UseMI->removeOperand(2) return; }

Where should I place setDesc() to avoid a side effect?

I think setDesc() has to be sunk only if you bail out. How about the following?

if (isImm()) {}
else if (isFI()) {}
else {
assert(isGlobal());
...
}
Then you don't need to move setDesc().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhruvachak is spelling out the same as I stated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I have updated the patch based on the suggested structure, thanks a lot. Please let me know if I can improve it further.

UseMI->removeOperand(2); // Remove exec read (or src1 for readlane)
Expand Down
46 changes: 46 additions & 0 deletions llvm/test/CodeGen/AMDGPU/swdev504645-global-fold.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s

define void @test_load_zext() {
; CHECK-LABEL: test_load_zext:
; CHECK: ; %bb.0: ; %.entry
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: s_mov_b32 s0, s33
; CHECK-NEXT: s_mov_b32 s33, s32
; CHECK-NEXT: s_or_saveexec_b64 s[2:3], -1
; CHECK-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; CHECK-NEXT: s_mov_b64 exec, s[2:3]
; CHECK-NEXT: s_add_i32 s32, s32, 16
; CHECK-NEXT: v_writelane_b32 v40, s0, 2
; CHECK-NEXT: s_getpc_b64 s[0:1]
; CHECK-NEXT: s_add_u32 s0, s0, has_spgr_args@gotpcrel32@lo+4
; CHECK-NEXT: s_addc_u32 s1, s1, has_spgr_args@gotpcrel32@hi+12
; CHECK-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
; CHECK-NEXT: s_mov_b32 s0, DescriptorBuffer@abs32@lo
; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; CHECK-NEXT: s_swappc_b64 s[30:31], s[2:3]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
; CHECK-NEXT: v_readlane_b32 s30, v40, 0
; CHECK-NEXT: s_mov_b32 s32, s33
; CHECK-NEXT: v_readlane_b32 s0, v40, 2
; CHECK-NEXT: s_or_saveexec_b64 s[2:3], -1
; CHECK-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
; CHECK-NEXT: s_mov_b64 exec, s[2:3]
; CHECK-NEXT: s_mov_b32 s33, s0
; CHECK-NEXT: s_waitcnt vmcnt(0)
; CHECK-NEXT: s_setpc_b64 s[30:31]
.entry:
%reloc = call i32 @llvm.amdgcn.reloc.constant(metadata !0)
call void @has_spgr_args(i32 %reloc)
ret void
}

declare void @has_spgr_args(i32 inreg)

declare i32 @llvm.amdgcn.reloc.constant(metadata) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

!0 = !{!"DescriptorBuffer", i32 4, i32 8, i32 0, i32 0}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need these attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the attribute code in line 44 but have kept the code in line 46 because without it there was an error asking for !0.
This file was reduced in llvm-reduce from an input .ll file generated by a fuzzer based on the ticket mentioned in the description.