-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Handle DelayFree for HW_Category_SIMDByIndexedElement intrinsics #114525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc: @a74nh |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
This might need a backport because this can lead to silent bad codegen issue for lot of intrinsics that fall in |
@@ -1481,6 +1481,21 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree, int* pDstCou | |||
{ | |||
srcCount += BuildContainedCselUses(containedCselOp, delayFreeOp, candidates); | |||
} | |||
else if ((intrin.category == HW_Category_SIMDByIndexedElement) && (genTypeSize(intrin.baseType) == 2) && !HWIntrinsicInfo::HasImmediateOperand(intrin.id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding this. To keep in the existing style, could you:
Expand getDelayFreeOperand()
to:
GenTree* LinearScan::getDelayFreeOperand(GenTreeHWIntrinsic* intrinsicTree, bool embedded, bool *forceDelay)
Where forceDelay
is a return argument.
Move your else if
into getDelayFreeOperand()
(I guess it'll go into the default case) and if it passes, set *forceDelay=true
and return nullptr.
Then in line 1500 ("Only build as delay free use if register types match") add a || forceDelay == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first choice, but getDelayFreeOperand()
is outside this for loop and that will return nullptr
even for op1
. If that happens, we won't be able to tgtPrefUse = delayUse
i.e. we won't be able to prefer the register allocated for op1
for target
as well. Usually when targetReg == op1Reg
, we avoid an extra move before the RMW instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I hadn't really taken the for
loop into account properly.
For all these SIMDByIndexedElement
intrisnics, will op1
always be a delay operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see the code, it's not delayed as such, but rather is a way for us to tell that it is the preferred register for target. It is not marked as delay-free.
/azp run Antigen, Fuzzlyn, runtime-coreclr jitstressregs, runtime-coreclr libraries-jitstressregs |
Azure Pipelines successfully started running 4 pipeline(s). |
Antigen, Fuzzlyn and libraries-jitstressregs failures are unrelated. |
/ba-g timeout issue |
In #107459, we refactored the code to build refpostions for HWIntrinsics tree nodes. As part of that, we removed the correct way of handling of intrinsics that fall under
HW_Category_SIMDByIndexedElement
(e.g. MLS) and whose operands have restricted registersV0~V15
. We previously marked RefPositions for such operands as "delay-free", but after refactoring, we stopped marking it such and as a result we assigned conflicting registers totargetReg
as well as one of the 2nd/3rd operand. Because of that, code gen need to do the required move frommov targetReg, op1Reg
(whenevertargetReg != op1reg
), but iftargetReg
also contains the value of 2nd/3rd operand, the move would overwrite its value.Here is the portion of changes that changed the behavior.
Fixes: #114358, #114322