Commit 2cfe79a

pratikashar authored and Artem Gindinson committed
Skip RMW for loop split temporaries
RMW optimization in spill insertion should conservatively assume that loop split temporaries require read-modify-write sequence. Following explains why this is required:

// ********************
// Before:
// Loop_Header:
// V10:uq = ...
// ...
// jmpi Loop_Header
// ...
// = V10
// ********************
// After split around loop:
//
// (W) LOOP_TMP = V10
// Loop_Header:
// LOOP_TMP:uq = ...
// ...
// jmpi Loop_Header
// (W) V10 = LOOP_TMP
// ...
// = V10
// ********************
// Since V10 is spilled already, spill/fill code is inserted as:
// (Note that program no longer has direct references to V10)
//
// (W) FILL_TMP = Fill from V10 offset
// (W) LOOP_TMP = FILL_TMP
// Loop_Header:
// LOOP_TMP:uq = ...
// ...
// jmpi Loop_Header
// (W) SPILL_TMP = LOOP_TMP
// (W) Spill SPILL_TMP to V10 offset
// ...
// (W) FILL_TMP1 = Fill from V10 offset
// = FILL_TMP1
// ********************
//
// If LOOP_TMP is spilled in later iteration, we need to check whether
// RMW is needed for its def in the loop body. But by this iteration
// all original references to V10 have already been transformed to
// temporary ranges, so we cannot easily determine dominance relation
// between LOOP_TMP and other V10 references. If LOOP_TMP doesn't
// dominate all defs and uses then it would be illegal to skip RMW. Hence,
// we conservatively assume RMW is required for LOOP_TMP.

(cherry picked from commit 887a2e8)
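To make concrete what skipping the read-modify-write sequence risks, here is a toy model of spilling a partial write. It is only a sketch under stated assumptions: `Row`, `Mask`, `spillWithRMW`, `spillWithoutRMW`, and the `stale` filler value are hypothetical names invented for illustration and are not part of the compiler's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy types: a four-lane "register row" and a per-lane write mask.
using Row = std::array<std::uint32_t, 4>;
using Mask = std::array<bool, 4>;

// Spill with read-modify-write: fill the row from scratch memory, update only
// the enabled lanes, then write the whole row back.
inline void spillWithRMW(Row& scratch, const Row& value, const Mask& enabled) {
  Row tmp = scratch;  // read (fill)
  for (std::size_t i = 0; i < tmp.size(); ++i) {
    if (enabled[i]) tmp[i] = value[i];  // modify
  }
  scratch = tmp;  // write (spill)
}

// Spill without the fill: lanes the def does not write hold unrelated data in
// the temporary (modeled here by `stale`), and that data overwrites scratch.
inline void spillWithoutRMW(Row& scratch, const Row& value, const Mask& enabled,
                            std::uint32_t stale) {
  Row tmp;
  tmp.fill(stale);  // no read: contents are whatever the temporary held
  for (std::size_t i = 0; i < tmp.size(); ++i) {
    if (enabled[i]) tmp[i] = value[i];
  }
  scratch = tmp;
}

// Small usage check of the two helpers above.
inline void selfCheck() {
  const Row newValue{9, 9, 9, 9};
  const Mask enabled{true, false, true, false};

  Row a{1, 2, 3, 4};
  spillWithRMW(a, newValue, enabled);
  assert((a == Row{9, 2, 9, 4}));  // lanes 1 and 3 are preserved

  Row b{1, 2, 3, 4};
  spillWithoutRMW(b, newValue, enabled, 0xDEADu);
  assert((b == Row{9, 0xDEADu, 9, 0xDEADu}));  // lanes 1 and 3 are clobbered
}
```

Dropping the fill is only safe when nothing later depends on the lanes that get clobbered, which is why the commit falls back to RMW whenever that cannot be established for LOOP_TMP.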
1 parent e90e360 commit 2cfe79a

1 file changed, +53 -5 lines changed

visa/SpillManagerGMRF.cpp

@@ -2800,13 +2800,61 @@ void SpillManagerGRF::updateRMWNeeded() {
 // Check0 : Def is NoMask, -- checked in isPartialWriteForSpill()
 // Check1 : Def is unique def,
 // Check2 : Def is in loop L and all use(s) of dcl are in loop L or it's
-// inner loop nest, Check3 : Flowgraph is reducible RMW_Not_Needed = Check0
-// || (Check1 && Check2 && Check3)
+// inner loop nest,
+// Check3 : Flowgraph is reducible
+// Check4 : Dcl is not a split around loop temp
+// RMW_Not_Needed = (Check0 || (Check1 && Check2 && Check3)) && Check4
 bool RMW_Needed = true;

-if (isUniqueDef && builder_->kernel.fg.isReducible() &&
-    checkDefUseDomRel(spilledRegion, bb)) {
-  RMW_Needed = false;
+// Reason for Check4:
+// ********************
+// Before:
+// Loop_Header:
+// V10:uq = ...
+// ...
+// jmpi Loop_Header
+// ...
+// = V10
+// ********************
+// After split around loop:
+//
+// (W) LOOP_TMP = V10
+// Loop_Header:
+// LOOP_TMP:uq = ...
+// ...
+// jmpi Loop_Header
+// (W) V10 = LOOP_TMP
+// ...
+// = V10
+// ********************
+// Since V10 is spilled already, spill/fill code is inserted as:
+// (Note that program no longer has direct references to V10)
+//
+// (W) FILL_TMP = Fill from V10 offset
+// (W) LOOP_TMP = FILL_TMP
+// Loop_Header:
+// LOOP_TMP:uq = ...
+// ...
+// jmpi Loop_Header
+// (W) SPILL_TMP = LOOP_TMP
+// (W) Spill SPILL_TMP to V10 offset
+// ...
+// (W) FILL_TMP1 = Fill from V10 offset
+// = FILL_TMP1
+// ********************
+//
+// If LOOP_TMP is spilled in later iteration, we need to check whether
+// RMW is needed for its def in the loop body. But by this iteration
+// all original references to V10 have already been transformed to
+// temporary ranges, so we cannot easily determine dominance relation
+// between LOOP_TMP and other V10 references. If LOOP_TMP doesn't
+// dominate all defs and uses then it would be illegal to skip RMW. Hence,
+// we conservatively assume RMW is required for LOOP_TMP.
+if (gra.splitResults.count(spilledRegion->getTopDcl()) == 0) {
+  if (isUniqueDef && builder_->kernel.fg.isReducible() &&
+      checkDefUseDomRel(spilledRegion, bb)) {
+    RMW_Needed = false;
+  }
 }

 return RMW_Needed;
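
The predicate stated in the updated comment, RMW_Not_Needed = (Check0 || (Check1 && Check2 && Check3)) && Check4, can be exercised in isolation. The following is a minimal sketch that models the comment's formula, not the actual function: `SpillDefInfo`, its fields, and the `loopSplitTemps` set are hypothetical stand-ins for the compiler's real data structures (in the hunk, Check4 is the `gra.splitResults.count(...) == 0` test, and the comment notes that Check0 is evaluated in `isPartialWriteForSpill()`).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical summary of one spilled def; field names map to the checks
// listed in the comment and are not the compiler's own identifiers.
struct SpillDefInfo {
  std::string dcl;              // name of the spilled declaration
  bool defIsNoMask;             // Check0
  bool isUniqueDef;             // Check1
  bool defAndUsesInSameLoop;    // Check2
  bool flowGraphReducible;      // Check3
};

// Returns true when the read-modify-write sequence must be kept.
inline bool rmwNeeded(const SpillDefInfo& d,
                      const std::unordered_set<std::string>& loopSplitTemps) {
  // Check4: the declaration is not a split-around-loop temporary.
  const bool check4 = loopSplitTemps.count(d.dcl) == 0;
  const bool notNeeded =
      (d.defIsNoMask ||
       (d.isUniqueDef && d.defAndUsesInSameLoop && d.flowGraphReducible)) &&
      check4;
  return !notNeeded;
}

// Small usage check: the same facts give different answers depending on
// whether the declaration is registered as a loop split temporary.
inline void selfCheck() {
  const std::unordered_set<std::string> temps{"LOOP_TMP"};
  const SpillDefInfo ordinary{"V20", false, true, true, true};
  const SpillDefInfo splitTmp{"LOOP_TMP", false, true, true, true};
  assert(!rmwNeeded(ordinary, temps));  // Check1..Check3 hold, Check4 holds
  assert(rmwNeeded(splitTmp, temps));   // Check4 fails, so RMW is kept
}
```

Placing Check4 as an outer gate keeps the decision conservative: a loop split temporary always keeps RMW, whatever the dominance check would have reported.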
