
Conversation

@FloatingcloudKnight
Contributor

For the transpose-matmul-transpose pattern in Llama2, this PR performs fusion and vectorization during dialect lowering. The fusion yields a 1.84x speedup at the operator level, though the gain is not visible at the model level.
After applying this pass, the measured time for transpose-matmul-transpose dropped from 0.00528216 to 0.00191212.

Value B = op->getOperand(1);
Value C = op->getOpResult(0);

tosa::ReshapeOp reshapeBOp = B.getDefiningOp<tosa::ReshapeOp>();
Member


Maybe you can use auto here: getDefiningOp<tosa::ReshapeOp>() already tells us the type of the op.
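A minimal sketch of the suggested change (a fragment of this PR's pattern, reusing B from the snippet above):

```cpp
// Before: the ReshapeOp type is spelled out twice.
tosa::ReshapeOp reshapeBOp = B.getDefiningOp<tosa::ReshapeOp>();

// After: auto avoids the repetition, since the template argument of
// getDefiningOp<tosa::ReshapeOp>() already fixes the static type.
// getDefiningOp returns null when B is a block argument or is defined
// by a different op, so a null check is still needed afterwards.
auto reshapeBOp = B.getDefiningOp<tosa::ReshapeOp>();
if (!reshapeBOp)
  return failure();
```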

Member

@linuxlonelyeagle linuxlonelyeagle left a comment


a brief review.

if (!transposeBOp) {
return failure();
}
Value::user_iterator reshapeCUserIt = C.getUsers().begin();
Member


Checking C.getUsers().empty() first would be good here.
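A sketch of the guard this suggests (a fragment, reusing C from the pattern above):

```cpp
// Bail out before taking the first user: dereferencing
// C.getUsers().begin() is only valid when C has at least one user.
if (C.getUsers().empty())
  return failure();
Value::user_iterator reshapeCUserIt = C.getUsers().begin();
```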

ShapedType newBType =
cast<ShapedType>(transposeBOp.getOperand(0).getType());
ShapedType newCType =
cast<ShapedType>(transposeCOp->getOpResult(0).getType());
Member


Can you use transposeCOp->getResult(0).getType()?

Member


Or transposeCOp.getType().
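The two suggestions are equivalent for a single-result op; a sketch, assuming transposeCOp is the tosa.transpose located earlier in the pattern:

```cpp
// Original: go through the generic OpResult API.
ShapedType newCType =
    cast<ShapedType>(transposeCOp->getOpResult(0).getType());

// Suggested: for an op with exactly one result, getType() on the op
// is shorthand for getResult().getType(), so the intent reads directly.
ShapedType newCTypeAlt = cast<ShapedType>(transposeCOp.getType());
```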

Value vlStep = rewriter.create<arith::ConstantIndexOp>(loc, vecSize);
Value zero = rewriter.create<arith::ConstantOp>(
loc, rewriter.getZeroAttr(elementType));
const AffineExpr d0 = rewriter.getAffineDimExpr(0);
Member


Don't use const here.


// Create pass through vector.
Value passThroughVec = rewriter.create<SplatOp>(loc, vectorTy, zero);
Value newA = rewriter.create<bufferization::ToMemrefOp>(
Member


Is it possible to avoid using the bufferization dialect here? This is just a fusion pattern.

Value aCol = rewriter.create<memref::DimOp>(loc, newA, c2);
Value bCol = rewriter.create<memref::DimOp>(loc, newB, c3);

Value upperBoundTmp = rewriter.create<arith::SubIOp>(loc, bCol, vlStep);
Member


For the sub and add we can use affine.apply rather than creating arith add and sub operations.
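A sketch of the affine.apply alternative for the subtraction (assuming rewriter, loc, and bCol from the pattern above; since vecSize is a compile-time constant, it can live inside the affine map instead of the separate vlStep constant):

```cpp
// upperBound = bCol - vecSize, expressed as a single affine.apply on
// the map (d0) -> (d0 - vecSize).
AffineExpr d0 = rewriter.getAffineDimExpr(0);
AffineMap subMap = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0,
                                  d0 - vecSize);
Value upperBoundTmp = rewriter.create<affine::AffineApplyOp>(
    loc, subMap, ValueRange{bCol});
```

The same map with d0 + vecSize covers the later arith::AddIOp in the loop body as well.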

// loopBody->addArguments(types, locs);
Block &loopBody = parOp.getRegion().front();
rewriter.setInsertionPointToStart(&loopBody);
Value ivs0 = loopBody.getArguments()[0];
Member


You can use the loop's induction-variable accessor here, e.g. iv = loopOp.getInductionVar().
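A sketch for the scf.parallel created above (assuming parOp from this PR; scf::ParallelOp exposes its, possibly multiple, induction variables via getInductionVars()):

```cpp
Block &loopBody = parOp.getRegion().front();
rewriter.setInsertionPointToStart(&loopBody);
// Named accessor instead of raw block-argument indexing:
Value ivs0 = parOp.getInductionVars()[0];
```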

newC,
ValueRange{c0, ivs1, ivs0, iv});
Value idx =
nestedBuilder.create<arith::AddIOp>(nestedLoc, iv, vlStep);
Member


Use affine.apply for the add as well.

Contributor Author


Thank you for your feedback. While working on avoiding the bufferization dialect, I discovered that this pass can be moved from the TOSA level down to the Linalg level. I will resubmit all the modifications after completing this change.

