[Frontend] Lower attention op to FlashAttention vectorized kernel. #553
Conversation
Hi, thanks for the feedback.
@zhanghb97
@GuoningHuang It looks pretty good. Please resolve the conflicts, then I'll pull it down to test the performance.
Baseline:
Applies to both the prefill and decode phases:
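For context on what the lowered kernel computes, here is a minimal NumPy sketch of FlashAttention-style tiled attention with an online softmax. This is an illustrative model only, not the PR's actual vectorized MLIR lowering; the function name `flash_attention` and the tile size are assumptions for the example. The same loop covers both phases mentioned above: in prefill `Q` holds all prompt tokens, while in decode `Q` is a single row attended against the cached `K`/`V`.

```python
import numpy as np

def flash_attention(Q, K, V, tile=16):
    """Compute softmax(Q @ K.T / sqrt(d)) @ V one K/V tile at a time.

    A running row max `m` and running exp-sum `l` (online softmax) let us
    accumulate the output without ever materializing the full N x N score
    matrix -- the core idea of FlashAttention-style kernels.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[1]), dtype=np.float64)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running row-wise sum of exp(scores - max)
    for start in range(0, K.shape[0], tile):
        Kt = K[start:start + tile]
        Vt = V[start:start + tile]
        S = (Q @ Kt.T) * scale                    # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)                 # rescale old accumulators
        P = np.exp(S - m_new[:, None])            # tile-local softmax numerator
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vt
        m = m_new
    return out / l[:, None]
```

The decode case falls out for free: calling `flash_attention(q_new, K_cache, V_cache)` with a single-row `q_new` walks the cached keys and values tile by tile, which is why one kernel can serve both phases.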
