[RFC] Fuse shortconv and output norm/gate into the kernel #140

Open
sustcsonglin opened this issue Jan 24, 2025 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

Collaborator

sustcsonglin commented Jan 24, 2025

Proposal

Fuse ShortConv and the output norm/gate into the kernels, as in Mamba1 and Mamba2.

Rationale

Applying ShortConv to Q, K, and V introduces three additional activations (one conv output per projection), resulting in non-negligible memory overhead.
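
For a rough sense of scale, here is a back-of-the-envelope sketch of the extra activation memory. It assumes each of the three conv outputs is materialized in full and held in bf16 (2 bytes per element); the function name and the shapes are hypothetical and chosen only for illustration, not taken from this repo.

```python
def shortconv_activation_bytes(batch, seq_len, q_dim, k_dim, v_dim, bytes_per_elem=2):
    """Bytes occupied by the three ShortConv outputs (q, k, v) of one layer.

    Assumption: each output has shape [batch, seq_len, dim] and is stored
    in full, e.g. in bf16 (2 bytes per element).
    """
    return batch * seq_len * (q_dim + k_dim + v_dim) * bytes_per_elem


# Illustrative (made-up) shapes: batch 8, 4096 tokens, 2048-dim q/k/v.
extra = shortconv_activation_bytes(batch=8, seq_len=4096, q_dim=2048, k_dim=2048, v_dim=2048)
print(f"{extra / 2**20:.0f} MiB per layer")  # prints "384 MiB per layer"
```

The estimate is per layer, so it scales linearly with depth; a fused kernel would avoid materializing some or all of these tensors.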

sustcsonglin added the "enhancement" label on Jan 24, 2025
sustcsonglin added this to the FLA v1.0.0 release milestone on Jan 24, 2025
yzhangcs self-assigned this on Jan 26, 2025
sustcsonglin changed the title from "[RFC] Fuse shortconv into the kernel" to "[RFC] Fuse shortconv and output norm/gate into the kernel" on Feb 13, 2025

2 participants