[RFC] Support more methods for finetuning Transformers to RNNs (e.g., LOLCATS)
#127
Proposal
Training linear-attention models from scratch is expensive. Converting pretrained Transformers to linear attention has become a popular paradigm for reducing that cost, so it would be useful to support more such conversion methods, e.g., LOLCATS.
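
To make the proposal concrete, below is a minimal, self-contained sketch of the general conversion recipe in the spirit of LOLCATS' attention-transfer stage: a learnable feature map is trained so that linear attention mimics the frozen softmax attention of the pretrained model, with low-rank finetuning as a later stage. All names here (`LearnableFeatureMap`, `linear_attention`, etc.) are illustrative assumptions, not the actual LOLCATS or repo APIs.

```python
# Toy single-head sketch of Transformer-to-linear-attention conversion
# (attention transfer). Names and shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableFeatureMap(nn.Module):
    """Maps queries/keys into a positive feature space so that
    phi(q) @ phi(k)^T can approximate softmax attention weights."""
    def __init__(self, head_dim: int, feature_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(head_dim, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.elu(self.proj(x)) + 1  # keep features positive


def linear_attention(q, k, v, feature_map):
    """Causal linear attention, written in the parallel (masked) form
    for clarity; the recurrent view would use O(1) state per step."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.transpose(-2, -1)                    # (B, T, T)
    mask = torch.tril(torch.ones_like(scores, dtype=torch.bool))
    scores = scores.masked_fill(~mask, 0.0)
    denom = scores.sum(-1, keepdim=True).clamp(min=1e-6)
    return (scores / denom) @ v


def softmax_attention(q, k, v):
    """Teacher: standard causal softmax attention from the pretrained model."""
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    mask = torch.tril(torch.ones_like(scores, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(-1) @ v


# Stage 1 (attention transfer): freeze the pretrained Q/K/V projections and
# train only the feature map so linear attention matches the teacher outputs.
torch.manual_seed(0)
B, T, D = 2, 16, 32
q, k, v = (torch.randn(B, T, D) for _ in range(3))

feature_map = LearnableFeatureMap(head_dim=D)
opt = torch.optim.Adam(feature_map.parameters(), lr=1e-3)

for step in range(100):
    target = softmax_attention(q, k, v)            # teacher outputs (frozen)
    pred = linear_attention(q, k, v, feature_map)  # student outputs
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2 (not shown): low-rank (LoRA-style) finetuning of the converted
# model to recover remaining quality, as described in the LOLCATS paper.
```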
Rationale
No response