I found this awesome project recently and I'm trying to use the fla layers in a non-LLM task, where we have a very long sequence and only the hidden state of the last "token" is useful. The current recurrent kernels, for example gated_deltanet, always return the hidden state of every token, which allocates a huge amount of memory. Is there any way to avoid this memory allocation, other than calling the kernel token by token in a for loop?

Replies: 1 comment 1 reply

@Fadelis98 Hey
This is not true, as we only materialize the last hidden state.
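
For the "very long sequence, only the final state matters" use case, one option (a sketch of my own, not something stated in this thread) is to run the chunked kernel over the sequence in slices and thread the recurrent state through `initial_state` / `output_final_state`, discarding the per-token outputs of earlier slices. The import path, the argument names (`g`, `beta`, `initial_state`, `output_final_state`), and the `(B, T, H, ...)` layout below follow my reading of recent fla releases and may differ in your installed version; all tensor values are placeholders.

```python
# Sketch only: verify names, shapes, and dtypes against your installed fla version.
import torch
from fla.ops.gated_delta_rule import chunk_gated_delta_rule  # assumed import path

B, T, H, K, V = 1, 65536, 4, 128, 128   # (batch, time, heads, key dim, value dim)
chunk_len = 4096                        # peak memory now scales with chunk_len, not T

device, dtype = 'cuda', torch.bfloat16
q    = torch.randn(B, T, H, K, device=device, dtype=dtype)
k    = torch.randn(B, T, H, K, device=device, dtype=dtype)
v    = torch.randn(B, T, H, V, device=device, dtype=dtype)
g    = torch.randn(B, T, H, device=device).sigmoid().log()   # placeholder log-gates
beta = torch.rand(B, T, H, device=device, dtype=dtype)       # placeholder betas in (0, 1)

state = None  # recurrent state of shape (B, H, K, V); only one copy is kept alive
for s in range(0, T, chunk_len):
    e = min(s + chunk_len, T)
    o, state = chunk_gated_delta_rule(
        q[:, s:e], k[:, s:e], v[:, s:e], g[:, s:e], beta[:, s:e],
        initial_state=state,        # carry the state over from the previous chunk
        output_final_state=True,    # return only the state after this chunk's last token
    )
    del o  # per-token outputs of earlier chunks are not needed for this use case

# `state` now holds the hidden state after the final token of the whole sequence.
```

If the per-token outputs are not needed at all, dropping `o` each iteration keeps the footprint at roughly one chunk of activations plus a single `(B, H, K, V)` state, avoiding both the token-by-token Python loop and any full-sequence buffer.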