Commit e3f5ab4

quic-rishinr authored and quic-dhirajku committed
Dynamic cache support on llama4 (quic#494)
Signed-off-by: Rishin <[email protected]>
Signed-off-by: Dhiraj Kumar Sah <[email protected]>
1 parent 17b24c7 commit e3f5ab4

1 file changed, +2 -2 lines changed


QEfficient/transformers/models/llama4/modeling_llama4.py

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@
     repeat_kv,
 )
 
-from QEfficient.transformers.cache_utils import QEffHybridChunkedCache
+from QEfficient.transformers.cache_utils import QEffDynamicCache
 from QEfficient.transformers.modeling_attn_mask_utils import _create_causal_mask
 from QEfficient.utils import constants
 from QEfficient.utils._utils import IOInfo
@@ -638,7 +638,7 @@ def forward(
         return_legacy_cache = False
         if use_cache and not isinstance(past_key_values, Cache):
             return_legacy_cache = True
-            past_key_values = QEffHybridChunkedCache.from_legacy_cache(self.config, past_key_values)
+            past_key_values = QEffDynamicCache.from_legacy_cache(past_key_values)
 
         if cache_position is None:
             past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
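
Note on the second hunk: the conversion call no longer passes self.config; it takes only the legacy tuple. The sketch below illustrates that call shape and the get_seq_length() lookup that follows it. ToyDynamicCache is a hypothetical stand-in written for this note, with plain lists in place of tensors. It is not the QEffDynamicCache implementation, whose internals this commit does not show.

```python
from typing import List, Optional, Tuple

# Legacy format: one (keys, values) pair per layer.
# Plain Python lists stand in for key/value tensors here.
LegacyCache = Tuple[Tuple[list, list], ...]


class ToyDynamicCache:
    """Hypothetical stand-in for a dynamic KV cache (not the QEfficient class)."""

    def __init__(self) -> None:
        self.key_cache: List[list] = []
        self.value_cache: List[list] = []

    @classmethod
    def from_legacy_cache(
        cls, past_key_values: Optional[LegacyCache] = None
    ) -> "ToyDynamicCache":
        # No model config is needed: layers are appended as they are found.
        cache = cls()
        if past_key_values is not None:
            for keys, values in past_key_values:
                cache.key_cache.append(list(keys))
                cache.value_cache.append(list(values))
        return cache

    def get_seq_length(self, layer_idx: int = 0) -> int:
        # Number of cached positions for the given layer, 0 if the layer is absent.
        if len(self.key_cache) <= layer_idx:
            return 0
        return len(self.key_cache[layer_idx])


# Two layers, three cached positions each.
legacy = (([1, 2, 3], [4, 5, 6]), ([7, 8, 9], [10, 11, 12]))
cache = ToyDynamicCache.from_legacy_cache(legacy)
past_seen_tokens = cache.get_seq_length()
```

Because the stand-in grows one layer at a time from the tuple it is given, it needs no per-layer configuration up front, which matches the dropped self.config argument in the diff.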
