
Conversation

@gabe-l-hart (Collaborator)

Closes #16768

cc @leok7v

Description

This PR addresses the context-shift failure that occurs when a hybrid-recurrent model hits its context limit and attempts to perform a context shift. The main change is to loosen the restriction in llama_memory_recurrent::seq_rm so that it only refuses a partial erasure when the erased range includes the final token in the sequence. Since recurrent states are fixed-size, any partial erasure that does not include the final token can be treated as a no-op.
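
For a rough sketch of the relaxed condition (the exact one-line diff is quoted in the review comments below), the check in llama_memory_recurrent::seq_rm changes along these lines, where cell.pos is the position of the final token stored for the sequence:

#include <cstdint>

// Simplified sketch, not the literal llama.cpp code. The erased range is
// [p0, p1) and cell_pos stands in for cell.pos, the final token's position.
static bool can_erase(int32_t p0, int32_t p1, int32_t cell_pos) {
    // before: any partial intersection with the cached range was refused
    //   if ((0 < p0 && p0 < cell_pos) || (0 < p1 && p1 <= cell_pos)) { ... }
    // after: refuse only a partial erasure that covers the final token,
    // since the fixed-size recurrent state cannot be rolled back past it
    if (0 < p0 && p0 <= cell_pos && p1 > cell_pos) {
        return false;
    }
    // a range strictly before the final token is a no-op for a fixed-size
    // state, so it is reported as a success
    return true;
}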

Testing

To validate the result, you can use the following command, which artificially limits the context length in order to force a context shift:

# You can use any granite-4.0 model here
./bin/llama-cli -hf ggml-org/granite-4.0-h-small-Q8_0-GGUF --jinja -c 100 --context-shift -p "tell me a story"

Without this fix, generation fails with init_batch: failed to prepare attention ubatches; with the fix, generation continues successfully and produces output that remains relevant to the previous context.

The recurrent state is always assumed to be the state as of the last update from the final token in the sequence. When doing a partial erasure, if the range does not include the final token, the erasure can be considered a success: any memory used for the sequence prior to the final token (which is no memory at all, since the state is fixed-size) has been successfully removed.
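
As a hedged illustration of what this means for the caller during a context shift (assuming the llama_memory_* accessors from current llama.cpp; the position values are made up for the example):

// Hypothetical caller-side view of a context shift on a recurrent memory.
// Erasing a middle range [n_keep, n_keep + n_discard) does not touch the
// final token, so with this PR it returns true as a no-op instead of failing.
llama_memory_t mem = llama_get_memory(ctx);

const llama_pos n_keep    = 4;   // example: tokens kept at the start
const llama_pos n_discard = 32;  // example: tokens dropped from the middle

// the range excludes the final token -> "already compressed", succeeds
const bool ok = llama_memory_seq_rm(mem, /*seq_id=*/0, n_keep, n_keep + n_discard);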

There is one potential case this doesn't address: pruning the cache to remove sensitive data from the context. That wouldn't work for partial (mid-sequence) removal from an attention cache either, since the KV state is linearly dependent and states at later sequence positions would still be derived from the sensitive data even once it is no longer cached, so I don't think it is relevant here. It is worth noting, though, that the semantics of this change for a partial erasure in the middle of the cache are essentially "my context is already compressed," not "all trace of the removed tokens has been removed."

ggml-org#16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <[email protected]>
This prefix matching explicitly attempts to remove the tokens at the end of the sequence that don't match. That is the one operation that can't be performed on a recurrent cache, because the state is updated in place, so if this removal fails we need to clear the whole cache.
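
A minimal sketch of that fallback on the caller side, assuming the bool-returning llama_memory_seq_rm from the linked issue (n_matching is a hypothetical name for the length of the matching prefix):

// Try to trim the non-matching suffix. On a recurrent cache this can fail
// because the state was updated in place, so fall back to a full clear.
if (!llama_memory_seq_rm(mem, seq_id, n_matching, -1)) {
    // suffix removal is impossible for an in-place state: a negative p0/p1
    // means "the entire range", so this drops the whole sequence ...
    llama_memory_seq_rm(mem, seq_id, -1, -1);
    // ... after which the prompt must be re-decoded from scratch
    n_matching = 0;
}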

ggml-org#16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <[email protected]>
-    // partial intersection is invalid
-    if ((0 < p0 && p0 < cell.pos) || (0 < p1 && p1 <= cell.pos)) {
+    // partial intersection is invalid if it includes the final pos
+    if ((0 < p0 && p0 <= cell.pos && p1 > cell.pos)) {
Member:

Suggested change:
-    if ((0 < p0 && p0 <= cell.pos && p1 > cell.pos)) {
+    if (0 < p0 && p0 <= cell.pos && p1 > cell.pos) {

Member:

Why do we check strictly larger than 0 rather than 0 <= p0?


Development

Successfully merging this pull request may close these issues:

The result of bool llama_memory_seq_rm() is not checked (#16768)
