Possible wrong implementation of beam search

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new bug or useful enhancement to share.

# Expected Behavior
Beam search should use different contexts for each beam.


# Current Behavior

In the `beam-search.cpp` example, beam search uses the same context for all beams. Suppose there are two beams, a-b-c-d-e and a-b-c-f-g. The shared prefix a-b-c is detected so that inference is not repeated for it, which makes sense. Inference is then run separately on the suffixes d-e and f-g.

However, this seems to cause a correctness problem: when d-e is evaluated, its entries are stored in the key-value (KV) cache. Later, when f-g is evaluated, it reads from the same KV cache, which may still contain entries from the d-e sequence. Those entries do not belong to the current beam, so the results for f-g may be incorrect.
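
To make the concern concrete, here is a minimal toy model of the situation described above. It does not call any llama.cpp API and is not how `beam-search.cpp` is written. `Cell`, `ToyKVCache`, `seq_id`, and `evaluate_two_beams` are hypothetical names invented for this sketch, and tokens are single characters. It only illustrates the difference between a cache where every stored entry is visible and one where entries are filtered per beam.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one KV cache entry: the token it was computed
// for, and which beam wrote it. seq_id == -1 marks the shared prefix.
struct Cell {
    char token;
    int  seq_id;
};

// Hypothetical toy cache; real KV caches store tensors, not characters.
struct ToyKVCache {
    std::vector<Cell> cells;

    // Evaluate `tokens` on behalf of beam `seq_id` and keep the entries.
    void store(const std::string & tokens, int seq_id) {
        assert(seq_id >= -1);
        for (char t : tokens) {
            cells.push_back(Cell{t, seq_id});
        }
    }

    // One context shared by all beams: every stored entry is visible.
    std::string visible_shared() const {
        std::string out;
        for (const Cell & c : cells) {
            out += c.token;
        }
        return out;
    }

    // Per-beam view: only the shared prefix and the beam's own entries.
    std::string visible_for(int seq_id) const {
        std::string out;
        for (const Cell & c : cells) {
            if (c.seq_id == -1 || c.seq_id == seq_id) {
                out += c.token;
            }
        }
        return out;
    }
};

// Reproduces the scenario from the report: prefix a-b-c, then beam 0
// evaluates d-e, then beam 1 evaluates f-g against the same cache.
inline ToyKVCache evaluate_two_beams() {
    ToyKVCache kv;
    kv.store("abc", -1);
    kv.store("de", 0);
    kv.store("fg", 1);
    return kv;
}
```

In this sketch, `evaluate_two_beams().visible_shared()` contains d and e alongside f and g, which is the contamination described above, while `visible_for(1)` contains only a, b, c, f, g. Whether the actual example behaves like the shared case is the question this issue raises.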



# Steps to Reproduce
    mkdir build; cd build; cmake ..; make;
    ./bin/beam-search /path-to-gguf/llama-2-7b.Q4_0.gguf 2 "this is a nice day,"

