Possible wrong implementation of beam search #3802

Closed
@shenjiangqiu

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Beam search should use different contexts for each beam.

Current Behavior

In the beam-search.cpp example, every beam shares the same context. For two beams such as a-b-c-d-e and a-b-c-f-g, it makes sense to detect the shared prefix a-b-c so that inference on it is not repeated, and then to run inference separately on d-e and f-g.

However, this appears to cause a correctness problem rather than just an inefficiency: when d-e is evaluated, its keys and values are stored in the key-value (kv) cache. When the program later evaluates f-g, it reads from the same kv cache, which may still contain the entries for d-e. Since d-e does not belong to the current beam, the f-g beam may attend to tokens outside its own history and produce incorrect results.
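To make the concern concrete, here is a minimal toy sketch of the two situations. It does not use llama.cpp's API: `ToyCell`, `ToyKvCache`, and `build_two_beam_cache` are hypothetical names invented for illustration, and tokens are single characters. It only assumes that a cache cell can be tagged with the set of beams (sequences) allowed to see it.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy cell: one cached token plus the beams allowed to attend to it.
struct ToyCell {
    char token;
    std::set<int> seq_ids;
};

// Hypothetical toy cache, NOT llama.cpp's kv cache.
struct ToyKvCache {
    std::vector<ToyCell> cells;

    // Store a token that was evaluated on behalf of one beam.
    void append(char token, int seq_id) {
        cells.push_back(ToyCell{token, {seq_id}});
    }

    // Let beam `dst` reuse every cell that beam `src` can currently see
    // (shares the common prefix without re-running inference on it).
    void share(int src, int dst) {
        for (auto & c : cells) {
            if (c.seq_ids.count(src)) c.seq_ids.insert(dst);
        }
    }

    // Tokens a beam attends to when cells are filtered by beam id.
    std::string visible(int seq_id) const {
        std::string out;
        for (const auto & c : cells) {
            if (c.seq_ids.count(seq_id)) out += c.token;
        }
        return out;
    }

    // Tokens a beam attends to when the cache is one shared context
    // with no per-beam filtering (the situation described above).
    std::string visible_unmasked() const {
        std::string out;
        for (const auto & c : cells) out += c.token;
        return out;
    }
};

// Builds the cache for beams a-b-c-d-e (id 0) and a-b-c-f-g (id 1).
ToyKvCache build_two_beam_cache() {
    ToyKvCache cache;
    for (char t : std::string("abc")) cache.append(t, 0); // shared prefix, evaluated once
    cache.share(0, 1);                                    // beam 1 reuses the prefix
    for (char t : std::string("de")) cache.append(t, 0);  // beam 0 continuation
    for (char t : std::string("fg")) cache.append(t, 1);  // beam 1 continuation
    assert(cache.cells.size() == 7);
    return cache;
}
```

With per-beam filtering, `visible(1)` yields only a-b-c-f-g, while the unfiltered view also exposes d-e to the second beam, which is the behavior this issue is worried about.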

Steps to Reproduce

mkdir build; cd build; cmake ..; make;
./bin/beam-search /path-to-gguf/llama-2-7b.Q4_0.gguf 2 "this is a nice day,"

Labels: bug, stale
