Labels: generation quality, performance, research 🔬
Description
Speculative sampling is explained here: https://arxiv.org/abs/2302.01318
In simpler terms here (a minimal sketch of the core accept/reject loop follows these links):
- Combine large LLM with small LLM for faster inference #630 (comment)
- Combine large LLM with small LLM for faster inference #630 (comment)
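
In case it helps, here is a minimal, self-contained sketch of the accept/reject rule from the paper, over a toy 4-token vocabulary. The `draft_probs`/`target_probs` functions and all constants are illustrative stand-ins, not llama.cpp API; a real implementation would condition both models on the accepted prefix and score all `K` drafted positions with a single batched pass of the main model:

```cpp
// Sketch of speculative sampling (Chen et al., arXiv:2302.01318).
// The two "models" are fixed toy distributions; real draft/target
// models would condition on the tokens generated so far.
#include <cstdio>
#include <random>
#include <vector>
#include <algorithm>

static std::mt19937 rng(42);

static std::vector<float> draft_probs (int /*pos*/) { return {0.50f, 0.30f, 0.15f, 0.05f}; }
static std::vector<float> target_probs(int /*pos*/) { return {0.40f, 0.35f, 0.20f, 0.05f}; }

static int sample(const std::vector<float> & p) {
    std::discrete_distribution<int> d(p.begin(), p.end());
    return d(rng);
}

int main() {
    const int K = 4; // tokens drafted per target pass
    std::uniform_real_distribution<float> unif(0.0f, 1.0f);

    // 1) draft K tokens cheaply with the small model
    std::vector<int> drafted;
    for (int i = 0; i < K; ++i) drafted.push_back(sample(draft_probs(i)));

    // 2) score all K positions with the target model (one batched pass
    //    in a real implementation), then accept/reject left to right
    std::vector<int> accepted;
    for (int i = 0; i < K; ++i) {
        const auto q = draft_probs(i);
        const auto p = target_probs(i);
        const int  x = drafted[i];
        if (unif(rng) < std::min(1.0f, p[x] / q[x])) {
            accepted.push_back(x);
        } else {
            // on rejection: resample from the residual max(0, p - q),
            // renormalized, and discard everything drafted after it
            std::vector<float> r(p.size());
            float sum = 0.0f;
            for (size_t t = 0; t < p.size(); ++t) { r[t] = std::max(0.0f, p[t] - q[t]); sum += r[t]; }
            for (auto & v : r) v /= sum;
            accepted.push_back(sample(r));
            break;
        }
    }

    printf("accepted %zu token(s):", accepted.size());
    for (int t : accepted) printf(" %d", t);
    printf("\n");
}
```

The key property, per the paper, is that both accepted tokens and the resampled token on rejection are distributed exactly according to the target model, so output quality is unchanged while the main model runs once per batch of drafted tokens.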
To start, the "draft" model can be produced with the train-text-from-scratch example, using the same vocab as LLaMA. Later, we can try utilizing better models.
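
For reference, such a draft could be trained roughly like the shakespeare run in that example's README; the flags below mirror that README at the time and may have changed since, so treat this as a sketch rather than an exact invocation:

```sh
./bin/train-text-from-scratch \
        --vocab-model ../models/ggml-vocab.bin \
        --ctx 64 --embd 256 --head 8 --layer 16 \
        --checkpoint-in  chk-shakespeare-256x16.bin \
        --checkpoint-out chk-shakespeare-256x16.bin \
        --model-out ggml-shakespeare-256x16-f32.bin \
        --train-data "shakespeare.txt" \
        -t 6 -b 16 -n 32 --seed 1 --adam-iter 16
```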
We also assume that batching multiple tokens with the "main" model is significantly faster than processing the tokens one-by-one. This may not yet be the case, but it will be when we close ggml-org/ggml#293
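
To make that requirement concrete, here is the back-of-the-envelope from the paper, assuming (as its analysis does) that acceptances are i.i.d. with per-token acceptance rate $\alpha$ and that $K$ tokens are drafted per target pass; both symbols are illustrative parameters, not measured values:

$$
\mathbb{E}[\text{tokens per target pass}] \;=\; \frac{1 - \alpha^{K+1}}{1 - \alpha}
$$

For example, with $\alpha = 0.8$ and $K = 4$, each batched target pass yields about $(1 - 0.8^5)/(1 - 0.8) \approx 3.36$ tokens, so speculation only wins if evaluating 5 tokens in one batch is meaningfully cheaper than 3-4 single-token passes.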