I have an issue serving the Llama2-70B-GGML model. The 70B Llama-2 model uses grouped-query attention (GQA), unlike the earlier 65B LLaMA model. llama.cpp handles this by having the user specify an n_gqa parameter in the model hyperparameters, which feels a little hacky 🤔
I would love to work on adding support for n_gqa in this crate; I think it can be added to the Llama model's hyperparameters, as sketched below:
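Here is a minimal sketch of what that could look like, assuming a llama.cpp-style hyperparameters struct. The names (`Hyperparameters`, `n_vocab`, `n_embd`, `n_head`, `n_layer`, `n_gqa`, `n_head_kv`) are my assumptions following llama.cpp conventions; the crate's actual Llama struct may be laid out differently.

```rust
/// Hypothetical sketch of Llama hyperparameters extended with `n_gqa`.
/// Field names follow llama.cpp conventions and are not the crate's
/// actual definition.
#[derive(Debug, Clone, Copy)]
struct Hyperparameters {
    /// Vocabulary size.
    n_vocab: usize,
    /// Embedding dimension.
    n_embd: usize,
    /// Number of attention (query) heads.
    n_head: usize,
    /// Number of transformer layers.
    n_layer: usize,
    /// Grouped-query attention factor: query heads per key/value head.
    /// 1 gives classic multi-head attention (e.g. 7B/13B); Llama-2 70B uses 8.
    n_gqa: usize,
}

impl Hyperparameters {
    /// Number of key/value heads implied by the GQA factor.
    fn n_head_kv(&self) -> usize {
        self.n_head / self.n_gqa
    }
}

fn main() {
    // Llama-2 70B: 64 query heads grouped over 8 key/value heads.
    let hp = Hyperparameters {
        n_vocab: 32_000,
        n_embd: 8192,
        n_head: 64,
        n_layer: 80,
        n_gqa: 8,
    };
    println!("kv heads: {}", hp.n_head_kv()); // prints: kv heads: 8
}
```

Since GGML model files at this point don't store the GQA factor, the value would presumably have to come from the user (as llama.cpp does with its parameter) rather than be read from the file, with a default of 1 to keep existing models working unchanged.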