Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Attempting to run `python3 convert-hf-to-gguf.py` on NVIDIA's latest NVEmbed model yields `NotImplementedError: Architecture 'NVEmbedModel' not supported!`. Add support for the NVEmbedModel architecture.
Motivation
NVIDIA recently released their NVEmbed embedding model, based on the Mistral 7B decoder, which ranks #1 on the MTEB leaderboard. It would be nice to see support for this in llama.cpp.
Possible Implementation
I'm not sure how different it would be from existing embedding architectures. Other decoder-based models like SFR Embedding Mistral already have working GGUF quants, so I figure the NVEmbed model is structured similarly. Then it's mostly a matter of adding a new model class for it in `convert-hf-to-gguf.py`.
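For illustration, the kind of per-architecture class registry that such converter scripts use could be sketched as below. This is a hedged, self-contained mock-up: the `Model` base class, `register` decorator, and the empty `NVEmbedModel` body are assumptions for this sketch, not the actual llama.cpp code, which would also need the real tensor-name mapping for the new architecture.

```python
# Minimal sketch of a model-class registry keyed on the HF "architectures"
# string, mimicking how a converter script might dispatch (names here are
# illustrative, not the exact llama.cpp API).

class Model:
    _registry: dict[str, type["Model"]] = {}

    @classmethod
    def register(cls, *names: str):
        """Class decorator that maps one or more HF architecture names
        to the decorated converter class."""
        def wrapper(model_cls: type["Model"]) -> type["Model"]:
            for name in names:
                cls._registry[name] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def from_architecture(cls, arch: str) -> type["Model"]:
        """Look up the converter class for an architecture string,
        raising the same kind of error the issue reports when missing."""
        try:
            return cls._registry[arch]
        except KeyError:
            raise NotImplementedError(f"Architecture {arch!r} not supported!")


@Model.register("NVEmbedModel")
class NVEmbedModel(Model):
    # A real implementation would map HF tensor names to GGUF tensor names
    # here, likely reusing most of the existing Mistral-based model logic.
    pass
```

With a class registered this way, `Model.from_architecture("NVEmbedModel")` resolves instead of raising, which is the behavior the feature request is asking for.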