Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Attempting to run `python3 convert-hf-to-gguf.py` on NVIDIA's latest NVEmbed model yields `NotImplementedError: Architecture 'NVEmbedModel' not supported!`. Add support for the NVEmbedModel architecture.
Motivation
NVIDIA recently released their NVEmbed embedding model, based on the Mistral 7B decoder, which ranks #1 on the MTEB leaderboard. It would be nice to see support for this in llama.cpp.
Possible Implementation
I'm not sure how different it would be from existing embedding architectures. Other decoder-based models like SFR Embedding Mistral already have working GGUF quants, so I figure the NVEmbed model is structured similarly. Then it's mostly a matter of adding a new model class for it in `convert-hf-to-gguf.py`.
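For illustration, the kind of per-architecture class registry that such converter scripts use could be sketched as below. This is a hedged, self-contained mock-up: the `Model` base class, `register` decorator, and the empty `NVEmbedModel` body are assumptions for this sketch, not the actual llama.cpp code, which would also need the real tensor-name mapping for the new architecture.

```python
# Minimal sketch of a model-class registry keyed on the HF "architectures"
# string, mimicking how a converter script might dispatch (names here are
# illustrative, not the exact llama.cpp API).

class Model:
    _registry: dict[str, type["Model"]] = {}

    @classmethod
    def register(cls, *names: str):
        """Class decorator that maps one or more HF architecture names
        to the decorated converter class."""
        def wrapper(model_cls: type["Model"]) -> type["Model"]:
            for name in names:
                cls._registry[name] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def from_architecture(cls, arch: str) -> type["Model"]:
        """Look up the converter class for an architecture string,
        raising the same kind of error the issue reports when missing."""
        try:
            return cls._registry[arch]
        except KeyError:
            raise NotImplementedError(f"Architecture {arch!r} not supported!")


@Model.register("NVEmbedModel")
class NVEmbedModel(Model):
    # A real implementation would map HF tensor names to GGUF tensor names
    # here, likely reusing most of the existing Mistral-based model logic.
    pass
```

With a class registered this way, `Model.from_architecture("NVEmbedModel")` resolves instead of raising, which is the behavior the feature request is asking for.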