Feature Request: Support for NVEmbed #7746

Closed
@christianazinn

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Running `python3 convert-hf-to-gguf.py` on NVIDIA's latest NVEmbed model fails with `NotImplementedError: Architecture 'NVEmbedModel' not supported!` Please add support for the NVEmbedModel architecture.

Motivation

NVIDIA recently released NVEmbed, an embedding model based on the Mistral 7B decoder that ranks #1 on the MTEB leaderboard. It would be nice to see support for it in llama.cpp.

Possible Implementation

I'm not sure how different it would be from existing embedding architectures. Other decoder-based embedding models, such as SFR Embedding Mistral, already have working GGUF quants, so I expect NVEmbed is structured similarly. If so, it's mostly a matter of adding a new model class for it in convert-hf-to-gguf.py.
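As a rough illustration of what "adding a new model class" involves, here is a minimal, self-contained sketch of the decorator-based architecture registry that convert-hf-to-gguf.py uses to dispatch on the `architectures` field of a checkpoint's config.json. The `NVEmbedModel` entry and its attributes are hypothetical placeholders, not the actual implementation; the real class would also need to map the checkpoint's tensor names onto GGUF tensor names.

```python
# Simplified sketch of convert-hf-to-gguf.py's architecture registry.
# Class and attribute names below are illustrative assumptions.

class Model:
    _registry: dict[str, type["Model"]] = {}

    @classmethod
    def register(cls, *names: str):
        """Decorator: map one or more HF architecture names to a subclass."""
        def wrapper(subclass: type["Model"]) -> type["Model"]:
            for name in names:
                cls._registry[name] = subclass
            return subclass
        return wrapper

    @classmethod
    def from_architecture(cls, arch: str) -> type["Model"]:
        """Look up the converter class for an architecture string."""
        try:
            return cls._registry[arch]
        except KeyError:
            # This is the error path the issue reports hitting.
            raise NotImplementedError(f"Architecture {arch!r} not supported!")


# Hypothetical entry making the lookup succeed for NVEmbed checkpoints.
@Model.register("NVEmbedModel")
class NVEmbedModel(Model):
    model_arch = "nvembed"  # placeholder GGUF architecture name
```

With such a class registered, `Model.from_architecture("NVEmbedModel")` would resolve instead of raising, which is the point where the real work (tensor-name mapping and any NVEmbed-specific metadata) would begin.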
