Support for different task types #2

@simonw

Currently we don't pass a task_type - the API docs at https://docs.nomic.ai/reference/endpoints/nomic-embed-text say this:

The task your embeddings should be specialized for: search_query, search_document, clustering, classification. Defaults to search_document.
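For reference, a minimal sketch of what passing the parameter could look like. Only `task_type` and its four values come from the docs quoted above; the `build_payload` helper and the `model` and `texts` field names are assumptions for illustration and should be checked against the API reference.

```python
import json

# The four values listed in the Nomic docs quoted above.
TASK_TYPES = ("search_query", "search_document", "clustering", "classification")


def build_payload(texts, task_type="search_document", model="nomic-embed-text-v1.5"):
    """Build a JSON-serializable request body (hypothetical helper).

    The "model" and "texts" keys are assumed field names; only
    "task_type" is confirmed by the quoted documentation.
    """
    if task_type not in TASK_TYPES:
        raise ValueError(f"Unknown task_type: {task_type!r}")
    return {"model": model, "texts": list(texts), "task_type": task_type}


# Build (but do not send) a request body for a search query.
print(json.dumps(build_payload(["hello world"], task_type="search_query")))
```

Defaulting to `search_document` mirrors the documented API default, so callers that never pass a task type would keep today's behaviour.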

How can we support these? A few options:

  • Expose them as a -o option
  • Separate embedding models for each - nomic-embed-text-v1.5-512-clustering etc
  • Teach LLM core about the concept of different task types

The second option is a bad idea. It would quadruple the number of models, and it would also defeat the point of the search task types, which is that you CAN compare search_document embeddings with search_query embeddings - LLM currently enforces that embeddings can only be compared if they belong to the same model.
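To make the comparison concrete, here is a rough sketch of ranking stored document embeddings against a query embedding by cosine similarity. The vectors are toy values rather than real model output, and `cosine_similarity` and `rank_documents` are hypothetical helpers, not part of LLM.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return document ids sorted by similarity to the query, best first."""
    return sorted(
        document_vectors,
        key=lambda doc_id: cosine_similarity(query_vector, document_vectors[doc_id]),
        reverse=True,
    )


# Toy vectors standing in for one search_query embedding and two
# search_document embeddings produced by the same model.
query = [1.0, 0.0]
documents = {"doc-a": [0.9, 0.1], "doc-b": [0.0, 1.0]}
print(rank_documents(query, documents))
```

This only makes sense if both kinds of embedding live under one model, which is what splitting them into separate model IDs would prevent.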

The first would work as a short-term fix.

The third idea is the most interesting. These are not the only embeddings that differentiate between search queries and documents - it's a really useful concept for implementing RAG. See also E5-large-v2 (though that one works by including magic prefixes on the strings to be embedded, e.g. "query: question here").
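A minimal sketch of the prefix approach, assuming E5's "query: " and "passage: " prefixes. Keying them by the Nomic task type names is my own assumption here, made only to show how one shared concept could cover both model families; `add_e5_prefix` is a hypothetical helper.

```python
# E5-style prefixes, keyed by Nomic-style task type names (assumed mapping).
E5_PREFIXES = {
    "search_query": "query: ",
    "search_document": "passage: ",
}


def add_e5_prefix(text, task_type):
    """Prepend the prefix for task_type to the text to be embedded."""
    try:
        prefix = E5_PREFIXES[task_type]
    except KeyError:
        raise ValueError(f"No prefix defined for task_type: {task_type!r}")
    return prefix + text


print(add_e5_prefix("question here", "search_query"))
print(add_e5_prefix("Some document text", "search_document"))
```

A core-level task type concept could let each plugin decide how to apply it: as an API parameter for Nomic, or as a string prefix for models like E5.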
