Description
Currently we don't pass a `task_type` - the API docs at https://docs.nomic.ai/reference/endpoints/nomic-embed-text say this:

> The task your embeddings should be specialized for: `search_query`, `search_document`, `clustering`, `classification`. Defaults to `search_document`.
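As a rough illustration of what passing a `task_type` could look like, here is a minimal sketch that only builds and validates a request body. The function name `build_embed_payload` and the `model` and `texts` field names are assumptions made for this example, not something confirmed by the quoted docs; only `task_type` and its four values come from them.

```python
import json

# The four task types listed in the quoted API docs.
TASK_TYPES = ("search_query", "search_document", "clustering", "classification")


def build_embed_payload(texts, task_type="search_document", model="nomic-embed-text-v1.5"):
    """Build a JSON-serializable request body.

    Hypothetical helper: the "model" and "texts" keys are assumed field names.
    """
    if task_type not in TASK_TYPES:
        raise ValueError(
            f"Unknown task_type {task_type!r}; expected one of {TASK_TYPES}"
        )
    return {"model": model, "texts": list(texts), "task_type": task_type}


if __name__ == "__main__":
    payload = build_embed_payload(["What is RAG?"], task_type="search_query")
    print(json.dumps(payload, indent=2))
```

The default mirrors the documented default of `search_document`, so existing callers that never pass a task type would keep their current behaviour.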
How can we support these? A few options:
- Do them as a `-o` option
- Separate embedding models for each - `nomic-embed-text-v1.5-512-clustering` etc
- Teach LLM core about the concept of different task types
The second option is a bad idea. It would result in 4x the number of models, but it's also bad because the point of search types is that you CAN compare `search_document` with `search_query` - and LLM currently enforces that embeddings can only be compared if they belong to the same model.
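To make the problem concrete, here is a toy sketch of a same-model comparison check. It is not LLM's actual code: the `StoredEmbedding` class, the `compare` function, the two suffixed model IDs and the vectors are all made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredEmbedding:
    model_id: str  # hypothetical: ID of the model that produced the vector
    vector: list


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare(a, b):
    """Refuse to compare embeddings that were stored under different model IDs."""
    if a.model_id != b.model_id:
        raise ValueError(f"Cannot compare {a.model_id!r} with {b.model_id!r}")
    return cosine_similarity(a.vector, b.vector)


if __name__ == "__main__":
    # Hypothetical per-task model IDs, following the naming pattern of option two.
    doc = StoredEmbedding("nomic-embed-text-v1.5-512-search_document", [0.6, 0.8])
    query = StoredEmbedding("nomic-embed-text-v1.5-512-search_query", [0.8, 0.6])
    try:
        compare(doc, query)
    except ValueError as ex:
        print("Blocked:", ex)
```

With one model ID per task type, the query-versus-document comparison that search depends on is exactly the one such a check rejects.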
The first would work as a short-term fix.
The third idea is most interesting. These are not the only embeddings that differentiate between search queries and documents - it's a really useful concept for implementing RAG. See also E5-large-v2 (though that one works by including magic prefixes on the strings to be embedded, e.g. "query: question here").
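For comparison, the prefix approach can be sketched in a few lines. This assumes the "query: " and "passage: " prefixes used by the E5 models; the mapping from Nomic-style task type names to those prefixes, and the `add_e5_prefix` helper itself, are made up for this example.

```python
# Assumed mapping from task type names to E5-style prefixes (illustrative only).
E5_PREFIXES = {
    "search_query": "query: ",
    "search_document": "passage: ",
}


def add_e5_prefix(text, task_type):
    """Prepend the prefix for the given task type before embedding the text."""
    try:
        prefix = E5_PREFIXES[task_type]
    except KeyError:
        raise ValueError(f"No prefix defined for task type {task_type!r}") from None
    return prefix + text


if __name__ == "__main__":
    print(add_e5_prefix("question here", "search_query"))
```

A core-level notion of task types could cover both styles: a plugin for an API like Nomic's would forward the task type as a parameter, while a plugin for a prefix-based model would translate it into a string prefix like this.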