RDoc-3254 Embeddings Generation via Tasks #1998

Merged 34 commits on Mar 27, 2025
Commits
94c4f70
RDoc-3254 The AI Configuration article
Danielle9897 Mar 10, 2025
821fca1
RDoc-3254 The connection strings articles
Danielle9897 Mar 12, 2025
8121a7a
RDoc-3254 The connection strings articles - fixes
Danielle9897 Mar 14, 2025
20ac9a9
RDoc-3254 AI Tasks - list view
Danielle9897 Mar 14, 2025
d37d609
RDoc-3254 Embeddings generation - Overview
Danielle9897 Mar 14, 2025
172805e
RDoc-3254 Fix links
Danielle9897 Mar 14, 2025
de07adf
RDoc-3254 Fix the configuration article
Danielle9897 Mar 14, 2025
8d27c43
RDoc-3254 The embeddings collections article + The embeddings genera…
Danielle9897 Mar 15, 2025
a58cd1e
RDoc-3254 Add embeddings generation task from the Client API
Danielle9897 Mar 17, 2025
4c39991
RDoc-3254 Add the AddEmbeddingsGenerationOperation (C# only)
Danielle9897 Mar 17, 2025
ac7f0b3
RDoc-3254 Explain chunking methods + Add syntax
Danielle9897 Mar 18, 2025
aee2e67
RDoc-3254 fix syntax
Danielle9897 Mar 18, 2025
8671bef
RDoc-3254 fix flow charts
Danielle9897 Mar 18, 2025
9e3e50d
RDoc-3254 Dynamic query example
Danielle9897 Mar 18, 2025
3ceb499
RDoc-3254 Add info to the "RavenDB as a vector DB" article
Danielle9897 Mar 18, 2025
0556f53
RDoc-3254 Static index & query example
Danielle9897 Mar 19, 2025
8ee51aa
RDoc-3254 fix the overview
Danielle9897 Mar 19, 2025
a0cb9c8
RDoc-3254 fix section: Configure the vector field in the Studio
Danielle9897 Mar 19, 2025
0ec3474
RDoc-3254 small fixes
Danielle9897 Mar 20, 2025
d9ea52d
RDoc-3254 Add the option to use `Name: this.Name` in the script
Danielle9897 Mar 20, 2025
ceba35c
RDoc-3254 small fixes
Danielle9897 Mar 20, 2025
46d8c9a
RDoc-3254 improve text
Danielle9897 Mar 20, 2025
036cae0
RDoc-3254 Enhance the process & the cache lookup flow
Danielle9897 Mar 20, 2025
1395741
RDoc-3254 Fix flow charts
Danielle9897 Mar 24, 2025
36af81b
RDoc-3254 In this page => In this article
Danielle9897 Mar 24, 2025
62ef950
RDoc-3254 Fixed all of @Lev comments except one (about other provider…
Danielle9897 Mar 24, 2025
de3bfb8
RDoc-3254 fix flow chart
Danielle9897 Mar 24, 2025
033ba15
RDoc-3254 fix file 'embedding-collections' based on @Arek's comments
Danielle9897 Mar 24, 2025
853b4a8
RDoc-3254 fix other review comments
Danielle9897 Mar 24, 2025
355700b
RDoc-3254 fix the configuration to match the latest code
Danielle9897 Mar 24, 2025
6c3028b
RDoc-3254 Update the embedding-collections article to match the lates…
Danielle9897 Mar 26, 2025
ba68ff7
RDoc-3254 Add the default chunking method used in the script
Danielle9897 Mar 27, 2025
023190e
RDoc-3254 Apply the new naming convention for collections and documen…
Danielle9897 Mar 27, 2025
65e21d5
RDoc-3280 Update information about similarity threshold in vector search
Danielle9897 Mar 27, 2025
@@ -3,5 +3,21 @@
"Path": "/vector-search",
"Name": "Vector Search",
"Mappings": []
},
{
"Path": "/generating-embeddings",
"Name": "Generating Embeddings",
"Mappings": []
},
{
"Path": "/connection-strings",
"Name": "Connection Strings",
"Mappings": []
},
{
"Path": "ai-tasks-list-view.markdown",
"Name": "AI Tasks - List View",
"DiscussionId": "6f2679ae-0244-496b-aaff-06e3682bc55d",
"Mappings": []
}
]
@@ -0,0 +1,64 @@
# AI Tasks - List View
---

{NOTE: }

* In this view, you can manage RavenDB's AI tasks -
create new tasks, edit existing ones, or delete them as needed.

* AI tasks are part of RavenDB's ongoing tasks. Learn more in [Ongoing Tasks - Overview](../studio/database/tasks/ongoing-tasks/general-info).

* Currently, the only supported AI task type is [embeddings generation](../ai-integration/generating-embeddings/overview).

---

* In this article:
* [AI Tasks - list view](../ai-integration/ai-tasks-list-view#ai-tasks---list-view)

{NOTE/}

---

{PANEL: AI Tasks - list view}

![AI tasks list view](images/ai-tasks-list-view.png "AI Tasks - list view")

1. Go to **AI Hub > AI Tasks**.

2. **Add AI Task**: Click to create a new AI task.

3. **Task name**: This is the name of the task.

4. **Identifier**: The string identifier defined for the task.
**Connection string**: The name of the connection string defined in the task.

5. **Task status**: Displays the task's state and progress.

6. **Assigned node**: The node in the database group responsible for the task.

7. **Enable/Disable**: Toggle the task on or off.

8. **Details**: Click to view detailed information about the task.

9. **Edit**: Click to modify the task.

10. **Delete**: Click to remove the task.

{PANEL/}

## Related Articles

### Vector Search

- [RavenDB as a vector database](../ai-integration/vector-search/ravendb-as-vector-database)
- [Vector search using a static index](../ai-integration/vector-search/vector-search-using-static-index)
- [Vector search using a dynamic query](../ai-integration/vector-search/vector-search-using-dynamic-query)

### Embeddings Generation

- [Generating embeddings - overview](../ai-integration/generating-embeddings/overview)
- [Embeddings generation task](../ai-integration/generating-embeddings/embeddings-generation-task)

### AI Connection Strings

- [Connection strings - overview](../ai-integration/connection-strings/connection-strings-overview)
@@ -0,0 +1,50 @@
[
{
"Path": "connection-strings-overview.markdown",
"Name": "Overview",
"DiscussionId": "d50a19e4-5447-4b36-91ea-997c79d58178",
"Mappings": []
},
{
"Path": "azure-open-ai.markdown",
"Name": "Azure OpenAI",
"DiscussionId": "b1a120bb-8f0a-42b3-9338-2a6f656517e5",
"Mappings": []
},
{
"Path": "google-ai.markdown",
"Name": "Google AI",
"DiscussionId": "143c8438-d2a4-44f9-a5a3-c7f1def06962",
"Mappings": []
},
{
"Path": "hugging-face.markdown",
"Name": "Hugging Face",
"DiscussionId": "9a709299-d444-43ad-8024-1fb78205b80c",
"Mappings": []
},
{
"Path": "ollama.markdown",
"Name": "Ollama",
"DiscussionId": "560d30e8-accf-4c67-aedf-757df4c150d0",
"Mappings": []
},
{
"Path": "open-ai.markdown",
"Name": "OpenAI",
"DiscussionId": "9c4e61fe-d427-4c0c-96c0-03d09e486b6a",
"Mappings": []
},
{
"Path": "mistral-ai.markdown",
"Name": "Mistral AI",
"DiscussionId": "d8b83393-92f9-42a9-9e30-fdecc1fb60b4",
"Mappings": []
},
{
"Path": "embedded.markdown",
"Name": "bge-micro-v2 (Embedded)",
"DiscussionId": "5a9733f5-1184-41a1-86fc-0049f3ec46ac",
"Mappings": []
}
]
@@ -0,0 +1,82 @@
# Connection String to Azure OpenAI
---

{NOTE: }

* This article explains how to define a connection string to the [Azure OpenAI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service),
enabling RavenDB to seamlessly integrate its [embeddings generation tasks](../../ai-integration/generating-embeddings/overview) with your Azure environment.

* In this article:
* [Define the connection string - from the Studio](../../ai-integration/connection-strings/azure-open-ai#define-the-connection-string---from-the-studio)
* [Define the connection string - from the Client API](../../ai-integration/connection-strings/azure-open-ai#define-the-connection-string---from-the-client-api)
* [Syntax](../../ai-integration/connection-strings/azure-open-ai#syntax)

{NOTE/}

---

{PANEL: Define the connection string - from the Studio}

![connection string to azure open ai](images/azure-open-ai.png "Define a connection string to Azure OpenAI")

1. **Name**
Enter a name for this connection string.

2. **Identifier** (optional)
Enter an identifier for this connection string.
Learn more about the identifier in [Creating an AI connection string](../../ai-integration/connection-strings/connection-strings-overview#creating-an-ai-connection-string).

3. **Connector**
Select **Azure OpenAI** from the dropdown menu.

4. **API Key**
Enter the API key used to authenticate requests to the Azure OpenAI service.

5. **Endpoint**
Enter the Azure OpenAI endpoint URL for generating embeddings from text.

6. **Model**
Specify the Azure OpenAI text embedding model to use.

7. **Deployment Name**
Specify the unique identifier assigned to your model deployment in your Azure environment.

8. **Dimensions** (optional)
* Specify the number of dimensions for the output embeddings.
Supported only by _text-embedding-3_ and later models.
* If not specified, the model's default dimensionality is used.

9. Click **Test Connection** to confirm the connection string is set up correctly.

10. Click **Save** to store the connection string or **Cancel** to discard changes.

{PANEL/}

{PANEL: Define the connection string - from the Client API}

{CODE:csharp create_connection_string_azure_open_ai@AiIntegration\ConnectionStrings\connectionStrings.cs /}

{PANEL/}

{PANEL: Syntax}

{CODE:csharp azure_open_ai_settings@AiIntegration\ConnectionStrings\connectionStrings.cs /}

{PANEL/}

## Related Articles

### Vector Search

- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database)
- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index)
- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query)

### Embeddings Generation

- [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview)
- [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task)

### AI Connection Strings

- [Connection strings - overview](../../ai-integration/connection-strings/connection-strings-overview)
@@ -0,0 +1,107 @@
# AI Connection Strings - Overview
---

{NOTE: }
Review comment:

I would expect to see in the NOTE section of the Connection Strings page only a brief indication of the types of Tasks in which this type of connection string can be applied. It is worth noting that in the future we will have other types of Tasks and we will use the same type of connection string—an AI Connection String.

Perhaps at this stage we can leave it as is and make this page less "Embeddings-Generation-Task-specific" only after another type of task appears.

Reply (Member Author):

As discussed w/ @ArieSLV:

  • For now,
    we'll keep the docs aligned with the current feature (be "Embeddings-Generation-Task-specific")
    as it's best to provide exact definitions of what we offer, without being vague.
  • When the time comes,
    I will adapt and make all necessary changes in the connection string sections
    and in ALL other relevant places.
    That definitely won't go unnoticed - and will be meticulously done.


* In RavenDB, you can define [Embeddings Generation Tasks](../../ai-integration/generating-embeddings/overview) that generate embeddings from the content of your documents.
These embeddings are stored in a dedicated collection within the database and enable vector search on your document content.

* Each embeddings generation task must define a **connection string** to an embedding provider.
This connection string specifies where the embeddings will be generated,
allowing RavenDB to integrate with external services such as Azure OpenAI, OpenAI, Hugging Face, Google AI, Ollama, Mistral AI, or RavenDB's embedded model (bge-micro-v2).

Review comment:

Does it make sense to add something like "as well as any providers that have an OpenAI-compatible API" here?

Reply (Member):

IMHO, I would avoid this. When it works, it's fine.


* While each task can have only one connection string, you can define multiple connection strings in your database to support different providers or configurations.
A single connection string can also be reused across multiple tasks in the database.

* These connection strings can be created from:
* The **AI Connection Strings view** in the Studio, where you can create and edit connection strings, and delete those that are not in use by a task.
* The **Client API** - examples are available in the dedicated articles for each provider.

---

* In this article:
* [The AI Connection Strings view](../../ai-integration/connection-strings/connection-strings-overview#the-ai-connection-strings-view)
* [Creating an AI connection string](../../ai-integration/connection-strings/connection-strings-overview#creating-an-ai-connection-string)

{NOTE/}

---

{PANEL: The AI Connection Strings view}

![connection strings view](images/connection-strings-view.png "The AI Connection Strings view")

1. Go to the **AI Hub** menu.

2. Open the **AI Connection Strings** view.

3. Click **"Add new"** to create a new connection string.

4. View the list of all AI connection strings.

5. Edit or delete a connection string.
Only connection strings that are not in use by a task can be deleted.
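
The usage rules described in this article (each task references exactly one connection string, one connection string may be shared by several tasks, and only an unused connection string can be deleted) can be modeled with a minimal Python sketch. The `ConnectionStringRegistry` class and its method names are hypothetical and exist only for illustration; this is not RavenDB's Client API.

```python
class ConnectionStringRegistry:
    """Toy model of the rules above. Hypothetical, not part of RavenDB."""

    def __init__(self):
        self._connection_strings = set()
        # Maps task name -> the single connection string that task uses.
        self._task_to_connection = {}

    def add_connection_string(self, name):
        self._connection_strings.add(name)

    def assign(self, task_name, connection_name):
        # A task holds exactly one connection string, so assigning replaces any earlier one.
        if connection_name not in self._connection_strings:
            raise KeyError(f"Unknown connection string: {connection_name}")
        self._task_to_connection[task_name] = connection_name

    def is_in_use(self, connection_name):
        return connection_name in self._task_to_connection.values()

    def delete_connection_string(self, connection_name):
        # Mirrors the rule that only connection strings not used by a task can be deleted.
        if self.is_in_use(connection_name):
            raise ValueError(f"'{connection_name}' is in use by a task")
        self._connection_strings.discard(connection_name)


registry = ConnectionStringRegistry()
registry.add_connection_string("openai-prod")
registry.add_connection_string("ollama-local")

# The same connection string is reused by two tasks.
registry.assign("embed-products", "openai-prod")
registry.assign("embed-articles", "openai-prod")

registry.delete_connection_string("ollama-local")  # allowed: no task uses it
```

Calling `registry.delete_connection_string("openai-prod")` at this point raises `ValueError`, because two tasks still reference that connection string.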

{PANEL/}

{PANEL: Creating an AI connection string}

![create connection string](images/create-connection-string.png "Create connection string")

1. **Name**
Enter a unique name for the connection string.

2. **Identifier**
Enter a unique identifier for the connection string.
Each AI connection string in the database must have a distinct identifier.

If not specified, or when clicking the "Regenerate" button,
RavenDB automatically generates the identifier based on the connection string name. For example:
* If the connection string name is: _"My connection string to Google AI"_
* The generated identifier will be: _"my-connection-string-to-google-ai"_

Allowed characters: only lowercase letters (a-z), numbers (0-9), and hyphens (-).
See how this identifier is used in the [embeddings cache collection](../../ai-integration/generating-embeddings/embedding-collections#the-embeddings-cache-collection).

Review comment (Member):

Maybe also worth linking other possible places where the task identifier is used:

  • indexing - LoadDocument("FieldName", "my-connection-string-to-google-ai")
  • dynamic queries - from Employees where vector.search(..., ai.task("my-connection-string-to-google-ai", ...)

Reply (Member Author):

Specifically, in this section where we explain the connection string identifier,
I wouldn't put links to these methods since they use the "task identifier" and not the "connection string identifier"

Reply (Member):

You're right. Please ignore :)


3. **Regenerate**
Click "Regenerate" to automatically create an identifier based on the connection string name.

4. **Connector**
Select an AI provider from the dropdown menu.
This will open a popup where you can configure the connection details.
Configuration details for each provider are explained in the following articles:
* [Azure OpenAI](../../ai-integration/connection-strings/azure-open-ai)
* [Google AI](../../ai-integration/connection-strings/google-ai)
* [Hugging Face](../../ai-integration/connection-strings/hugging-face)
* [Ollama](../../ai-integration/connection-strings/ollama)
* [OpenAI](../../ai-integration/connection-strings/open-ai)
* [Mistral AI](../../ai-integration/connection-strings/mistral-ai)
* [Embedded model (bge-micro-v2)](../../ai-integration/connection-strings/embedded)

5. Once you complete all configurations for the selected provider in the popup view,
save the connection string definition.
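
The identifier rules from step 2 can be illustrated with a short Python sketch. The exact algorithm RavenDB uses to derive an identifier from the name is not specified in this article, so `derive_identifier` below is an assumption: it lowercases the name and collapses every run of disallowed characters into a single hyphen, which reproduces the documented example. Both function names are hypothetical.

```python
import re

# Allowed characters per the article: lowercase letters, digits, and hyphens.
_ALLOWED = re.compile(r"[a-z0-9-]+")


def derive_identifier(name: str) -> str:
    """Assumed derivation: lowercase, then replace disallowed runs with '-'."""
    lowered = name.strip().lower()
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    return hyphenated.strip("-")


def is_valid_identifier(identifier: str) -> bool:
    """True only if every character is a lowercase letter, digit, or hyphen."""
    return bool(_ALLOWED.fullmatch(identifier))


print(derive_identifier("My connection string to Google AI"))
# -> my-connection-string-to-google-ai
```

A check such as `is_valid_identifier("My_Connection")` returns `False`, since uppercase letters and underscores fall outside the allowed set.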

{PANEL/}

## Related Articles

### Vector Search

- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database)
- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index)
- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query)

### Embeddings Generation

- [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview)
- [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task)

### AI Connection Strings

- [Azure OpenAI](../../ai-integration/connection-strings/azure-open-ai)
- [Google AI](../../ai-integration/connection-strings/google-ai)
- [Hugging Face](../../ai-integration/connection-strings/hugging-face)
- [Ollama](../../ai-integration/connection-strings/ollama)
- [OpenAI](../../ai-integration/connection-strings/open-ai)
- [Mistral AI](../../ai-integration/connection-strings/mistral-ai)
- [Embedded model](../../ai-integration/connection-strings/embedded)
@@ -0,0 +1,66 @@
# Connection String to bge-micro-v2 (Embedded)
---

{NOTE: }

* This article explains how to define a connection string to the [bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2) model.
This model, designed exclusively for embeddings generation, is embedded within RavenDB, enabling RavenDB to seamlessly handle its
[embeddings generation tasks](../../ai-integration/generating-embeddings/overview) without requiring an external AI service.

Review comment:

It's worth adding that:

  1. This embedded model is designed exclusively for embeddings generation.
  2. It's important to understand that running this model locally will directly impact RavenDB's performance, as the process of calculating embeddings will consume processor time. We are very mindful of RavenDB's performance and would like to avoid any misunderstandings in this regard :)

Reply (Member Author):

added


* Running the model locally consumes processor resources and will impact RavenDB's overall performance,
depending on your workload and usage patterns.

* In this article:
* [Define the connection string - from the Studio](../../ai-integration/connection-strings/embedded#define-the-connection-string---from-the-studio)
* [Define the connection string - from the Client API](../../ai-integration/connection-strings/embedded#define-the-connection-string---from-the-client-api)
* [Syntax](../../ai-integration/connection-strings/embedded#syntax)

{NOTE/}

---

{PANEL: Define the connection string - from the Studio}

![connection string to the embedded model](images/embedded.png "Define a connection string to the embedded model")

1. **Name**
Enter a name for this connection string.

2. **Identifier** (optional)
Enter an identifier for this connection string.
Learn more about the identifier in [Creating an AI connection string](../../ai-integration/connection-strings/connection-strings-overview#creating-an-ai-connection-string).

3. **Connector**
Select **Embedded (bge-micro-v2)** from the dropdown menu.

4. Click **Save** to store the connection string or **Cancel** to discard changes.

{PANEL/}

{PANEL: Define the connection string - from the Client API}

{CODE:csharp create_connection_string_embedded@AiIntegration\ConnectionStrings\connectionStrings.cs /}

{PANEL/}

{PANEL: Syntax}

{CODE:csharp embedded_settings@AiIntegration\ConnectionStrings\connectionStrings.cs /}

{PANEL/}

## Related Articles

### Vector Search

- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database)
- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index)
- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query)

### Embeddings Generation

- [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview)
- [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task)

### AI Connection Strings

- [Connection strings - overview](../../ai-integration/connection-strings/connection-strings-overview)