RDoc-3254 Embeddings Generation via Tasks #1998
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# AI Tasks - List View | ||
--- | ||
|
||
{NOTE: } | ||
|
||
* In this view, you can manage RavenDB's AI tasks - | ||
create new tasks, edit existing ones, or delete them as needed. | ||
|
||
* AI tasks are part of RavenDB's ongoing tasks. Learn more in [Ongoing Tasks - Overview](../studio/database/tasks/ongoing-tasks/general-info). | ||
|
||
* Currently, the only supported AI task type is [embeddings generation](../ai-integration/generating-embeddings/overview). | ||
|
||
--- | ||
|
||
* In this article: | ||
* [AI Tasks - list view](../ai-integration/ai-tasks-list-view#ai-tasks---list-view) | ||
|
||
{NOTE/} | ||
|
||
--- | ||
|
||
{PANEL: AI Tasks - list view} | ||
|
||
 | ||
|
||
1. Go to **AI Hub > AI Tasks**. | ||
|
||
2. **Add AI Task**: Click to create a new AI task. | ||
|
||
3. **Task name**: This is the name of the task. | ||
|
||
4. **Identifier**: The string identifier defined for the task. | ||
**Connection string**: The name of the connection string defined in the task. | ||
|
||
5. **Task status**: Displays the task's state and progress. | ||
|
||
6. **Assigned node**: The node in the database group responsible for the task. | ||
|
||
7. **Enable/Disable**: Toggle the task on or off. | ||
|
||
8. **Details**: Click to view detailed information about the task. | ||
|
||
9. **Edit**: Click to modify the task. | ||
|
||
10. **Delete**: Click to remove the task. | ||
|
||
{PANEL/} | ||
|
||
## Related Articles | ||
|
||
### Vector Search | ||
|
||
- [RavenDB as a vector database](../ai-integration/vector-search/ravendb-as-vector-database) | ||
- [Vector search using a static index](../ai-integration/vector-search/vector-search-using-static-index) | ||
- [Vector search using a dynamic query](../ai-integration/vector-search/vector-search-using-dynamic-query) | ||
|
||
### Embeddings Generation | ||
|
||
- [Generating embeddings - overview](../ai-integration/generating-embeddings/overview) | ||
- [Embeddings generation task](../ai-integration/generating-embeddings/embeddings-generation-task) | ||
|
||
### Connection Strings | ||
|
||
- [Connection strings - overview](../ai-integration/connection-strings/connection-strings-overview) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
[ | ||
{ | ||
"Path": "connection-strings-overview.markdown", | ||
"Name": "Overview", | ||
"DiscussionId": "d50a19e4-5447-4b36-91ea-997c79d58178", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "azure-open-ai.markdown", | ||
"Name": "Azure OpenAI", | ||
"DiscussionId": "b1a120bb-8f0a-42b3-9338-2a6f656517e5", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "google-ai.markdown", | ||
"Name": "Google AI", | ||
"DiscussionId": "143c8438-d2a4-44f9-a5a3-c7f1def06962", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "hugging-face.markdown", | ||
"Name": "Hugging Face", | ||
"DiscussionId": "9a709299-d444-43ad-8024-1fb78205b80c", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "ollama.markdown", | ||
"Name": "Ollama", | ||
"DiscussionId": "560d30e8-accf-4c67-aedf-757df4c150d0", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "open-ai.markdown", | ||
"Name": "OpenAI", | ||
"DiscussionId": "9c4e61fe-d427-4c0c-96c0-03d09e486b6a", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "mistral-ai.markdown", | ||
"Name": "Mistral AI", | ||
"DiscussionId": "d8b83393-92f9-42a9-9e30-fdecc1fb60b4", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "embedded.markdown", | ||
"Name": "bge-micro-v2 (Embedded)", | ||
"DiscussionId": "5a9733f5-1184-41a1-86fc-0049f3ec46ac", | ||
"Mappings": [] | ||
} | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# Connection String to Azure OpenAI | ||
--- | ||
|
||
{NOTE: } | ||
|
||
* This article explains how to define a connection string to the [Azure OpenAI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service), | ||
enabling RavenDB to seamlessly integrate its [embeddings generation tasks](../../ai-integration/generating-embeddings/overview) with your Azure environment. | ||
|
||
* In this article: | ||
* [Define the connection string - from the Studio](../../ai-integration/connection-strings/azure-open-ai#define-the-connection-string---from-the-studio) | ||
* [Define the connection string - from the Client API](../../ai-integration/connection-strings/azure-open-ai#define-the-connection-string---from-the-client-api) | ||
* [Syntax](../../ai-integration/connection-strings/azure-open-ai#syntax) | ||
|
||
{NOTE/} | ||
|
||
--- | ||
|
||
{PANEL: Define the connection string - from the Studio} | ||
|
||
 | ||
|
||
1. **Name** | ||
Enter a name for this connection string. | ||
|
||
2. **Identifier** (optional) | ||
Enter an identifier for this connection string. | ||
Learn more about the identifier in the [connection string identifier](../../ai-integration/connection-strings/connection-strings-overview#the-connection-string-identifier) section. | ||
|
||
3. **Connector** | ||
Select **Azure OpenAI** from the dropdown menu. | ||
|
||
4. **API Key** | ||
Enter the API key used to authenticate requests to the Azure OpenAI service. | ||
|
||
5. **Endpoint** | ||
Enter the Azure OpenAI endpoint URL for generating embeddings from text. | ||
|
||
6. **Model** | ||
Specify the Azure OpenAI text embedding model to use. | ||
|
||
7. **Deployment Name** | ||
Specify the unique identifier assigned to your model deployment in your Azure environment. | ||
|
||
8. **Dimensions** (optional) | ||
* Specify the number of dimensions for the output embeddings. | ||
Supported only by _text-embedding-3_ and later models. | ||
* If not specified, the model's default dimensionality is used. | ||
|
||
9. Click **Test Connection** to confirm the connection string is set up correctly. | ||
|
||
10. Click **Save** to store the connection string or **Cancel** to discard changes. | ||
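
The fields listed above can be sketched as a small data structure with basic checks. This is only an illustration of the form's shape: `AzureOpenAiForm` and its `validate` method are hypothetical names, not part of the RavenDB client, and treating every field not marked "(optional)" as required is an assumption drawn from the list. The prefix check on the model name is a simplification of "text-embedding-3 and later models".

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class AzureOpenAiForm:
    """Hypothetical stand-in for the Studio form; not a RavenDB client type."""
    name: str
    api_key: str
    endpoint: str
    model: str
    deployment_name: str
    identifier: Optional[str] = None  # optional in the form
    dimensions: Optional[int] = None  # optional; None means "use the model default"

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form looks complete."""
        problems = []
        # Assumption: fields not marked "(optional)" in the list are required.
        for field_name in ("name", "api_key", "endpoint", "model", "deployment_name"):
            if not getattr(self, field_name).strip():
                problems.append(f"{field_name} is required")
        parsed = urlparse(self.endpoint)
        if self.endpoint.strip() and not (parsed.scheme in ("http", "https") and parsed.netloc):
            problems.append("endpoint must be an absolute http(s) URL")
        if self.dimensions is not None:
            if self.dimensions <= 0:
                problems.append("dimensions must be a positive integer")
            elif not self.model.startswith("text-embedding-3"):
                # Simplified check for "text-embedding-3 and later models".
                problems.append("dimensions is supported only by text-embedding-3 and later models")
        return problems
```

A check like this only catches incomplete input early; **Test Connection** in the Studio remains the way to confirm that the key, endpoint, and deployment actually work.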
|
||
{PANEL/} | ||
|
||
{PANEL: Define the connection string - from the Client API} | ||
|
||
{CODE:csharp create_connection_string_azure_open_ai@AiIntegration\ConnectionStrings\connectionStrings.cs /} | ||
|
||
{PANEL/} | ||
|
||
{PANEL: Syntax} | ||
|
||
{CODE:csharp azure_open_ai_settings@AiIntegration\ConnectionStrings\connectionStrings.cs /} | ||
|
||
{PANEL/} | ||
|
||
## Related Articles | ||
|
||
### Vector Search | ||
|
||
- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database) | ||
- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index) | ||
- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query) | ||
|
||
### Embeddings Generation | ||
|
||
- [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview) | ||
- [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task) | ||
|
||
### AI Connection Strings | ||
|
||
- [Connection strings - overview](../../ai-integration/connection-strings/connection-strings-overview) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
# AI Connection Strings - Overview | ||
--- | ||
|
||
{NOTE: } | ||
|
||
* In RavenDB, you can define [Embeddings Generation Tasks](../../ai-integration/generating-embeddings/overview) that generate embeddings from the content of your documents. | ||
These embeddings are stored in a dedicated collection within the database and enable vector search on your document content. | ||
|
||
* Each embeddings generation task must define a **connection string** to an embedding provider. | ||
This connection string specifies where the embeddings will be generated, | ||
allowing RavenDB to integrate with external services such as Azure OpenAI, OpenAI, Hugging Face, Google AI, Ollama, Mistral AI, or RavenDB's embedded model (bge-micro-v2). | ||
Review comment: Does it make sense to add something like "as well as any providers that have an OpenAI-compatible API" here?
Reply: IMHO, I would avoid this. When it works, it's fine. |
||
|
||
* While each task can have only one connection string, you can define multiple connection strings in your database to support different providers or configurations. | ||
A single connection string can also be reused across multiple tasks in the database. | ||
|
||
* These connection strings can be created from: | ||
* The **AI Connection Strings view** in the Studio, where you can create and edit connection strings, and delete those that are not in use by a task. | ||
* The **Client API** - examples are available in the dedicated articles for each provider. | ||
|
||
--- | ||
|
||
* In this article: | ||
* [The AI Connection Strings view](../../ai-integration/connection-strings/connection-strings-overview#the-ai-connection-strings-view) | ||
* [Creating an AI connection string](../../ai-integration/connection-strings/connection-strings-overview#creating-an-ai-connection-string) | ||
|
||
{NOTE/} | ||
|
||
--- | ||
|
||
{PANEL: The AI Connection Strings view} | ||
|
||
 | ||
|
||
1. Go to the **AI Hub** menu. | ||
|
||
2. Open the **AI Connection Strings** view. | ||
|
||
3. Click **"Add new"** to create a new connection string. | ||
|
||
4. View the list of all AI connection strings. | ||
|
||
5. Edit or delete a connection string. | ||
Only connection strings that are not in use by a task can be deleted. | ||
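
The relationship between tasks and connection strings described in this article (one connection string per task, reuse across tasks, deletion only when unused) can be modeled with a toy in-memory registry. This is a sketch for illustration, not how RavenDB stores or enforces these rules; the class and method names are hypothetical.

```python
class ConnectionStringInUseError(Exception):
    """Raised when deleting a connection string that a task still references."""


class AiConnectionStringRegistry:
    """Toy in-memory model of the rules above; all names are hypothetical."""

    def __init__(self):
        self._connection_strings = set()
        self._task_to_connection = {}

    def add(self, name):
        # Connection string names must be unique.
        if name in self._connection_strings:
            raise ValueError(f"connection string '{name}' already exists")
        self._connection_strings.add(name)

    def assign(self, task, name):
        if name not in self._connection_strings:
            raise KeyError(f"unknown connection string '{name}'")
        # A task holds exactly one connection string, so assigning replaces any previous one.
        self._task_to_connection[task] = name

    def tasks_using(self, name):
        # Several tasks may share the same connection string.
        return sorted(t for t, c in self._task_to_connection.items() if c == name)

    def names(self):
        return sorted(self._connection_strings)

    def delete(self, name):
        users = self.tasks_using(name)
        if users:
            raise ConnectionStringInUseError(f"'{name}' is used by: {', '.join(users)}")
        self._connection_strings.remove(name)
```

In this model, a connection string becomes deletable only after every task that referenced it has been pointed at a different one.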
|
||
{PANEL/} | ||
|
||
{PANEL: Creating an AI connection string} | ||
|
||
 | ||
|
||
1. **Name** | ||
Enter a unique name for the connection string. | ||
|
||
2. **Identifier** | ||
Enter a unique identifier for the connection string. | ||
Each AI connection string in the database must have a distinct identifier. | ||
|
||
If not specified, or when clicking the "Regenerate" button, | ||
RavenDB automatically generates the identifier based on the connection string name. For example: | ||
* If the connection string name is: _"My connection string to Google AI"_ | ||
* The generated identifier will be: _"my-connection-string-to-google-ai"_ | ||
|
||
Allowed characters: only lowercase letters (a-z), numbers (0-9), and hyphens (-). | ||
See how this identifier is used in the [embeddings cache collection](../../ai-integration/generating-embeddings/embedding-collections#the-embeddings-cache-collection). | ||
Review comment: Maybe also worth linking other possible places where the task identifier is used:
Reply: Specifically, in this section where we explain the connection string identifier,
Reply: You're right. Please ignore :) |
||
|
||
3. **Regenerate** | ||
Click "Regenerate" to automatically create an identifier based on the connection string name. | ||
|
||
4. **Connector** | ||
Select an AI provider from the dropdown menu. | ||
This will open a popup where you can configure the connection details. | ||
Configuration details for each provider are explained in the following articles: | ||
* [Azure OpenAI](../../ai-integration/connection-strings/azure-open-ai) | ||
* [Google AI](../../ai-integration/connection-strings/google-ai) | ||
* [Hugging Face](../../ai-integration/connection-strings/hugging-face) | ||
* [Ollama](../../ai-integration/connection-strings/ollama) | ||
* [OpenAI](../../ai-integration/connection-strings/open-ai) | ||
* [Mistral AI](../../ai-integration/connection-strings/mistral-ai) | ||
* [Embedded model (bge-micro-v2)](../../ai-integration/connection-strings/embedded) | ||
|
||
5. Once you complete all configurations for the selected provider in the popup view, | ||
save the connection string definition. | ||
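
The identifier rule from step 2 can be sketched as follows. The example mirrors the documented case ("My connection string to Google AI" becomes "my-connection-string-to-google-ai") and the allowed character set; how RavenDB itself handles runs of special characters or leading and trailing separators is not stated here, so collapsing and trimming hyphens is an assumption. The function names are hypothetical.

```python
import re

# Allowed characters per the text: lowercase letters, digits, and hyphens.
ALLOWED_IDENTIFIER = re.compile(r"[a-z0-9-]+")


def generate_identifier(name: str) -> str:
    """Derive an identifier from a connection string name (illustrative rule)."""
    # Lowercase the name, then turn every run of disallowed characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    # Assumption: leading and trailing hyphens are dropped.
    return slug.strip("-")


def is_valid_identifier(identifier: str) -> bool:
    """Return True if the identifier uses only the allowed characters."""
    return ALLOWED_IDENTIFIER.fullmatch(identifier) is not None
```

For example, `generate_identifier("My connection string to Google AI")` returns `"my-connection-string-to-google-ai"`, and `is_valid_identifier("My-ID")` returns `False` because of the uppercase letters.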
|
||
{PANEL/} | ||
|
||
## Related Articles | ||
|
||
### Vector Search | ||
|
||
- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database) | ||
- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index) | ||
- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query) | ||
|
||
### Embeddings Generation | ||
|
||
- [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview) | ||
- [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task) | ||
|
||
### AI Connection Strings | ||
|
||
- [Azure OpenAI](../../ai-integration/connection-strings/azure-open-ai) | ||
- [Google AI](../../ai-integration/connection-strings/google-ai) | ||
- [Hugging Face](../../ai-integration/connection-strings/hugging-face) | ||
- [Ollama](../../ai-integration/connection-strings/ollama) | ||
- [OpenAI](../../ai-integration/connection-strings/open-ai) | ||
- [Mistral AI](../../ai-integration/connection-strings/mistral-ai) | ||
- [Embedded model](../../ai-integration/connection-strings/embedded) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# Connection String to bge-micro-v2 (Embedded) | ||
--- | ||
|
||
{NOTE: } | ||
|
||
* This article explains how to define a connection string to the [bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2) model. | ||
This model, designed exclusively for embeddings generation, is embedded within RavenDB, enabling RavenDB to seamlessly handle its | ||
[embeddings generation tasks](../../ai-integration/generating-embeddings/overview) without requiring an external AI service. | ||
Review comment: It's worth adding that:
Reply: added |
||
|
||
* Running the model locally consumes CPU resources and can affect RavenDB's overall performance, | ||
depending on your workload and usage patterns. | ||
|
||
* In this article: | ||
* [Define the connection string - from the Studio](../../ai-integration/connection-strings/embedded#define-the-connection-string---from-the-studio) | ||
* [Define the connection string - from the Client API](../../ai-integration/connection-strings/embedded#define-the-connection-string---from-the-client-api) | ||
* [Syntax](../../ai-integration/connection-strings/embedded#syntax) | ||
|
||
{NOTE/} | ||
|
||
--- | ||
|
||
{PANEL: Define the connection string - from the Studio} | ||
|
||
 | ||
|
||
1. **Name** | ||
Enter a name for this connection string. | ||
|
||
2. **Identifier** (optional) | ||
Enter an identifier for this connection string. | ||
Learn more about the identifier in the [connection string identifier](../../ai-integration/connection-strings/connection-strings-overview#the-connection-string-identifier) section. | ||
|
||
3. **Connector** | ||
Select **Embedded (bge-micro-v2)** from the dropdown menu. | ||
|
||
4. Click **Save** to store the connection string or **Cancel** to discard changes. | ||
|
||
{PANEL/} | ||
|
||
{PANEL: Define the connection string - from the Client API} | ||
|
||
{CODE:csharp create_connection_string_embedded@AiIntegration\ConnectionStrings\connectionStrings.cs /} | ||
|
||
{PANEL/} | ||
|
||
{PANEL: Syntax} | ||
|
||
{CODE:csharp embedded_settings@AiIntegration\ConnectionStrings\connectionStrings.cs /} | ||
|
||
{PANEL/} | ||
|
||
## Related Articles | ||
|
||
### Vector Search | ||
|
||
- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database) | ||
- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index) | ||
- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query) | ||
|
||
### Embeddings Generation | ||
|
||
- [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview) | ||
- [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task) | ||
|
||
### AI Connection Strings | ||
|
||
- [Connection strings - overview](../../ai-integration/connection-strings/connection-strings-overview) |
Review comment: I would expect to see in the NOTE section of the Connection Strings page only a brief indication of the types of Tasks in which this type of connection string can be applied. It is worth noting that in the future we will have other types of Tasks and we will use the same type of connection string, an AI Connection String.
Perhaps at this stage we can leave it as is and make this page less "Embeddings-Generation-Task-specific" only after another type of task appears.
Reply: As discussed w/ @ArieSLV:
we'll keep the docs aligned with the current feature (be "Embeddings-Generation-Task-specific")
as it's best to provide exact definitions of what we offer, without being vague.
I will adapt and make all necessary changes in the connection string sections
and in ALL other relevant places.
That definitely won't go unnoticed - and will be meticulously done.