Commit 2e77814

Update index.md
1 parent e84e566 commit 2e77814

1 file changed

+47
-162
lines changed

Lines changed: 47 additions & 162 deletions
Original file line number | Diff line number | Diff line change
@@ -1,196 +1,81 @@
1-
---
2-
title: 'Databend AI Capabilities'
3-
sidebar_label: 'AI Capabilities'
4-
---
1+
# Databend AI Capabilities
52

6-
This guide invites you to explore the realm where Databend's built-in functions merge with machine learning. Transform your data analysis effortlessly through SQL queries, uncovering a range of natural language tasks— from understanding documents to completing text and more.
3+
This guide introduces Databend's built-in AI functions that enable natural language processing tasks through SQL queries, including text understanding, generation, and more.
74

8-
## Data, Privacy, and Security
5+
:::warning
6+
Data Privacy and Security
97

10-
Databend relies on [Azure OpenAI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) for embeddings and text completions, which means your data will be sent to Azure OpenAI Service. Exercise caution when using these functions.
8+
Databend uses Azure OpenAI Service for embeddings and text completions. Your data will be sent to Azure OpenAI when using these functions. These features are available by default on Databend Cloud.
119

12-
These functions are available by default on [Databend Cloud](https://databend.com) using our Azure OpenAI key. **If you use them, you acknowledge that your data will be sent to Azure OpenAI Service**, and you agree to the [Azure OpenAI Data Privacy](https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy).
10+
**By using these functions, you acknowledge that your data will be sent to Azure OpenAI Service** and agree to the [Azure OpenAI Data Privacy](https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy) terms.
11+
:::
1312

14-
## What are Embeddings?
15-
16-
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze a text in various natural language processing tasks, such as document similarity, clustering, and recommendation systems.
17-
18-
To illustrate how embeddings work, let's consider a simple example. Suppose we have the following sentences:
19-
1. `"The cat sat on the mat."`
20-
2. `"The dog sat on the rug."`
21-
3. `"The quick brown fox jumped over the lazy dog."`
22-
23-
When creating embeddings for these sentences, the model will convert the text into high-dimensional vectors in such a way that similar sentences are closer together in the vector space.
24-
25-
For instance, the embeddings of sentences 1 and 2 will be closer to each other because they share a similar structure and meaning (both involve an animal sitting on something). On the other hand, the embedding of sentence 3 will be farther from the embeddings of sentences 1 and 2 because it has a different structure and meaning.
26-
27-
The embeddings could look like this (simplified for illustration purposes):
13+
## Key AI Functions
2814

29-
1. `[0.2, 0.3, 0.1, 0.7, 0.4]`
30-
2. `[0.25, 0.29, 0.11, 0.71, 0.38]`
31-
3. `[-0.1, 0.5, 0.6, -0.3, 0.8]`
15+
| Function | Description | When to Use |
16+
|----------|-------------|------------|
17+
| [ai_text_completion](/sql/sql-functions/ai-functions/ai-text-completion) | Generates text based on a prompt | • Content generation<br/>• Question answering<br/>• Summarization<br/>• Text expansion |
18+
| [ai_embedding_vector](/sql/sql-functions/ai-functions/ai-embedding-vector) | Converts text into vector representations | • Semantic search<br/>• Document similarity<br/>• Content recommendation<br/>• Text classification |
19+
| [cosine_distance](/sql/sql-functions/vector-distance-functions/vector-cosine-distance) | Calculates similarity between vectors | • Finding similar documents<br/>• Ranking search results<br/>• Measuring text similarity |
3220

33-
In this simplified example, you can see that the embeddings of sentences 1 and 2 are closer to each other in the vector space, while the embedding of sentence 3 is farther away. This illustrates how embeddings can capture semantic relationships and be used to compare and analyze text data.
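
The "closer together" claim in the removed passage can be checked numerically. The following is a minimal Python sketch, not part of the commit, that computes cosine distance for the three illustrative vectors above. It assumes cosine distance is defined as 1 minus cosine similarity, which matches how the SQL example orders results with `ORDER BY similarity ASC`; the helper name `cosine_distance` is borrowed from the SQL function for readability and is otherwise hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Illustrative vectors copied from the removed example; they are not real model output.
sentence_1 = [0.2, 0.3, 0.1, 0.7, 0.4]
sentence_2 = [0.25, 0.29, 0.11, 0.71, 0.38]
sentence_3 = [-0.1, 0.5, 0.6, -0.3, 0.8]

d_12 = cosine_distance(sentence_1, sentence_2)
d_13 = cosine_distance(sentence_1, sentence_3)
d_23 = cosine_distance(sentence_2, sentence_3)

print(f"sentence 1 vs 2: {d_12:.4f}")
print(f"sentence 1 vs 3: {d_13:.4f}")
print(f"sentence 2 vs 3: {d_23:.4f}")
```

A smaller distance means the vectors point in more similar directions, so sentences 1 and 2 should come out far closer to each other than either is to sentence 3.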
3421

35-
## What is a Vector Database?
3622

37-
Typically, embedding vectors are stored in specialized vector databases like milvus, pinecone, qdrant, or weaviate. Databend can also store embedding vectors using the ARRAY(FLOAT32) data type and perform similarity computations with the cosine_distance function in SQL. To create embeddings for a text document using Databend, you can use the built-in `ai_embedding_vector` function directly in your SQL query.
23+
## What are Embeddings?
3824

39-
## Databend AI Functions
25+
Embeddings are vector representations of text that capture semantic meaning. Similar texts have closer vectors in the embedding space, enabling comparison and analysis for tasks like document similarity and clustering.
4026

41-
Databend provides built-in AI functions for various natural language processing tasks. The main functions covered in this document are:
27+
## Vector Storage in Databend
4228

43-
- [ai_embedding_vector](/sql/sql-functions/ai-functions/ai-embedding-vector): Generates embeddings for text documents.
44-
- [ai_text_completion](/sql/sql-functions/ai-functions/ai-text-completion): Generates text completions based on a given prompt.
45-
- [cosine_distance](/sql/sql-functions/vector-distance-functions/vector-cosine-distance): Calculates the cosine distance between two embeddings.
29+
Databend can store embedding vectors using the `ARRAY(FLOAT32 NOT NULL)` data type and perform similarity calculations with the cosine_distance function directly in SQL.
4630

47-
## Generating Embeddings
31+
## Example: Document Similarity Search
4832

49-
Let's create a table to store some sample text documents and their corresponding embeddings:
5033
```sql
34+
-- Create a table for documents
5135
CREATE TABLE articles (
5236
id INT,
5337
title VARCHAR,
5438
content VARCHAR,
55-
embedding ARRAY(FLOAT32)
39+
embedding ARRAY(FLOAT32 NOT NULL)
5640
);
57-
```
5841

59-
Now, let's insert some sample documents into the table:
60-
```sql
42+
-- Insert documents with embeddings
6143
INSERT INTO articles (id, title, content, embedding)
6244
VALUES
63-
(1, 'Python for Data Science', 'Python is a versatile programming language widely used in data science...', ai_embedding_vector('Python is a versatile programming language widely used in data science...')),
64-
(2, 'Introduction to R', 'R is a popular programming language for statistical computing and graphics...', ai_embedding_vector('R is a popular programming language for statistical computing and graphics...')),
65-
(3, 'Getting Started with SQL', 'Structured Query Language (SQL) is a domain-specific language used for managing relational databases...', ai_embedding_vector('Structured Query Language (SQL) is a domain-specific language used for managing relational databases...'));
66-
```
67-
68-
## Calculating Cosine Distance
45+
(1, 'Python for Data Science', 'Python is a versatile programming language...',
46+
ai_embedding_vector('Python is a versatile programming language...')),
47+
(2, 'Introduction to R', 'R is a popular programming language for statistics...',
48+
ai_embedding_vector('R is a popular programming language for statistics...'));
6949

70-
Now, let's find the documents that are most similar to a given query using the [cosine_distance](/sql/sql-functions/vector-distance-functions/vector-cosine-distance) function:
71-
```sql
50+
-- Find similar documents to a query
7251
SELECT
73-
id,
74-
title,
75-
content,
52+
id, title, content,
7653
cosine_distance(embedding, ai_embedding_vector('How to use Python in data analysis?')) AS similarity
77-
FROM
78-
articles
79-
ORDER BY
80-
similarity ASC
81-
LIMIT 3;
82-
```
83-
84-
Result:
85-
```sql
86-
+------+--------------------------+---------------------------------------------------------------------------------------------------------+------------+
87-
| id | title | content | similarity |
88-
+------+--------------------------+---------------------------------------------------------------------------------------------------------+------------+
89-
| 1 | Python for Data Science | Python is a versatile programming language widely used in data science... | 0.1142081 |
90-
| 2 | Introduction to R | R is a popular programming language for statistical computing and graphics... | 0.18741018 |
91-
| 3 | Getting Started with SQL | Structured Query Language (SQL) is a domain-specific language used for managing relational databases... | 0.25137568 |
92-
+------+--------------------------+---------------------------------------------------------------------------------------------------------+------------+
93-
```
94-
95-
## Generating Text Completions
96-
97-
Databend also supports a text completion function, [ai_text_completion](/sql/sql-functions/ai-functions/ai-text-completion).
98-
99-
For example, from the above output, we choose the document with the smallest cosine distance: "Python is a versatile programming language widely used in data science...".
100-
101-
We can use this as context and provide the original question to the [ai_text_completion](/sql/sql-functions/ai-functions/ai-text-completion) function to generate a completion:
102-
103-
```sql
104-
SELECT ai_text_completion('Python is a versatile programming language widely used in data science...') AS completion;
54+
FROM articles
55+
ORDER BY similarity ASC
56+
LIMIT 3;
10557
```
10658

107-
Result:
108-
```sql
109-
110-
completion: and machine learning. It is known for its simplicity, readability, and ease of use. Python has a vast collection of libraries and frameworks that make it easy to perform complex tasks such as data analysis, visualization, and machine learning. Some of the popular libraries used in data science include NumPy, Pandas, Matplotlib, and Scikit-learn. Python is also used in web development, game development, and automation. Its popularity and versatility make it a valuable skill for programmers and data scientists.
111-
```
112-
113-
You can experience these functions on our [Databend Cloud](https://databend.com), where you can sign up for a free trial and start using these AI functions right away.
114-
115-
Databend's AI functions are designed to be easy to use, even for users who are not familiar with machine learning or natural language processing. With Databend, you can quickly and easily add powerful AI capabilities to your SQL queries and take your data analysis to the next level.
116-
117-
## Build an AI Q&A System with Databend
118-
119-
We have utilized [Databend Cloud](https://databend.com) and AI functions to build an AI Q&A system for our documentation.
120-
121-
Here's a step-by-step guide to how it was built:
122-
123-
### Step 1: Create Table
124-
125-
First, create a table with the following structure to store document information and embeddings:
126-
```sql
127-
CREATE TABLE doc (
128-
path VARCHAR,
129-
content VARCHAR,
130-
embedding ARRAY(FLOAT32 NOT NULL)
131-
);
132-
```
133-
134-
### Step 2: Insert Raw Data
59+
## Example: Text Completion
13560

136-
Insert sample data into the table, including the path and content for each document:
13761
```sql
138-
INSERT INTO doc (path, content) VALUES
139-
('ai-function', 'ai_embedding_vector, ai_text_completion, cosine_distance'),
140-
('string-function', 'ASCII, BIN, CHAR_LENGTH');
62+
-- Generate a completion for a prompt
63+
SELECT ai_text_completion('Explain the benefits of cloud data warehouses in three points:') AS completion;
64+
65+
-- Result might be:
66+
-- 1. Scalability: Cloud data warehouses can easily scale up or down based on demand,
67+
-- eliminating the need for upfront capacity planning.
68+
-- 2. Cost-efficiency: Pay-as-you-go pricing models reduce capital expenditure and
69+
-- allow businesses to pay only for the resources they use.
70+
-- 3. Accessibility: Cloud data warehouses enable teams to access data from anywhere,
71+
-- facilitating remote work and global collaboration.
14172
```
14273

143-
### Step 3: Generate Embeddings
144-
145-
Update the table to generate embeddings for the content using the [ai_embedding_vector](/sql/sql-functions/ai-functions/ai-embedding-vector) function:
146-
```sql
147-
UPDATE doc SET embedding = ai_embedding_vector(content)
148-
WHERE LENGTH(embedding) = 0;
149-
```
74+
## Building an AI Q&A System
15075

151-
### Step 4: Ask a Question and Retrieve Relevant Answers
76+
You can create a simple Q&A system with Databend by:
77+
1. Storing documents with embeddings
78+
2. Finding relevant documents for a question
79+
3. Using text completion to generate answers
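
The three steps above can be sketched outside the database to show the data flow. This Python sketch is not part of the commit: the tiny 3-dimensional vectors are hypothetical stand-ins for what `ai_embedding_vector` would return, the function `build_prompt` is an illustrative name, and the prompt wording is copied from the removed SQL query in this diff. The final call to `ai_text_completion` is deliberately left out, since it runs inside Databend.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_prompt(question, question_vec, docs, k=3):
    """Rank docs by distance to the question and build a completion prompt from the top k."""
    ranked = sorted(docs, key=lambda d: cosine_distance(question_vec, d["embedding"]))
    context = " ".join(d["content"] for d in ranked[:k])
    return (
        "Utilizing the sections provided from the Databend documentation, "
        "answer the questions to the best of your ability. "
        "Documentation sections: " + context + " Question: " + question
    )


# Step 1: documents stored with embeddings (hypothetical toy vectors).
docs = [
    {"path": "ai-function",
     "content": "ai_embedding_vector, ai_text_completion, cosine_distance",
     "embedding": [0.9, 0.1, 0.0]},
    {"path": "string-function",
     "content": "ASCII, BIN, CHAR_LENGTH",
     "embedding": [0.1, 0.9, 0.2]},
]

# Step 2: find the most relevant document for a question (hypothetical question vector).
question = "Tell me the ai functions"
question_vec = [0.8, 0.2, 0.1]

# Step 3: the resulting prompt is what would be passed to text completion.
prompt = build_prompt(question, question_vec, docs, k=1)
print(prompt)
```

Keeping `k` small limits how much text is sent to the completion model, which matters because the whole prompt leaves the database.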
15280

153-
```sql
154-
-- Define the question as a CTE (Common Table Expression)
155-
WITH question AS (
156-
SELECT 'Tell me the ai functions' AS q
157-
),
158-
-- Calculate the question's embedding vector
159-
question_embedding AS (
160-
SELECT ai_embedding_vector((SELECT q FROM question)) AS q_vector
161-
),
162-
-- Retrieve the top 3 most relevant documents
163-
top_3_docs AS (
164-
SELECT content,
165-
cosine_distance((SELECT q_vector FROM question_embedding), embedding) AS dist
166-
FROM doc
167-
ORDER BY dist ASC
168-
LIMIT 3
169-
),
170-
-- Combine the content of the top 3 documents
171-
combined_content AS (
172-
SELECT string_agg(content, ' ') AS aggregated_content
173-
FROM top_3_docs
174-
),
175-
-- Concatenate a custom prompt, the combined content, and the original question
176-
prompt AS (
177-
SELECT CONCAT(
178-
'Utilizing the sections provided from the Databend documentation, answer the questions to the best of your ability. ',
179-
'Documentation sections: ',
180-
(SELECT aggregated_content FROM combined_content),
181-
' Question: ',
182-
(SELECT q FROM question)
183-
) as p
184-
)
185-
-- Pass the concatenated text to the ai_text_completion function to generate a coherent and relevant response
186-
SELECT ai_text_completion((SELECT p FROM prompt)) AS answer;
187-
```
188-
189-
Result:
190-
```sql
191-
+------------------------------------------------------------------------------------------------------------------+
192-
| answer |
193-
+------------------------------------------------------------------------------------------------------------------+
194-
| Answer: The ai functions mentioned in the Databend documentation are ai_embedding_vector and ai_text_completion. |
195-
+------------------------------------------------------------------------------------------------------------------+
196-
```
81+
Try these AI capabilities on [Databend Cloud](https://databend.com) with a free trial.
