tidbcloud: Add auto embedding docs #21499

Open
wants to merge 8 commits into
base: release-8.5
25 changes: 17 additions & 8 deletions TOC-tidb-cloud.md
@@ -216,8 +216,8 @@
- [TiKV Follower Read](/follower-read.md)
- [Coprocessor Cache](/coprocessor-cache.md)
- Garbage Collection (GC)
- [Overview](/garbage-collection-overview.md)
- [Configuration](/garbage-collection-configuration.md)
- [Overview](/garbage-collection-overview.md)
- [Configuration](/garbage-collection-configuration.md)
- [Tune TiFlash Performance](/tiflash/tune-tiflash-performance.md)
- Optimize Resource Allocation
- Resource Manager
@@ -258,6 +258,15 @@
- Get Started
- [Get Started with SQL](/vector-search/vector-search-get-started-using-sql.md)
- [Get Started with Python](/vector-search/vector-search-get-started-using-python.md)
- Auto Embedding
- [Overview](/tidb-cloud/vector-search-auto-embedding-overview.md)
- [Amazon Titan Embeddings](/tidb-cloud/vector-search-auto-embedding-amazon-titan.md)
- [Cohere Embeddings](/tidb-cloud/vector-search-auto-embedding-cohere.md)
- [Jina AI Embeddings](/tidb-cloud/vector-search-auto-embedding-jina-ai.md)
- [OpenAI Embeddings](/tidb-cloud/vector-search-auto-embedding-openai.md)
- [Gemini Embeddings](/tidb-cloud/vector-search-auto-embedding-gemini.md)
- [HuggingFace Embeddings](/tidb-cloud/vector-search-auto-embedding-huggingface.md)
- [NVIDIA NIM Embeddings](/tidb-cloud/vector-search-auto-embedding-nvidia-nim.md)
- Integrations
- [Overview](/vector-search/vector-search-integration-overview.md)
- AI Frameworks
@@ -326,8 +335,8 @@
- [Connect via VPC Peering](/tidb-cloud/set-up-vpc-peering-connections.md)
- [TLS Connections to TiDB Cloud Dedicated](/tidb-cloud/tidb-cloud-tls-connect-to-dedicated.md)
- Data Access Control
- [Encryption at Rest Using Customer-Managed Encryption Keys](/tidb-cloud/tidb-cloud-encrypt-cmek.md)
- [User-Controlled Log Redaction](/tidb-cloud/tidb-cloud-log-redaction.md)
- [Encryption at Rest Using Customer-Managed Encryption Keys](/tidb-cloud/tidb-cloud-encrypt-cmek.md)
- [User-Controlled Log Redaction](/tidb-cloud/tidb-cloud-log-redaction.md)
- Database Access Control
- [Configure Cluster Password Settings](/tidb-cloud/configure-security-settings.md)
- Audit Management
@@ -673,8 +682,8 @@
- [Use UUIDs](/best-practices/uuid.md)
- [TiDB Accelerated Table Creation](/accelerated-table-creation.md)
- [Schema Cache](/schema-cache.md)
- API Reference ![BETA](/media/tidb-cloud/blank_transparent_placeholder.png)
- [Overview](/tidb-cloud/api-overview.md)
- API Reference ![BETA](/media/tidb-cloud/blank_transparent_placeholder.png)
- [Overview](/tidb-cloud/api-overview.md)
- v1beta1
- [TiDB Cloud Starter and Essential](https://docs.pingcap.com/tidbcloud/api/v1beta1/serverless)
- [TiDB Cloud Dedicated](https://docs.pingcap.com/tidbcloud/api/v1beta1/dedicated)
@@ -692,8 +701,8 @@
- [TSO](/tso.md)
- Storage Engines
- TiKV
- [TiKV Overview](/tikv-overview.md)
- [RocksDB Overview](/storage-engine/rocksdb-overview.md)
- [TiKV Overview](/tikv-overview.md)
- [RocksDB Overview](/storage-engine/rocksdb-overview.md)
- TiFlash
- [TiFlash Overview](/tiflash/tiflash-overview.md)
- [Spill to Disk](/tiflash/tiflash-spill-disk.md)
139 changes: 139 additions & 0 deletions tidb-cloud/vector-search-auto-embedding-amazon-titan.md
@@ -0,0 +1,139 @@
---
title: Amazon Titan Embeddings
summary: Learn how to use Amazon Titan embedding models in TiDB Cloud.
aliases: ["/tidb/stable/vector-search-auto-embedding-amazon-titan"]
---

# Amazon Titan Embeddings

## Available Models

TiDB Cloud provides the following [Amazon Titan embedding model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) natively. No API key is required.

**Amazon Titan Text Embedding V2 model**

- Name: `tidbcloud_free/amazon/titan-embed-text-v2`
- Dimensions: 1024 (default), 512, 256
- Distance Metric: Cosine / L2
- Languages: English (100+ languages in preview)
- Supported use cases: RAG, document search, reranking, classification, etc.
- Max input text tokens: 8,192
- Max input text characters: 50,000
- Price: Free
- Hosted by TiDB Cloud: ✅
- Bring Your Own Key: ❌

For more details, see [its official documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html).

## Availability

This feature is currently available in the following offerings and regions:

- Starter: AWS Frankfurt (eu-central-1)
- Starter: AWS Oregon (us-west-2)
- Starter: AWS N. Virginia (us-east-1)

## SQL Usage Example

```sql
CREATE TABLE sample (
    `id` INT,
    `content` TEXT,
    `embedding` VECTOR(1024) GENERATED ALWAYS AS (EMBED_TEXT(
        "tidbcloud_free/amazon/titan-embed-text-v2",
        `content`
    )) STORED
);


INSERT INTO sample
    (`id`, `content`)
VALUES
    (1, "Java: Object-oriented language for cross-platform development."),
    (2, "Java coffee: Bold Indonesian beans with low acidity."),
    (3, "Java island: Densely populated, home to Jakarta."),
    (4, "Java's syntax is used in Android apps."),
    (5, "Dark roast Java beans enhance espresso blends.");


SELECT `id`, `content` FROM sample
ORDER BY
    VEC_EMBED_COSINE_DISTANCE(
        embedding,
        "How to start learning Java programming?"
    )
LIMIT 2;
```

Result:

```
+------+----------------------------------------------------------------+
| id   | content                                                        |
+------+----------------------------------------------------------------+
|    1 | Java: Object-oriented language for cross-platform development. |
|    4 | Java's syntax is used in Android apps.                         |
+------+----------------------------------------------------------------+
```
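
As a rough illustration of what the `ORDER BY ... LIMIT 2` query above does conceptually, the following Python sketch ranks rows by cosine distance to a query vector and keeps the two nearest. It does not call TiDB Cloud or the Titan model: the 3-dimensional vectors are invented stand-ins for real 1024-dimensional embeddings, chosen only so that the toy result resembles the one shown above.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings keyed by row id (made-up values, not model output).
rows = {
    1: [0.9, 0.1, 0.0],
    2: [0.1, 0.9, 0.1],
    3: [0.0, 0.2, 0.9],
    4: [0.8, 0.3, 0.1],
    5: [0.2, 0.8, 0.2],
}

# Hypothetical embedding of the query text (also made up).
query_vector = [1.0, 0.2, 0.0]

# Equivalent of ORDER BY distance ASC LIMIT 2: the two ids with the smallest distance.
top_ids = heapq.nsmallest(
    2, rows, key=lambda row_id: cosine_distance(rows[row_id], query_vector)
)
print(top_ids)
```

In the SQL version, the query text is embedded by `VEC_EMBED_COSINE_DISTANCE()` itself, so no client-side vector handling is needed.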

## Options

You can specify additional options via the `additional_json_options` parameter of the `EMBED_TEXT()` function.

- `normalize` – (optional) Whether to normalize the output embedding. Defaults to `true`.
- `dimensions` – (optional) The number of dimensions the output embedding should have. The following values are accepted: 1024 (default), 512, 256.
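
To show why `normalize` matters when choosing between the Cosine and L2 metrics, here is a small Python sketch on made-up vectors (not model output). It assumes normalization means scaling a vector to unit length, and checks the identity that for unit vectors the squared L2 distance equals twice the cosine distance, so both metrics produce the same ordering.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumed meaning of the `normalize` option)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Return the Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors for illustration only.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b)).
identity_holds = math.isclose(l2_distance(a, b) ** 2, 2 * cosine_distance(a, b))
print(identity_holds)
```

If normalization is turned off, vector lengths can differ, and the two metrics may rank rows differently.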

**Example: Use alternative dimensions via `dimensions`**

```sql
CREATE TABLE sample (
    `id` INT,
    `content` TEXT,
    `embedding` VECTOR(512) GENERATED ALWAYS AS (EMBED_TEXT(
        "tidbcloud_free/amazon/titan-embed-text-v2",
        `content`,
        '{"dimensions": 512}'
    )) STORED
);


INSERT INTO sample
    (`id`, `content`)
VALUES
    (1, "Java: Object-oriented language for cross-platform development."),
    (2, "Java coffee: Bold Indonesian beans with low acidity."),
    (3, "Java island: Densely populated, home to Jakarta."),
    (4, "Java's syntax is used in Android apps."),
    (5, "Dark roast Java beans enhance espresso blends.");


SELECT `id`, `content` FROM sample
ORDER BY
    VEC_EMBED_COSINE_DISTANCE(
        embedding,
        "How to start learning Java programming?"
    )
LIMIT 2;
```

Result:

```
+------+----------------------------------------------------------------+
| id   | content                                                        |
+------+----------------------------------------------------------------+
|    1 | Java: Object-oriented language for cross-platform development. |
|    4 | Java's syntax is used in Android apps.                         |
+------+----------------------------------------------------------------+
```
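
In the example above, the `VECTOR(512)` column size and the `dimensions` option carry the same value. If you generate DDL from application code, a small helper can keep the two in sync. The following Python sketch is one possible approach: the helper name `embedding_column_ddl` and its parameters are hypothetical and not part of TiDB Cloud or PyTiDB, and it only builds a string without connecting to a database.

```python
import json

MODEL = "tidbcloud_free/amazon/titan-embed-text-v2"
SUPPORTED_DIMENSIONS = (1024, 512, 256)  # values listed in the Options section


def embedding_column_ddl(source_column, dimensions=1024):
    """Build the generated-column definition for a given output dimension.

    Hypothetical helper: `source_column` must be a trusted identifier,
    because it is interpolated into the SQL text without escaping.
    """
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {SUPPORTED_DIMENSIONS}, got {dimensions}"
        )
    # The same value is used for VECTOR(n) and for the JSON options string.
    options = json.dumps({"dimensions": dimensions})
    return (
        f"`embedding` VECTOR({dimensions}) GENERATED ALWAYS AS (EMBED_TEXT(\n"
        f'    "{MODEL}",\n'
        f"    `{source_column}`,\n"
        f"    '{options}'\n"
        f")) STORED"
    )


print(embedding_column_ddl("content", 512))
```

Rejecting unsupported values up front is a design choice that surfaces a mistake in application code before any statement is sent to the database.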

## Python Usage Example

See [PyTiDB Documentation](https://pingcap.github.io/ai/guides/auto-embedding/).

## See Also

- [Auto Embedding Overview](/tidb-cloud/vector-search-auto-embedding-overview.md)
- [Vector Search](/vector-search/vector-search-overview.md)
- [Vector Functions and Operators](/vector-search/vector-search-functions-and-operators.md)
- [Hybrid Search](/tidb-cloud/vector-search-hybrid-search.md)