This repository was archived by the owner on May 10, 2024. It is now read-only.

proposed format for standardized EF pages #168

Open
wants to merge 3 commits into
base: main
96 changes: 60 additions & 36 deletions docs/embeddings/cohere.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,56 +3,88 @@

# Cohere

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Chroma also provides a convenient wrapper around Cohere's embedding API. This embedding function runs remotely on Cohere’s servers, and requires an API key. You can get an API key by signing up for an account at [Cohere](https://dashboard.cohere.ai/welcome/register).

<div class="select-language">Select a language</div>
<div class="data_table"></div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>
| Models | Input | Dimensionality | Context Size |
|--|--|--|--|
|`embed-english-v3.0` | English | 1024 | 512 (recommended) |
|`embed-multilingual-v3.0` | [Full List](https://docs.cohere.com/docs/supported-languages) | 1024 | 512 (recommended) |
|`embed-english-light-v3.0` | English | 384 | 512 (recommended) |
|`embed-multilingual-light-v3.0` | [Full List](https://docs.cohere.com/docs/supported-languages) | 384 | 512 (recommended) |
|`embed-english-v2.0` | English | 4096 | 512 (recommended) |
|`embed-english-light-v2.0` | English | 1024 | 512 (recommended) |
|`embed-multilingual-v2.0` | [Full List](https://docs.cohere.com/docs/supported-languages) | 768 | 512 (recommended) |
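
As a rough illustration of how the dimensionality column might be used, the sketch below records the table's values in a plain dictionary and checks that a returned vector has the expected length. The names `COHERE_DIMENSIONS` and `check_dimension` are hypothetical helpers for this example, not part of Chroma or Cohere.

```python
# Dimensionality per model, copied from the table above.
COHERE_DIMENSIONS = {
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "embed-english-light-v3.0": 384,
    "embed-multilingual-light-v3.0": 384,
    "embed-english-v2.0": 4096,
    "embed-english-light-v2.0": 1024,
    "embed-multilingual-v2.0": 768,
}


def check_dimension(model_name, embedding):
    """Return True if the vector length matches the table's dimensionality."""
    expected = COHERE_DIMENSIONS[model_name]  # KeyError for unlisted models
    return len(embedding) == expected
```

A check like this can catch a collection that was created with one model and later queried with another.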

Chroma also provides a convenient wrapper around Cohere's embedding API. This embedding function runs remotely on Cohere’s servers, and requires an API key. You can get an API key by signing up for an account at [Cohere](https://dashboard.cohere.ai/welcome/register).

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">
## Basic Usage

This embedding function relies on the `cohere` python package, which you can install with `pip install cohere`.
### Python

```bash
pip install cohere
```

```python
cohere_ef = embedding_functions.CohereEmbeddingFunction(api_key="YOUR_API_KEY", model_name="large")
cohere_ef(texts=["document1","document2"])

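import chromadb
from chromadb.utils import embedding_functions

# In-memory client; swap in your own client configuration as needed.
client = chromadb.Client()

embedder = embedding_functions.CohereEmbeddingFunction(
    api_key="YOUR_API_KEY")

collection = client.create_collection(
    name="cohere_ef",
    embedding_function=embedder)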
```

</TabItem>
<TabItem value="js" label="JavaScript">
### JavaScript

```bash
yarn add cohere-ai
```

```javascript
const {CohereEmbeddingFunction} = require('chromadb');
const embedder = new CohereEmbeddingFunction("apiKey")
import { ChromaClient, CohereEmbeddingFunction } from 'chromadb'

// use directly
const embeddings = embedder.generate(["document1","document2"])
const embedder = new CohereEmbeddingFunction({
apiKey: "YOUR_API_KEY"
})

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
const collection = await client.createCollection({
name: "cohere_ef",
embeddingFunction: embedder
})
```

</TabItem>
## Advanced Usage

</Tabs>
### Call directly

By passing the embedding function to a Collection, Chroma handles the embedding of documents and queries for you. However, in some cases you may want to generate the embeddings yourself and handle them outside of Chroma.

#### Python

```python
embeddings = embedder(["document1","document2"])
# [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```

#### JavaScript

```javascript
// generate() is asynchronous, so await the result before using it
const embeddings = await embedder.generate(["document1","document2"])
// [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```
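
One thing you might do with embeddings generated this way is compare them yourself. The following is a minimal sketch, assuming the embedder returns a list of float lists as in the sample output above. `cosine_similarity` is a hypothetical helper written with the standard library only, and the short vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder vectors; in practice these would come from embedder([...]).
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(cosine_similarity(embeddings[0], embeddings[2]))
```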

### Using a different model

You can pass in an optional `model_name` argument, which lets you choose which Cohere embeddings model to use. By default, Chroma uses the `large` model. You can see the available models under the `Get embeddings` section [here](https://docs.cohere.ai/reference/embed).
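
A minimal sketch of selecting a model, assuming the `chromadb` and `cohere` packages are installed and that `CohereEmbeddingFunction` accepts `api_key` and `model_name` keyword arguments as in the snippets on this page. The `KNOWN_MODELS` set is copied from the table at the top of this page, and `make_cohere_embedder` is a hypothetical helper, not part of Chroma.

```python
# Model names copied from the table at the top of this page.
KNOWN_MODELS = {
    "embed-english-v3.0",
    "embed-multilingual-v3.0",
    "embed-english-light-v3.0",
    "embed-multilingual-light-v3.0",
    "embed-english-v2.0",
    "embed-english-light-v2.0",
    "embed-multilingual-v2.0",
}


def make_cohere_embedder(api_key, model_name):
    """Build a Cohere embedding function for one of the listed models."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(f"unknown Cohere model: {model_name!r}")
    # Imported lazily so the name check above works without chromadb installed.
    from chromadb.utils import embedding_functions

    return embedding_functions.CohereEmbeddingFunction(
        api_key=api_key, model_name=model_name)
```

Calling `make_cohere_embedder("YOUR_API_KEY", "embed-multilingual-v3.0")` would then give an embedder that can be passed to a collection in the same way as in Basic Usage.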


### Multilingual model example

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">
#### Python

```python
cohere_ef = embedding_functions.CohereEmbeddingFunction(
Expand All @@ -69,11 +101,10 @@ cohere_ef(texts=multilingual_texts)

```

</TabItem>
<TabItem value="js" label="JavaScript">
#### JavaScript

```javascript
const {CohereEmbeddingFunction} = require('chromadb');
import { CohereEmbeddingFunction } from 'chromadb'
const embedder = new CohereEmbeddingFunction("apiKey")

multilingual_texts = [ 'Hello from Cohere!', 'مرحبًا من كوهير!',
Expand All @@ -86,11 +117,4 @@ const embeddings = embedder.generate(multilingual_texts)

```


</TabItem>

</Tabs>



For more information on the multilingual model, see [here](https://docs.cohere.ai/docs/multilingual-language-models).
99 changes: 63 additions & 36 deletions docs/embeddings/openai.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,59 +3,86 @@

# OpenAI

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Chroma provides a convenient wrapper around OpenAI's embedding API. This embedding function runs remotely on OpenAI's servers, and requires an API key. You can get an API key by signing up for an account at [OpenAI](https://openai.com/api/).

<div class="select-language">Select a language</div>
<div class="data_table"></div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>
| Models | Input | Dimensionality | Context Size |
|--|--|--|--|
|`ada-002` | English | 1536 | 2048 |

Chroma provides a convenient wrapper around OpenAI's embedding API. This embedding function runs remotely on OpenAI's servers, and requires an API key. You can get an API key by signing up for an account at [OpenAI](https://openai.com/api/).
## Basic Usage

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">
### Python

This embedding function relies on the `openai` python package, which you can install with `pip install openai`.
```bash
pip install openai
```

```python
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="YOUR_API_KEY",
model_name="text-embedding-ada-002"
)

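import chromadb
from chromadb.utils import embedding_functions

# In-memory client; swap in your own client configuration as needed.
client = chromadb.Client()

embedder = embedding_functions.OpenAIEmbeddingFunction(
    api_key="YOUR_API_KEY")

collection = client.create_collection(
    name="oai_ef",
    embedding_function=embedder)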
```

To use the OpenAI embedding models on other platforms such as Azure, you can use the `api_base` and `api_type` parameters:
### JavaScript

```bash
yarn add openai
```

```javascript
import { ChromaClient, OpenAIEmbeddingFunction } from 'chromadb'

// Connects to a Chroma server at the default address.
const client = new ChromaClient()

const embedder = new OpenAIEmbeddingFunction({
    openai_api_key: "YOUR_API_KEY"
})

const collection = await client.createCollection({
    name: "oai_ef",
    embeddingFunction: embedder
})
```

## Advanced Usage

### Call directly

By passing the embedding function to a Collection, Chroma handles the embedding of documents and queries for you. However, in some cases you may want to generate the embeddings yourself and handle them outside of Chroma.

#### Python

```python
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="YOUR_API_KEY",
api_base="YOUR_API_BASE_PATH",
api_type="azure",
api_version="YOUR_API_VERSION",
model_name="text-embedding-ada-002"
)
embeddings = embedder(["document1","document2"])
# [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```

</TabItem>
<TabItem value="js" label="JavaScript">
#### JavaScript

```javascript
const {OpenAIEmbeddingFunction} = require('chromadb');
const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey"})

// use directly
// generate() is asynchronous, so await the result before using it
const embeddings = await embedder.generate(["document1","document2"])

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collection = await client.getCollection({name: "name", embeddingFunction: embedder})
// [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```

</TabItem>
### Using a different model

</Tabs>
You can pass in an optional `model_name` argument, which lets you choose which OpenAI embeddings model to use. By default, Chroma uses `text-embedding-ada-002`. You can see a list of all available models [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

### Run with Azure

You can pass in an optional `model_name` argument, which lets you choose which OpenAI embeddings model to use. By default, Chroma uses `text-embedding-ada-002`. You can see a list of all available models [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
To use the OpenAI embedding models on other platforms such as Azure, you can use the `api_base` and `api_type` parameters:
```python
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="YOUR_API_KEY",
api_base="YOUR_API_BASE_PATH",
api_type="azure",
api_version="YOUR_API_VERSION",
model_name="text-embedding-ada-002"
)
```
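
To keep Azure credentials out of source code, the parameters above could be gathered from the environment first. This is a sketch under assumptions: the environment variable names and the `azure_openai_kwargs` helper are hypothetical, and only the keyword names (`api_key`, `api_base`, `api_type`, `api_version`, `model_name`) come from the snippet above.

```python
import os


def azure_openai_kwargs(env=None):
    """Collect the Azure-related keyword arguments from environment variables."""
    env = os.environ if env is None else env
    return {
        # Variable names below are illustrative, not a Chroma convention.
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "api_base": env["AZURE_OPENAI_API_BASE"],
        "api_type": "azure",
        "api_version": env["AZURE_OPENAI_API_VERSION"],
        "model_name": env.get("AZURE_OPENAI_MODEL_NAME", "text-embedding-ada-002"),
    }
```

The resulting dictionary can be unpacked into the constructor, for example `embedding_functions.OpenAIEmbeddingFunction(**azure_openai_kwargs())`.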
52 changes: 46 additions & 6 deletions src/css/custom.css
Original file line number Diff line number Diff line change
Expand Up @@ -406,7 +406,10 @@ article p a {
font-size: 1.5rem !important;
}
h3 {
font-size: 1rem !important;
font-size: 1.2rem !important;
}
h4 {
font-size: 0.9rem !important;
}
article {
max-width: 700px;
Expand Down Expand Up @@ -483,12 +486,8 @@ article p a {

div.special_table + table {
border: none;

/* border-collapse: separate; */
/* border-spacing: 0px; */
}


div.special_table + table thead {
background: rgba(120,120,120, 0.1);
border-top-right-radius: 10px;
Expand Down Expand Up @@ -519,6 +518,8 @@ div.special_table + table, th, td {
border-width: 0px !important;
}



.custom-tag {
display: inline;
background-color: #f0f0f0;
Expand Down Expand Up @@ -588,4 +589,43 @@ div.special_table + table, th, td {

.main-wrapper {
min-height: 100vh;
}
}


div.data_table + table {
border: none;
padding-top: 20px;
padding-bottom: 20px;
zoom: 0.8;
}

div.data_table + table thead {
background: rgba(120,120,120, 0.1);
border-top-right-radius: 10px;
overflow: hidden;
}

div.data_table + table thead tr {
background: rgba(255, 255, 255, 0.1);
border-top: 0px;
border-bottom: 0px;
text-align: left;
}
div.data_table + table tr th {
background: rgba(255, 255, 255, 0);
color: #000;
font-weight: 600;
padding: 5px 20px;
}
div.data_table + table tr td {
padding: 5px 20px;
text-align: left;
}

div.data_table + table tr:nth-child(even) {
background: rgba(255, 255, 255, 0);
}

div.data_table + table,
div.data_table + table th,
div.data_table + table td {
border-width: 0px !important;
}