Add L40S to Managed Inference models (#4378)
* Add L40S to Managed Inference models

* Update llama-3-70b-instruct.mdx

* Update llama-3.1-8b-instruct.mdx

* Update deepseek-r1-distill-llama-8b.mdx

* Update pixtral-12b-2409.mdx

* Update mistral-7b-instruct-v0.3.mdx

* Update mistral-nemo-instruct-2407.mdx

* Update pixtral-12b-2409.mdx

* Update llama-3.1-8b-instruct.mdx

* Update deepseek-r1-distill-llama-8b.mdx

* Update bge-multilingual-gemma2.mdx
fpagny authored Feb 11, 2025
1 parent 42312ed commit f110780
Showing 8 changed files with 35 additions and 23 deletions.
@@ -18,7 +18,7 @@ dates:
| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [baai](https://huggingface.co/BAAI) |
-| Compatible Instances | L4 (FP32) |
+| Compatible Instances | L4, L40S (FP32) |
| Context size | 4096 tokens |

## Model name
@@ -32,6 +32,7 @@ baai/bge-multilingual-gemma2:fp32
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 4096 (FP32) |
+| L40S | 4096 (FP32) |

## Model introduction

@@ -19,7 +19,7 @@ categories:
|-----------------|------------------------------------|
| Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | L4, H100 (BF16) |
+| Compatible Instances | L4, L40S, H100 (BF16) |
| Context Length | up to 131k tokens |

## Model names
@@ -33,6 +33,7 @@ deepseek/deepseek-r1-distill-llama-8b:bf16
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 39k (BF16) |
+| L40S | 131k (BF16) |
| H100 | 131k (BF16) |

## Model introduction
@@ -18,7 +18,7 @@ categories:
| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Meta](https://llama.meta.com/llama3/) |
-| Compatible Instances | H100 (FP8) |
+| Compatible Instances | H100, H100-2 (FP8) |
| Context size | 8192 tokens |

## Model names
@@ -30,6 +30,7 @@ meta/llama-3-70b-instruct:fp8
## Compatible Instances

- [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/)
+- H100-2 (FP8)

## Model introduction

@@ -82,4 +83,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
@@ -18,7 +18,7 @@ categories:
| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Meta](https://llama.meta.com/llama3/) |
-| Compatible Instances | L4, H100 (FP8, BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) |
| Context size | 8192 tokens |

## Model names
@@ -33,7 +33,9 @@ meta/llama-3-8b-instruct:fp8
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 8192 (FP8, BF16) |
-| H100 | 8192 (FP8, BF16)
+| L40S | 8192 (FP8, BF16) |
+| H100 | 8192 (FP8, BF16) |
+| H100-2 | 8192 (FP8, BF16) |

## Model introduction

@@ -19,7 +19,7 @@ categories:
|-----------------|------------------------------------|
| Provider | [Meta](https://llama.meta.com/llama3/) |
| License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) |
-| Compatible Instances | L4, H100, H100-2 (FP8, BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) |
| Context Length | up to 128k tokens |

## Model names
@@ -34,8 +34,9 @@ meta/llama-3.1-8b-instruct:bf16
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 96k (FP8), 27k (BF16) |
-| H100 | 128k (FP8, BF16)
-| H100-2 | 128k (FP8, BF16)
+| L40S | 128k (FP8, BF16) |
+| H100 | 128k (FP8, BF16) |
+| H100-2 | 128k (FP8, BF16) |

## Model introduction

@@ -82,4 +83,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
@@ -17,8 +17,8 @@ categories:

| Attribute | Details |
|-----------------|------------------------------------|
-| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | L4 (BF16) |
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L4, L40S, H100, H100-2 (BF16) |
| Context size | 32K tokens |

## Model name
@@ -31,7 +31,10 @@ mistral/mistral-7b-instruct-v0.3:bf16

| Instance type | Max context length |
| ------------- |-------------|
-| L4 | 32k (BF16)
+| L4 | 32k (BF16) |
+| L40S | 32k (BF16) |
+| H100 | 32k (BF16) |
+| H100-2 | 32k (BF16) |

## Model introduction

@@ -75,4 +78,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
@@ -17,9 +17,9 @@ categories:

| Attribute | Details |
|-----------------|------------------------------------|
-| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | H100 (FP8) |
-| Context size | 128K tokens |
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (FP8) |
+| Context size | 128K tokens |

## Model name

@@ -31,7 +31,9 @@ mistral/mistral-nemo-instruct-2407:fp8

| Instance type | Max context length |
| ------------- |-------------|
-| H100 | 128k (FP8)
+| L40S | 128k (FP8) |
+| H100 | 128k (FP8) |
+| H100-2 | 128k (FP8) |

## Model introduction

@@ -81,4 +83,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
@@ -17,9 +17,9 @@ categories:

| Attribute | Details |
|-----------------|------------------------------------|
-| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | H100, H100-2 (bf16) |
-| Context size | 128k tokens |
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (bf16) |
+| Context size | 128k tokens |

## Model name

@@ -31,6 +31,7 @@ mistral/pixtral-12b-2409:bf16

| Instance type | Max context length |
| ------------- |-------------|
+| L40S | 50k (BF16) |
| H100 | 128k (BF16)
| H100-2 | 128k (BF16)

@@ -162,4 +163,4 @@ Only bitmaps can be analyzed by Pixtral; PDFs and videos are not supported.
The only limitation is the context window (1 token for each 16x16-pixel block).
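That per-patch budget can be estimated with a short helper. This is an illustrative sketch, not part of the documented API: the function name is made up, and rounding partial patches up on each axis is an assumption.

```python
import math

def pixtral_image_tokens(width: int, height: int) -> int:
    # One token per 16x16-pixel patch; partial patches on each axis
    # are assumed to round up to a full patch.
    return math.ceil(width / 16) * math.ceil(height / 16)

# A 1024x1024 image would occupy 64 * 64 = 4096 tokens of context.
print(pixtral_image_tokens(1024, 1024))
```

Under this assumption, image cost grows with area, so downscaling large images before upload is the main lever for fitting more of them into the context window.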

#### What is the maximum amount of images per conversation?
One conversation can handle up to 12 images (per request). The 13th will return a 413 error.
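A client can stay under that per-request cap by splitting its images into batches. The helper name and the generic slicing approach below are illustrative assumptions, not part of the documented API:

```python
def batch_images(image_urls, max_per_request=12):
    # Yield successive slices no larger than the 12-image limit,
    # so no single request triggers an HTTP 413.
    for i in range(0, len(image_urls), max_per_request):
        yield image_urls[i:i + max_per_request]

# 30 images split into request-sized batches of 12, 12, and 6.
batches = list(batch_images([f"img{i}.png" for i in range(30)]))
```

Each batch would then be sent as its own request, keeping in mind that every image also consumes context-window tokens.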
