Add L40S to Managed Inference models #4378

Merged: 11 commits, Feb 11, 2025

---

@@ -18,7 +18,7 @@ dates:
| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [baai](https://huggingface.co/BAAI) |
-| Compatible Instances | L4 (FP32) |
+| Compatible Instances | L4, L40S (FP32) |
| Context size | 4096 tokens |

## Model name
@@ -32,6 +32,7 @@ baai/bge-multilingual-gemma2:fp32
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 4096 (FP32) |
+| L40S | 4096 (FP32) |

## Model introduction


---

@@ -19,7 +19,7 @@ categories:
|-----------------|------------------------------------|
| Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | L4, H100 (BF16) |
+| Compatible Instances | L4, L40S, H100 (BF16) |
| Context Length | up to 131k tokens |

## Model names
@@ -33,6 +33,7 @@ deepseek/deepseek-r1-distill-llama-8b:bf16
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 39k (BF16) |
+| L40S | 131k (BF16) |
| H100 | 131k (BF16) |

## Model introduction

---

@@ -18,7 +18,7 @@ categories:
| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Meta](https://llama.meta.com/llama3/) |
-| Compatible Instances | H100 (FP8) |
+| Compatible Instances | H100, H100-2 (FP8) |
| Context size | 8192 tokens |

## Model names
@@ -30,6 +30,7 @@ meta/llama-3-70b-instruct:fp8
## Compatible Instances

- [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/)
+- H100-2 (FP8)

## Model introduction

@@ -82,4 +83,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>

---

@@ -18,7 +18,7 @@ categories:
| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Meta](https://llama.meta.com/llama3/) |
-| Compatible Instances | L4, H100 (FP8, BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) |
| Context size | 8192 tokens |

## Model names
@@ -33,7 +33,9 @@ meta/llama-3-8b-instruct:fp8
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 8192 (FP8, BF16) |
-| H100 | 8192 (FP8, BF16)
+| L40S | 8192 (FP8, BF16) |
+| H100 | 8192 (FP8, BF16) |
+| H100-2 | 8192 (FP8, BF16) |

## Model introduction


---

@@ -19,7 +19,7 @@ categories:
|-----------------|------------------------------------|
| Provider | [Meta](https://llama.meta.com/llama3/) |
| License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) |
-| Compatible Instances | L4, H100, H100-2 (FP8, BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) |
| Context Length | up to 128k tokens |

## Model names
@@ -34,8 +34,9 @@ meta/llama-3.1-8b-instruct:bf16
| Instance type | Max context length |
| ------------- |-------------|
| L4 | 96k (FP8), 27k (BF16) |
-| H100 | 128k (FP8, BF16)
-| H100-2 | 128k (FP8, BF16)
+| L40S | 128k (FP8, BF16) |
+| H100 | 128k (FP8, BF16) |
+| H100-2 | 128k (FP8, BF16) |

## Model introduction

@@ -82,4 +83,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>

---

@@ -17,8 +17,8 @@ categories:

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | L4 (BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (BF16) |
| Context size | 32K tokens |

## Model name
@@ -31,7 +31,10 @@ mistral/mistral-7b-instruct-v0.3:bf16

| Instance type | Max context length |
| ------------- |-------------|
-| L4 | 32k (BF16)
+| L4 | 32k (BF16) |
+| L40S | 32k (BF16) |
+| H100 | 32k (BF16) |
+| H100-2 | 32k (BF16) |

## Model introduction

@@ -75,4 +78,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>

---

@@ -17,9 +17,9 @@ categories:

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | H100 (FP8) |
+| Compatible Instances | L40S, H100, H100-2 (FP8) |
| Context size | 128K tokens |

## Model name

@@ -31,7 +31,9 @@ mistral/mistral-nemo-instruct-2407:fp8

| Instance type | Max context length |
| ------------- |-------------|
-| H100 | 128k (FP8)
+| L40S | 128k (FP8) |
+| H100 | 128k (FP8) |
+| H100-2 | 128k (FP8) |

## Model introduction

@@ -81,4 +83,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>

---

@@ -17,9 +17,9 @@ categories:

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | H100, H100-2 (bf16) |
+| Compatible Instances | L40S, H100, H100-2 (bf16) |
| Context size | 128k tokens |

## Model name

@@ -31,6 +31,7 @@ mistral/pixtral-12b-2409:bf16

| Instance type | Max context length |
| ------------- |-------------|
+| L40S | 50k (BF16)
| H100 | 128k (BF16)
| H100-2 | 128k (BF16)

@@ -162,4 +163,4 @@ Only bitmaps can be analyzed by Pixtral, PDFs and videos are not supported.
The only limitation is the context window (1 token for each 16x16 pixel).

#### What is the maximum number of images per conversation?
One conversation can handle up to 12 images (per request). The 13th will return a 413 error.
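
To illustrate the per-request image limit above, here is a minimal sketch of sending images to `mistral/pixtral-12b-2409:bf16` and handling the 413 response. It assumes the deployment exposes an OpenAI-compatible chat completions route; the endpoint URL, API key, and file names below are placeholders, not real values:

```python
import base64

import requests  # third-party HTTP client: pip install requests

# Placeholders: substitute your deployment's endpoint URL and API key.
ENDPOINT = "https://<your-deployment-endpoint>/v1/chat/completions"
API_KEY = "<your-api-key>"


def encode_image(path: str) -> str:
    """Base64-encode an image file for inline transmission."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()


# Up to 12 images are accepted per request; a 13th triggers HTTP 413.
image_paths = ["photo1.png", "photo2.png"]

content = [{"type": "text", "text": "Describe these images."}]
for path in image_paths:
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encode_image(path)}"},
    })

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral/pixtral-12b-2409:bf16",
        "messages": [{"role": "user", "content": content}],
    },
    timeout=60,
)

if resp.status_code == 413:
    # Too many images in one conversation: split the batch across requests.
    print("Request rejected (413): send at most 12 images per request.")
else:
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

Splitting large batches client-side avoids the 413 entirely and keeps each request within the model's context budget (one token per 16x16 pixels).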