Update function-calling-support.mdx #4372

Merged
merged 12 commits on Feb 6, 2025
8 changes: 8 additions & 0 deletions menu/navigation.json
@@ -817,6 +817,14 @@
"label": "Llama-3.3-70b-instruct model",
"slug": "llama-3.3-70b-instruct"
},
{
"label": "DeepSeek-R1-Distill-Llama-70B model",
"slug": "deepseek-r1-distill-llama-70b"
},
{
"label": "DeepSeek-R1-Distill-Llama-8B model",
"slug": "deepseek-r1-distill-llama-8b"
},
{
"label": "Mistral-7b-instruct-v0.3 model",
"slug": "mistral-7b-instruct-v0.3"
@@ -0,0 +1,81 @@
---
meta:
title: Understanding the DeepSeek-R1-Distill-Llama-70B model
description: Deploy your own secure DeepSeek-R1-Distill-Llama-70B model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
h1: Understanding the DeepSeek-R1-Distill-Llama-70B model
paragraph: This page provides information on the DeepSeek-R1-Distill-Llama-70B model
tags:
dates:
validation: 2025-02-06
posted: 2025-02-06
categories:
- ai-data
---

## Model overview

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
| License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Compatible Instances | H100-2 (BF16) |
| Context Length | up to 56k tokens |

## Model names

```bash
deepseek/deepseek-r1-distill-llama-70b:bf16
```
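
You can also check which model names a running deployment exposes by querying its OpenAI-compatible models route (a minimal sketch; the `/v1/models` route is assumed here, not confirmed by this page):

```bash
# Assumption: the deployment exposes an OpenAI-compatible /v1/models route
curl -s \
-H "Authorization: Bearer <IAM API key>" \
--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/models"
```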

## Compatible Instances

| Instance type | Max context length |
| ------------- |-------------|
| H100-2 | 56k (BF16) |

## Model introduction

Released on January 21, 2025, DeepSeek’s R1 Distill Llama 70B is a model from the Llama family, distilled from DeepSeek R1.
DeepSeek-R1-Distill-Llama-70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

## Why is it useful?

It is great to see DeepSeek advancing open-weight models, and we are excited to fully support their mission by integrating them into the Scaleway ecosystem.

- DeepSeek-R1-Distill-Llama was optimized to reach accuracy close to DeepSeek-R1 on tasks like mathematics and coding, while keeping inference costs low and token generation fast.
- DeepSeek-R1-Distill-Llama supports a context window of up to 56K tokens and tool calling, making interaction with other components possible.

## How to use it

### Sending Managed Inference requests

To perform inference tasks with your DeepSeek-R1-Distill-Llama-70B model deployed on Scaleway, use the following command:

```bash
curl -s \
-H "Authorization: Bearer <IAM API key>" \
-H "Content-Type: application/json" \
--request POST \
--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
--data '{"model":"deepseek/deepseek-r1-distill-llama-70b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}'
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

<Message type="note">
Ensure that the `messages` array is properly formatted with roles (user, assistant) and content.
</Message>

<Message type="tip">
As suggested by the model provider, this model performs best without a system prompt.
</Message>

### Receiving inference responses

Upon sending the HTTP request to the public or private endpoint exposed by your deployment, you will receive inference responses from the Managed Inference server.
Process the output data according to your application's needs. The response will contain the output generated by the LLM based on the input provided in the request.
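
For example, assuming the response follows the OpenAI-compatible chat completions schema and `jq` is available, you can extract only the generated text (a sketch, not an official recipe):

```bash
# Assumption: the response body follows the OpenAI-compatible chat completions schema
curl -s \
-H "Authorization: Bearer <IAM API key>" \
-H "Content-Type: application/json" \
--request POST \
--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
--data '{"model":"deepseek/deepseek-r1-distill-llama-70b:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}' \
| jq -r '.choices[0].message.content'  # keep only the assistant message text
```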

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
@@ -0,0 +1,82 @@
---
meta:
title: Understanding the DeepSeek-R1-Distill-Llama-8B model
description: Deploy your own secure DeepSeek-R1-Distill-Llama-8B model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
h1: Understanding the DeepSeek-R1-Distill-Llama-8B model
paragraph: This page provides information on the DeepSeek-R1-Distill-Llama-8B model
tags:
dates:
validation: 2025-02-06
posted: 2025-02-06
categories:
- ai-data
---

## Model overview

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Compatible Instances | L4, H100 (BF16) |
| Context Length | up to 131k tokens |

## Model names

```bash
deepseek/deepseek-r1-distill-llama-8b:bf16
```

## Compatible Instances

| Instance type | Max context length |
| ------------- |-------------|
| L4 | 39k (BF16) |
| H100 | 131k (BF16) |

## Model introduction

Released on January 21, 2025, DeepSeek’s R1 Distill Llama 8B is a model from the Llama family, distilled from DeepSeek R1.
DeepSeek-R1-Distill-Llama-8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

## Why is it useful?

It is great to see DeepSeek advancing open-weight models, and we are excited to fully support their mission by integrating them into the Scaleway ecosystem.

- DeepSeek-R1-Distill-Llama was optimized to reach accuracy close to DeepSeek-R1 on tasks like mathematics and coding, while keeping inference costs low and token generation fast.
- DeepSeek-R1-Distill-Llama supports a context window of up to 131K tokens and tool calling, making interaction with other components possible.

## How to use it

### Sending Managed Inference requests

To perform inference tasks with your DeepSeek-R1-Distill-Llama-8B model deployed on Scaleway, use the following command:

```bash
curl -s \
-H "Authorization: Bearer <IAM API key>" \
-H "Content-Type: application/json" \
--request POST \
--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
--data '{"model":"deepseek/deepseek-r1-distill-llama-8b:fp8", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}'
```

Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.

<Message type="note">
Ensure that the `messages` array is properly formatted with roles (user, assistant) and content.
</Message>

<Message type="tip">
As suggested by the model provider, this model performs best without a system prompt.
</Message>

### Receiving inference responses

Upon sending the HTTP request to the public or private endpoint exposed by your deployment, you will receive inference responses from the Managed Inference server.
Process the output data according to your application's needs. The response will contain the output generated by the LLM based on the input provided in the request.
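
To stream tokens as they are generated instead, set `"stream": true` in the payload. Assuming OpenAI-compatible streaming, the endpoint then returns server-sent events (`data: {...}` chunks, ending with `data: [DONE]`); a sketch:

```bash
# Assumption: streaming follows the OpenAI-compatible server-sent events format
# -N disables curl's output buffering so chunks print as they arrive
curl -s -N \
-H "Authorization: Bearer <IAM API key>" \
-H "Content-Type: application/json" \
--request POST \
--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
--data '{"model":"deepseek/deepseek-r1-distill-llama-8b:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": true}'
```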

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
@@ -32,6 +32,8 @@ The following models in Scaleway's Managed Inference library can call tools as p
* mistral/mistral-nemo-instruct-2407
* mistral/pixtral-12b-2409
* nvidia/llama-3.1-nemotron-70b-instruct
* deepseek/deepseek-r1-distill-llama-70b
* deepseek/deepseek-r1-distill-llama-8b
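
As an illustration, a minimal tool-calling request to one of these models could look like the following (a sketch assuming the OpenAI-compatible `tools` parameter; the `get_weather` function is a hypothetical example, and the deployment URL and model name must match your own setup):

```bash
# Example only: "get_weather" is a hypothetical function definition
curl -s \
-H "Authorization: Bearer <IAM API key>" \
-H "Content-Type: application/json" \
--request POST \
--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
--data '{
  "model": "deepseek/deepseek-r1-distill-llama-70b:bf16",
  "messages": [{"role": "user", "content": "What is the weather like in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a given city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```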

## Understanding function calling

@@ -30,8 +30,6 @@ meta/llama-3-8b-instruct:fp8

## Compatible Instances

| Instance type | Max context length |
| ------------- |-------------|
| L4 | 8192 (FP8, BF16) |
@@ -86,4 +84,4 @@ Process the output data according to your application's needs. The response will

<Message type="note">
Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>