
[InferenceSnippet] Take token from env variable if not set #1514


Merged
merged 10 commits into main from token-from-env-in-snippets on Jun 4, 2025

Conversation


@Wauplin Wauplin commented Jun 3, 2025

Solves #1361.

Long-awaited feature for @gary149. I did not go for the cleanest solution, but it works well and should be robust/flexible enough if we need to fix something in the future.

EDIT: breaking change => the access token must now be passed as `opts.accessToken` in `snippets.getInferenceSnippets`.
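
For downstream callers, a minimal sketch of the new call shape (the import path, the other arguments, and the placeholder variables are assumptions for illustration; only the `opts.accessToken` field comes from this PR):

// Hypothetical usage sketch (TypeScript) — import path and surrounding arguments
// are assumptions; only opts.accessToken is taken from this PR.
import { snippets } from "@huggingface/inference";

const generated = snippets.getInferenceSnippets(
    model,              // model metadata object (placeholder)
    "hf-inference",     // inference provider (placeholder)
    providerMapping,    // provider model mapping (placeholder)
    { accessToken: process.env.HF_TOKEN }, // breaking change: token now goes in opts.accessToken
);

// If opts.accessToken is omitted, the generated snippets fall back to reading the token
// from an environment variable instead (e.g. process.env.HF_TOKEN / os.environ["HF_TOKEN"]).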

TODO

once merged:

Some examples:

JS client

import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
    provider: "hf-inference",
    model: "meta-llama/Llama-3.1-8B-Instruct",
    messages: [
        {
            role: "user",
            content: "What is the capital of France?",
        },
    ],
});

console.log(chatCompletion.choices[0].message);

Python client

import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)

openai client

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.1-8B-Instruct/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)

curl

curl https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.1-8B-Instruct/v1/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ],
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "stream": false
    }'

Check out the PR diff for more examples.

@Wauplin Wauplin requested review from pcuenca and ngxson as code owners June 4, 2025 09:07
@@ -115,7 +115,7 @@ export const bm25s = (model: ModelData): string[] => [
retriever = BM25HF.load_from_hub("${model.id}")`,
];

export const chatterbox = (model: ModelData): string[] => [
Wauplin (Contributor, Author) commented:

Not related to this PR, but fixes a lint issue introduced in #1503.

@SBrandeis SBrandeis left a comment

Excellent!

@hanouticelina hanouticelina left a comment

Reviewed the snippets, looks good to me!

@SBrandeis

Merging

@SBrandeis SBrandeis merged commit 90ce13c into main Jun 4, 2025
7 of 11 checks passed
@SBrandeis SBrandeis deleted the token-from-env-in-snippets branch June 4, 2025 10:36

julien-c commented Jun 4, 2025

nice!

Wauplin added a commit that referenced this pull request Jun 4, 2025
Fix after #1514.

Now that we use a placeholder for the access token loaded from the environment, there
is no direct way to explicitly generate a snippet for either a "direct
request" or a "routed request" (determined
[here](https://github.com/huggingface/huggingface.js/blob/1131b562d74c7c7b95966ec757fea94773a024f1/packages/inference/src/lib/makeRequestOptions.ts#L124-L141)
using `accessToken.startsWith("hf_")`). This PR adds a `directRequest?:
boolean;` option to the parameters, which solves this problem.
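
For context, a rough sketch of the decision this option overrides (hypothetical and simplified; only the `accessToken.startsWith("hf_")` check and the `directRequest` override come from this PR, the function and parameter names are illustrative):

// Hypothetical, simplified sketch — not the actual makeRequestOptions implementation.
function shouldMakeDirectRequest(accessToken: string | undefined, directRequest?: boolean): boolean {
    if (directRequest !== undefined) {
        // Override added by this follow-up: snippet generation, where the token is only
        // an env-var placeholder, can force either mode explicitly.
        return directRequest;
    }
    // Default heuristic: an "hf_" token suggests routing through Hugging Face,
    // anything else is treated as a provider key for a direct request.
    return !(accessToken ?? "").startsWith("hf_");
}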

Will require a follow-up PR in moon-landing.

cc @SBrandeis who found the root cause

### expected behavior

Display the routed request by default on
https://huggingface.co/deepseek-ai/DeepSeek-R1-0528?inference_api=true&inference_provider=fireworks-ai&language=sh


![image](https://github.com/user-attachments/assets/0f2be3d5-9c7a-48a1-bbdb-b6ae5aa78f9d)