
[InferenceClient] Add dynamic inference providers mapping #2836

Merged

Wauplin merged 39 commits into main, Feb 11, 2025

Conversation

hanouticelina
Contributor

@hanouticelina hanouticelina commented Feb 5, 2025

Companion PR to huggingface/huggingface.js#1173. This is to give a first idea on how to implement the dynamic inference providers mapping before moving on with the TODOs.

TODOs: (a bit similar to the hf.js PR)

  • Handle prod/staging status -> warn the user when the model is in staging mode.
  • Handle models that support both text-generation and conversational => not needed for now => let's make them "conversational"-only.
  • Add and update tests.
  • The HfApi().model_info call is cached.
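The staging warning and the cached HfApi().model_info lookup from the TODOs above could be sketched roughly like this. This is a minimal, self-contained sketch: the mapping payload shape, the `map_model` helper, and the field names are assumptions for illustration, not the actual huggingface_hub implementation.

```python
# Sketch of the dynamic-mapping idea: resolve a model's provider mapping once
# per process (lru_cache), warn when the mapping is in staging mode.
import warnings
from functools import lru_cache

# Stand-in for the server-side mapping; the real implementation would fetch
# this via HfApi().model_info(model_id) instead of a local dict.
_FAKE_HUB_RESPONSE = {
    "black-forest-labs/FLUX.1-dev": {
        "fal-ai": {"provider_id": "fal-ai/flux/dev", "status": "live"},
    },
}

@lru_cache(maxsize=None)
def _fetch_inference_provider_mapping(model_id: str) -> dict:
    # Cached so repeated inference calls don't re-hit the Hub API.
    return _FAKE_HUB_RESPONSE.get(model_id, {})

def map_model(model_id: str, provider: str) -> str:
    mapping = _fetch_inference_provider_mapping(model_id)
    if provider not in mapping:
        raise ValueError(f"Model {model_id} is not supported by provider {provider}.")
    entry = mapping[provider]
    if entry["status"] == "staging":
        warnings.warn(f"Model {model_id} is in staging mode for provider {provider}.")
    return entry["provider_id"]
```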

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines -15 to -38
SUPPORTED_MODELS = {
    "automatic-speech-recognition": {
        "openai/whisper-large-v3": "fal-ai/whisper",
    },
    "text-to-image": {
        "black-forest-labs/FLUX.1-dev": "fal-ai/flux/dev",
        "black-forest-labs/FLUX.1-schnell": "fal-ai/flux/schnell",
        "ByteDance/SDXL-Lightning": "fal-ai/lightning-models",
        "fal/AuraFlow-v0.2": "fal-ai/aura-flow",
        "Kwai-Kolors/Kolors": "fal-ai/kolors",
        "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS": "fal-ai/pixart-sigma",
        "playgroundai/playground-v2.5-1024px-aesthetic": "fal-ai/playground-v25",
        "stabilityai/stable-diffusion-3-medium": "fal-ai/stable-diffusion-v3-medium",
        "stabilityai/stable-diffusion-3.5-large": "fal-ai/stable-diffusion-v35-large",
        "Warlord-K/Sana-1024": "fal-ai/sana",
    },
    "text-to-speech": {
        "m-a-p/YuE-s1-7B-anneal-en-cot": "fal-ai/yue",
    },
    "text-to-video": {
        "genmo/mochi-1-preview": "fal-ai/mochi-v1",
        "tencent/HunyuanVideo": "fal-ai/hunyuan-video",
    },
}
Member

I'm wondering if, for developer experience, we should still support an (empty by default) mapping that partners can use locally before implementing the server-side mapping

Member

(and in the future let's add a link to some doc on how to implement the server-side mapping here)

Contributor Author

yes, I kept the hardcoded list
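The fallback discussed above could look roughly like this: consult a local, empty-by-default mapping first, then fall back to the dynamic server-side lookup. This is an illustrative sketch under assumed names (`HARDCODED_MODEL_ID_MAPPING`, `resolve_mapped_model`), not the actual huggingface_hub code.

```python
# Local partner mapping, empty by default; a partner can populate it during
# development before the server-side mapping exists.
HARDCODED_MODEL_ID_MAPPING: dict = {
    # provider -> {hub_model_id: provider_model_id}
    # "my-provider": {"org/model": "my-provider/model-id"},
}

def resolve_mapped_model(provider: str, model_id: str, dynamic_lookup) -> str:
    # Prefer the hardcoded local mapping when it has an entry,
    # otherwise defer to the dynamic (server-side) lookup callable.
    local = HARDCODED_MODEL_ID_MAPPING.get(provider, {})
    if model_id in local:
        return local[model_id]
    return dynamic_lookup(provider, model_id)
```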

@Wauplin
Contributor

Wauplin commented Feb 7, 2025

(updated the PR to align with the hf.js client + did some refactoring. Still some tests to fix, I suppose ^^)

@Wauplin Wauplin added the inference Anything related to InferenceClient, providers, etc. label Feb 10, 2025
@Wauplin Wauplin marked this pull request as ready for review February 10, 2025 18:29
@Wauplin
Contributor

Wauplin commented Feb 10, 2025

I've completed the PR. I still need to check a few things, but it should be nearly done now. I took the opportunity to -once again 🙈- refactor how the providers are defined. We now use a base class that all providers inherit from. The steps in prepare_request are split into atomic methods that can be tested individually (to prepare the api_token, mapped_model, base_url, full URL, payload, body, etc.). The goal is to reuse as much as possible and therefore make the introduction of a new provider effortless. The case of LLMs is particularly interesting to optimize (see the sambanova.py implementation).
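The base-class pattern described above could be sketched as follows. This is a minimal illustration, not the actual huggingface_hub code: class names, method names, and the RequestParameters fields are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class RequestParameters:
    url: str
    headers: dict
    json: dict

class TaskProviderHelper:
    """Base class: prepare_request is composed of small, individually
    testable steps that subclasses override as needed."""

    def __init__(self, provider: str, base_url: str):
        self.provider = provider
        self.base_url = base_url

    def _prepare_headers(self, api_key: str) -> dict:
        return {"Authorization": f"Bearer {api_key}"}

    def _prepare_route(self, mapped_model: str) -> str:
        raise NotImplementedError  # provider/task specific

    def _prepare_payload(self, inputs, parameters: dict) -> dict:
        # Drop parameters set to None so server-side defaults apply.
        return {"inputs": inputs, **{k: v for k, v in parameters.items() if v is not None}}

    def prepare_request(self, inputs, parameters, mapped_model, api_key) -> RequestParameters:
        return RequestParameters(
            url=self.base_url + self._prepare_route(mapped_model),
            headers=self._prepare_headers(api_key),
            json=self._prepare_payload(inputs, parameters),
        )

# An LLM provider with an OpenAI-compatible API then only needs the route:
class ChatCompletionHelper(TaskProviderHelper):
    def _prepare_route(self, mapped_model: str) -> str:
        return "/v1/chat/completions"
```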

I've also refactored the providers' tests to cover only the parts that are overridden in subclasses. The problem with the current tests was that a lot was duplicated, and they required the provider to be live in production (to fetch the mapping). Since the implementation is now centralized, we can also centralize the tests (still to be done). And testing atomic methods avoids having to test end-to-end against the production mapping.
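The testing idea above can be illustrated like this: because each step is an atomic method, a unit test can target just one piece with no network call and no production mapping. The helper below is a hypothetical stand-in for one such overridable step, not a test from the actual suite.

```python
# Stand-in for an atomic step of request preparation: building auth headers.
def prepare_headers(api_key: str) -> dict:
    if not api_key:
        raise ValueError("An API token must be provided.")
    return {"Authorization": f"Bearer {api_key}"}

def test_prepare_headers():
    # Happy path: header is built from the token, nothing else needed.
    assert prepare_headers("hf_xxx") == {"Authorization": "Bearer hf_xxx"}
    # Error path: an empty token is rejected locally, no server involved.
    try:
        prepare_headers("")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty token")

test_prepare_headers()
```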

Contributor Author

@hanouticelina hanouticelina left a comment

the refactored version is much much better, especially for testing ❤️ thanks!

Contributor

@Wauplin Wauplin left a comment

Current state looks good to me :) @hanouticelina can you have a last look at it?

@hanouticelina
Contributor Author

All good on my side! Some tests are failing but seem unrelated. I'm not sure whether they are flaky; it's the first time I'm seeing connection errors from the model download CDN https://cdn-lfs-dev.hf.co

@Wauplin
Contributor

Wauplin commented Feb 11, 2025

Let's merge for now :) I've reported our CI issues internally (here); they are indeed unrelated.

@Wauplin Wauplin merged commit 632c04f into main Feb 11, 2025
16 of 17 checks passed
@Wauplin Wauplin deleted the add-dynamic-inference-provider-mapping branch February 11, 2025 17:10
Labels
inference Anything related to InferenceClient, providers, etc.
4 participants