[InferenceClient] Better handling of task parameters #2812

Merged: 11 commits, Jan 31, 2025
54 changes: 44 additions & 10 deletions src/huggingface_hub/inference/_client.py
@@ -92,7 +92,6 @@
TextGenerationInputGrammarType,
TextGenerationOutput,
TextGenerationStreamOutput,
TextToImageTargetSize,
TextToSpeechEarlyStoppingEnum,
TokenClassificationAggregationStrategy,
TokenClassificationOutputElement,
@@ -474,8 +473,6 @@ def automatic_speech_recognition(
model (`str`, *optional*):
The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
Inference Endpoint. If not provided, the default recommended model for ASR will be used.
parameters (Dict[str, Any], *optional*):
Additional parameters to pass to the model.
Returns:
[`AutomaticSpeechRecognitionOutput`]: An item containing the transcribed text and optionally the timestamp chunks.

@@ -2392,9 +2389,8 @@ def text_to_image(
guidance_scale: Optional[float] = None,
model: Optional[str] = None,
scheduler: Optional[str] = None,
target_size: Optional[TextToImageTargetSize] = None,
seed: Optional[int] = None,
**kwargs,
Contributor:
This is a breaking change, but hopefully totally fine.

Contributor Author:
I'll mention this in the next release notes! It should be fine, though: if users were previously using text_to_image with the HF Inference API, this isn't an issue, since all API parameters were exposed as explicit method arguments.

Contributor:
> this shouldn't be an issue since all API parameters were exposed as explicit method arguments

Yes, exactly.
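For illustration, a minimal before/after sketch of this breaking change (a hypothetical call; the Replicate parameter mirrors the example added later in this diff):

```py
from huggingface_hub import InferenceClient

client = InferenceClient(provider="replicate", api_key="hf_...")

# Before this PR, provider-specific keys were forwarded via **kwargs
# (and target_size was an explicit argument):
#   client.text_to_image("An astronaut riding a horse", output_quality=100)
# After this PR, they are passed through the explicit extra_parameters dict:
image = client.text_to_image(
    "An astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
    extra_parameters={"output_quality": 100},
)
```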

extra_parameters: Optional[Dict[str, Any]] = None,
) -> "Image":
"""
Generate an image based on a given text using a specified model.
@@ -2426,10 +2422,11 @@ def text_to_image(
Defaults to None.
scheduler (`str`, *optional*):
Override the scheduler with a compatible one.
target_size (`TextToImageTargetSize`, *optional*):
The size in pixel of the output image
seed (`int`, *optional*):
Seed for the random number generator.
extra_parameters (`Dict[str, Any]`, *optional*):
Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
for supported parameters.

Contributor:
Do we have a good example of how to use extra_parameters for a specific model on a specific provider? It would be good to add at least one example (either in text-to-image or text-to-video, depending on what's best).

Contributor Author:
Yep, added in 305c720.

Returns:
`Image`: The generated image.
@@ -2482,6 +2479,21 @@ def text_to_image(
... )
>>> image.save("astronaut.png")
```

Example using Replicate provider with extra parameters
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
... provider="replicate", # Use replicate provider
... api_key="hf_...", # Pass your HF token
... )
>>> image = client.text_to_image(
... "An astronaut riding a horse on the moon.",
... model="black-forest-labs/FLUX.1-schnell",
... extra_parameters={"output_quality": 100},
... )
>>> image.save("astronaut.png")
```
"""
provider_helper = get_provider_helper(self.provider, task="text-to-image")
request_parameters = provider_helper.prepare_request(
@@ -2493,9 +2505,8 @@ def text_to_image(
"num_inference_steps": num_inference_steps,
"guidance_scale": guidance_scale,
"scheduler": scheduler,
"target_size": target_size,
"seed": seed,
**kwargs,
**(extra_parameters or {}),
},
headers=self.headers,
model=model or self.model,
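As an aside, a standalone sketch of the `**(extra_parameters or {})` merge used above (plain Python; the key names are samples, not fixed API values):

```py
# extra_parameters is expanded last, so provider-specific keys are
# appended to the request payload; None expands to an empty dict.
base = {"num_inference_steps": 28, "guidance_scale": 3.5}
extra_parameters = {"output_quality": 100}  # sample provider-specific key
payload = {**base, **(extra_parameters or {})}
print(payload)
# {'num_inference_steps': 28, 'guidance_scale': 3.5, 'output_quality': 100}
```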
Expand All @@ -2515,6 +2526,7 @@ def text_to_video(
num_frames: Optional[float] = None,
num_inference_steps: Optional[int] = None,
seed: Optional[int] = None,
extra_parameters: Optional[Dict[str, Any]] = None,
) -> bytes:
"""
Generate a video based on a given text.
@@ -2538,6 +2550,9 @@ def text_to_video(
expense of slower inference.
seed (`int`, *optional*):
Seed for the random number generator.
extra_parameters (`Dict[str, Any]`, *optional*):
Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
for supported parameters.

Returns:
`bytes`: The generated video.
Expand Down Expand Up @@ -2583,6 +2598,7 @@ def text_to_video(
"num_frames": num_frames,
"num_inference_steps": num_inference_steps,
"seed": seed,
**(extra_parameters or {}),
},
headers=self.headers,
model=model or self.model,
Expand Down Expand Up @@ -2613,6 +2629,7 @@ def text_to_speech(
top_p: Optional[float] = None,
typical_p: Optional[float] = None,
use_cache: Optional[bool] = None,
extra_parameters: Optional[Dict[str, Any]] = None,
) -> bytes:
"""
Synthesize an audio of a voice pronouncing a given text.
@@ -2670,7 +2687,9 @@ def text_to_speech(
paper](https://hf.co/papers/2202.00666) for more details.
use_cache (`bool`, *optional*):
Whether the model should use the past key/values attentions to speed up decoding
extra_parameters (`Dict[str, Any]`, *optional*):
Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
for supported parameters.
Returns:
`bytes`: The generated audio.

Expand Down Expand Up @@ -2717,6 +2736,20 @@ def text_to_speech(
... )
>>> Path("hello_world.flac").write_bytes(audio)
```
Example using Replicate provider with extra parameters
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
... provider="replicate", # Use replicate provider
... api_key="hf_...", # Pass your HF token
... )
>>> audio = client.text_to_speech(
... "Hello, my name is Kororo, an awesome text-to-speech model.",
... model="hexgrad/Kokoro-82M",
... extra_parameters={"voice": "af_nicole"},
... )
>>> Path("hello.flac").write_bytes(audio)
```
"""
provider_helper = get_provider_helper(self.provider, task="text-to-speech")
request_parameters = provider_helper.prepare_request(
@@ -2738,6 +2771,7 @@
"top_p": top_p,
"typical_p": typical_p,
"use_cache": use_cache,
**(extra_parameters or {}),
},
headers=self.headers,
model=model or self.model,
54 changes: 44 additions & 10 deletions src/huggingface_hub/inference/_generated/_async_client.py
@@ -77,7 +77,6 @@
TextGenerationInputGrammarType,
TextGenerationOutput,
TextGenerationStreamOutput,
TextToImageTargetSize,
TextToSpeechEarlyStoppingEnum,
TokenClassificationAggregationStrategy,
TokenClassificationOutputElement,
@@ -507,8 +506,6 @@ async def automatic_speech_recognition(
model (`str`, *optional*):
The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
Inference Endpoint. If not provided, the default recommended model for ASR will be used.
parameters (Dict[str, Any], *optional*):
Additional parameters to pass to the model.
Returns:
[`AutomaticSpeechRecognitionOutput`]: An item containing the transcribed text and optionally the timestamp chunks.

@@ -2448,9 +2445,8 @@ async def text_to_image(
guidance_scale: Optional[float] = None,
model: Optional[str] = None,
scheduler: Optional[str] = None,
target_size: Optional[TextToImageTargetSize] = None,
seed: Optional[int] = None,
**kwargs,
extra_parameters: Optional[Dict[str, Any]] = None,
) -> "Image":
"""
Generate an image based on a given text using a specified model.
@@ -2482,10 +2478,11 @@ async def text_to_image(
Defaults to None.
scheduler (`str`, *optional*):
Override the scheduler with a compatible one.
target_size (`TextToImageTargetSize`, *optional*):
The size in pixel of the output image
seed (`int`, *optional*):
Seed for the random number generator.
extra_parameters (`Dict[str, Any]`, *optional*):
Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
for supported parameters.

Returns:
`Image`: The generated image.
@@ -2539,6 +2536,21 @@ async def text_to_image(
... )
>>> image.save("astronaut.png")
```

Example using Replicate provider with extra parameters
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
... provider="replicate", # Use replicate provider
... api_key="hf_...", # Pass your HF token
... )
>>> image = client.text_to_image(
... "An astronaut riding a horse on the moon.",
... model="black-forest-labs/FLUX.1-schnell",
... extra_parameters={"output_quality": 100},
... )
>>> image.save("astronaut.png")
```
"""
provider_helper = get_provider_helper(self.provider, task="text-to-image")
request_parameters = provider_helper.prepare_request(
@@ -2550,9 +2562,8 @@ async def text_to_image(
"num_inference_steps": num_inference_steps,
"guidance_scale": guidance_scale,
"scheduler": scheduler,
"target_size": target_size,
"seed": seed,
**kwargs,
**(extra_parameters or {}),
},
headers=self.headers,
model=model or self.model,
@@ -2572,6 +2583,7 @@ async def text_to_video(
num_frames: Optional[float] = None,
num_inference_steps: Optional[int] = None,
seed: Optional[int] = None,
extra_parameters: Optional[Dict[str, Any]] = None,
) -> bytes:
"""
Generate a video based on a given text.
@@ -2595,6 +2607,9 @@ async def text_to_video(
expense of slower inference.
seed (`int`, *optional*):
Seed for the random number generator.
extra_parameters (`Dict[str, Any]`, *optional*):
Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
for supported parameters.

Returns:
`bytes`: The generated video.
@@ -2640,6 +2655,7 @@ async def text_to_video(
"num_frames": num_frames,
"num_inference_steps": num_inference_steps,
"seed": seed,
**(extra_parameters or {}),
},
headers=self.headers,
model=model or self.model,
@@ -2670,6 +2686,7 @@ async def text_to_speech(
top_p: Optional[float] = None,
typical_p: Optional[float] = None,
use_cache: Optional[bool] = None,
extra_parameters: Optional[Dict[str, Any]] = None,
) -> bytes:
"""
Synthesize an audio of a voice pronouncing a given text.
@@ -2727,7 +2744,9 @@ async def text_to_speech(
paper](https://hf.co/papers/2202.00666) for more details.
use_cache (`bool`, *optional*):
Whether the model should use the past key/values attentions to speed up decoding
extra_parameters (`Dict[str, Any]`, *optional*):
Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
for supported parameters.
Returns:
`bytes`: The generated audio.

@@ -2775,6 +2794,20 @@ async def text_to_speech(
... )
>>> Path("hello_world.flac").write_bytes(audio)
```
Example using Replicate provider with extra parameters
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
... provider="replicate", # Use replicate provider
... api_key="hf_...", # Pass your HF token
... )
>>> audio = client.text_to_speech(
... "Hello, my name is Kororo, an awesome text-to-speech model.",
... model="hexgrad/Kokoro-82M",
... extra_parameters={"voice": "af_nicole"},
... )
>>> Path("hello.flac").write_bytes(audio)
```
"""
provider_helper = get_provider_helper(self.provider, task="text-to-speech")
request_parameters = provider_helper.prepare_request(
Expand All @@ -2796,6 +2829,7 @@ async def text_to_speech(
"top_p": top_p,
"typical_p": typical_p,
"use_cache": use_cache,
**(extra_parameters or {}),
},
headers=self.headers,
model=model or self.model,
@@ -35,8 +35,6 @@ class TextToImageParameters(BaseInferenceType):
"""Override the scheduler with a compatible one."""
seed: Optional[int] = None
"""Seed for the random number generator."""
target_size: Optional[TextToImageTargetSize] = None
"""The size in pixel of the output image"""


@dataclass
2 changes: 1 addition & 1 deletion src/huggingface_hub/inference/_providers/fal_ai.py
@@ -134,7 +134,7 @@ def __init__(self):

def _prepare_payload(self, inputs: Any, parameters: Dict[str, Any]) -> Dict[str, Any]:
parameters = {k: v for k, v in parameters.items() if v is not None}
if "image_size" not in parameters and "width" in parameters and "height" in parameters:
if "width" in parameters and "height" in parameters:
Contributor:
What if only one is passed, btw?

Contributor Author:
We should be able to send only one if specified; the other would be set to its default value. I'll fix that.

Contributor Author:
No, actually: for fal-ai you either send both, or neither.

Contributor Author:
There are no default values for them, according to their documentation.

Contributor:
OK, thanks for checking 👍

parameters["image_size"] = {
"width": parameters.pop("width"),
"height": parameters.pop("height"),
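A standalone sketch of the both-or-neither mapping above, with sample values (plain Python; fal-ai's nested `image_size` key is taken from this diff):

```py
# fal-ai takes a nested image_size object; the helper above only
# builds it when both width and height survive the None-filter.
parameters = {"width": 1024, "height": 768, "seed": None}
parameters = {k: v for k, v in parameters.items() if v is not None}
if "width" in parameters and "height" in parameters:
    parameters["image_size"] = {
        "width": parameters.pop("width"),
        "height": parameters.pop("height"),
    }
print(parameters)  # {'image_size': {'width': 1024, 'height': 768}}
```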
8 changes: 7 additions & 1 deletion src/huggingface_hub/inference/_providers/together.py
@@ -142,10 +142,16 @@ def __init__(self):
super().__init__("text-to-image")

def _prepare_payload(self, inputs: Any, parameters: Dict[str, Any]) -> Dict[str, Any]:
parameters = {k: v for k, v in parameters.items() if v is not None}
if "num_inference_steps" in parameters:
parameters["steps"] = parameters.pop("num_inference_steps")
if "guidance_scale" in parameters:
parameters["guidance"] = parameters.pop("guidance_scale")

payload = {
"prompt": inputs,
"response_format": "base64",
**{k: v for k, v in parameters.items() if v is not None},
**parameters,
}
return payload
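
A standalone sketch of the renaming above, with sample values (plain Python; Together's `steps`/`guidance` names are taken from this diff):

```py
# Together expects `steps` and `guidance` rather than the HF-style
# `num_inference_steps` and `guidance_scale` parameter names.
parameters = {"num_inference_steps": 28, "guidance_scale": 3.5, "seed": None}
parameters = {k: v for k, v in parameters.items() if v is not None}
if "num_inference_steps" in parameters:
    parameters["steps"] = parameters.pop("num_inference_steps")
if "guidance_scale" in parameters:
    parameters["guidance"] = parameters.pop("guidance_scale")
payload = {"prompt": "a cat", "response_format": "base64", **parameters}
print(payload)
# {'prompt': 'a cat', 'response_format': 'base64', 'steps': 28, 'guidance': 3.5}
```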
