Adds continuous batching #850
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull Request Overview
This PR introduces continuous batching support for Transformer-based models, enabling split-wise streaming generation.
- Adds a `continuous_batching` flag throughout configuration, model initialization, and generation functions.
- Implements a new `_continuous_greedy_until` path and refactors `_generate` to dispatch based on the flag (see the sketch after this list).
- Updates `GenerationParameters` and example configs to include `num_blocks` and `block_size`, and adjusts tests accordingly.
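A minimal sketch of that dispatch, for orientation only: `_generate` and `_continuous_greedy_until` are the names used in this PR, while the class wrapper and `_padded_generate` are assumptions for illustration.

```python
class TransformersModelSketch:
    """Hypothetical stand-in for TransformersModel; illustration only."""

    def __init__(self, continuous_batching: bool = False):
        self.continuous_batching = continuous_batching

    def _generate(self, requests, **kwargs):
        # Dispatch on the new flag: continuous batching schedules new
        # requests as KV-cache blocks free up instead of padding a
        # fixed-size batch.
        if self.continuous_batching:
            return self._continuous_greedy_until(requests, **kwargs)
        return self._padded_generate(requests, **kwargs)

    def _continuous_greedy_until(self, requests, **kwargs):
        ...  # continuous-batching generation loop added by this PR

    def _padded_generate(self, requests, **kwargs):
        ...  # existing padded-batch path (name assumed)
```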
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
File | Description
---|---
`tests/models/endpoints/test_tgi_model.py` | Inserts `block_size` and `num_blocks` into generation parameters
`tests/models/endpoints/test_endpoint_model.py` | Inserts `num_blocks` and `block_size` into generation parameters
`src/lighteval/models/transformers/transformers_model.py` | Propagates `continuous_batching` through init, `from_model`, and generate paths
`src/lighteval/models/model_input.py` | Extends `GenerationParameters` with `num_blocks` and `block_size`
`examples/model_configs/transformers_model.yaml` | Adds `continuous_batching` and example `num_blocks`/`block_size` (sketched below)
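The last row shows the user-facing shape of the feature. A hedged sketch of the equivalent programmatic configuration, assuming `TransformersModelConfig` accepts a `generation_parameters` object as other lighteval configs do; the model name and numeric values are placeholders:

```python
from lighteval.models.model_input import GenerationParameters
from lighteval.models.transformers.transformers_model import TransformersModelConfig

# Placeholder values; only the field names come from this PR.
config = TransformersModelConfig(
    model_name="gpt2",
    continuous_batching=True,
    generation_parameters=GenerationParameters(
        num_blocks=2048,  # KV-cache blocks to pre-allocate
        block_size=128,   # tokens per cache block
    ),
)
```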
Comments suppressed due to low confidence (2)
src/lighteval/models/model_input.py:28
- [nitpick] New fields `num_blocks` and `block_size` in `GenerationParameters` lack descriptions in the class docstring. Consider documenting their purpose and effects.

```python
num_blocks: NonNegativeInt | None = None  # transformers
```
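One way to address this, assuming `GenerationParameters` is a pydantic model (consistent with the `NonNegativeInt` annotation above); the attribute descriptions are inferred from the continuous-batching context, not taken from the PR:

```python
# Sketch only: the attribute descriptions below are assumptions inferred
# from this PR's continuous-batching context.
from pydantic import BaseModel, NonNegativeInt

class GenerationParameters(BaseModel):
    """Parameters controlling text generation.

    Attributes:
        num_blocks: Number of KV-cache blocks to pre-allocate for
            continuous batching (transformers backend only).
        block_size: Number of tokens held per KV-cache block
            (transformers backend only).
    """

    num_blocks: NonNegativeInt | None = None  # transformers
    block_size: NonNegativeInt | None = None  # transformers
```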
src/lighteval/models/transformers/transformers_model.py:114
- There are no existing tests covering the new `continuous_batching` logic path. Consider adding unit tests to verify both `True` and `False` behaviors.

```python
continuous_batching (bool):
```
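A hypothetical starting point for that coverage, checking only that the flag survives config construction (exercising the full generation path would require a loaded model); the model name is a placeholder:

```python
import pytest

from lighteval.models.transformers.transformers_model import TransformersModelConfig

@pytest.mark.parametrize("flag", [True, False])
def test_continuous_batching_flag_propagates(flag):
    # Sketch only: asserts flag propagation, not generation behavior.
    config = TransformersModelConfig(model_name="gpt2", continuous_batching=flag)
    assert config.continuous_batching is flag
```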
```python
self.transformers_config = model.config
self.generation_config_dict = config.generation_parameters.to_transformers_dict()
self.config = config if config is not None else TransformersModelConfig(model_name=model.config.name_or_path)
```
The `from_model` constructor does not set `self.continuous_batching`, so models loaded via `from_model` will always default to `False`. Add `self.continuous_batching = self.config.continuous_batching` after setting `self.config`.
```suggestion
self.config = config if config is not None else TransformersModelConfig(model_name=model.config.name_or_path)
self.continuous_batching = self.config.continuous_batching if config and hasattr(self.config, 'continuous_batching') else False
```
"block_size": None, | ||
"num_blocks": None, |
[nitpick] Test insertion order of `block_size` then `num_blocks` differs from the other endpoint test. Consider keeping parameter order consistent across tests to avoid confusion.
"block_size": None, | |
"num_blocks": None, | |
"num_blocks": None, | |
"block_size": None, |
"num_blocks": None, | ||
"block_size": None, |
[nitpick] Here the parameters are added as `num_blocks` then `block_size`, which is the reverse of the other test. Align ordering to maintain consistency.
"num_blocks": None, | |
"block_size": None, | |
"block_size": None, | |
"num_blocks": None, |
No description provided.