
[FT] Faster generation with TransformersModel by using less padding #531

Open · rolshoven opened this issue Feb 3, 2025 · 0 comments
Labels: feature request
rolshoven commented Feb 3, 2025

Issue encountered

I noticed that the greedy_until function in TransformersModel uses excessive padding. In my case, I have a test set where the largest input has 27k tokens, but most inputs are under 8k tokens. The current implementation passes max_context_continuation_size_allowed as max_length to the tokenizer, which corresponds to the token count of the largest sample in the entire dataset plus the maximum number of output tokens. As a result, every batch is padded to that dataset-wide maximum, which unnecessarily increases evaluation time.
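
To make the overhead concrete, here is a rough back-of-the-envelope sketch using the numbers above (the 256-token generation budget is an assumption for illustration):

# Hypothetical numbers illustrating the cost of a dataset-wide max_length.
dataset_max_tokens = 27_000    # longest input in the entire dataset
max_new_tokens = 256           # assumed generation budget
typical_input_tokens = 8_000   # a typical input in the dataset

padded_length = dataset_max_tokens + max_new_tokens  # current behavior
wasted_pad_tokens = padded_length - typical_input_tokens
print(wasted_pad_tokens)  # ~19k pad tokens processed for a typical sample

Every one of those pad tokens still goes through the forward pass, so a typical sample costs more than three times as many tokens as it needs.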

Solution/Feature

Instead of using max_context_continuation_size_allowed when tokenizing the batch contexts, it would be better to use something like this (untested):

# batch[0] is the longest context in the batch (assuming samples are sorted by decreasing length)
largest_sample_in_batch = len(batch[0].tokenized_context)
# Fall back to the remaining context-window budget if no generation size is set
max_generation_size = batch[0].generation_size if batch[0].generation_size else self.max_length - largest_sample_in_batch
max_length = min(largest_sample_in_batch + max_generation_size, self.max_length)

tokenized = self.tokenizer(
    ...
    max_length=max_length,  # Only this needs to change
    ...
).to(self.device)

The calculations are essentially the same as the ones already done in the code; the only difference is that max_length is derived from the largest sample in the current batch rather than the largest sample in the entire dataset.
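
For illustration, here is a self-contained sketch of the per-batch padding idea outside of lighteval (the model name, lengths, and generation size are placeholders, and the sorted toy batch stands in for lighteval's length-sorted batches):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

model_max_length = 1024  # stands in for self.max_length
generation_size = 64     # stands in for batch[0].generation_size

# Toy batch, sorted by decreasing length, longest context first.
contexts = ["word " * 300, "word " * 50, "word " * 10]

largest_sample_in_batch = len(tokenizer(contexts[0])["input_ids"])
max_length = min(largest_sample_in_batch + generation_size, model_max_length)

tokenized = tokenizer(
    contexts,
    padding="max_length",
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
# The batch is padded only to its own longest context plus the generation
# budget, not to the longest context in the entire dataset.
print(tokenized["input_ids"].shape)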

If you think this makes sense, I could open a pull request.
