Issue encountered
I noticed that the greedy_until function in TransformersModel uses excessive padding. In my case, I have a test set where the largest input has 27k tokens but most inputs are under 8k tokens. The current implementation uses max_context_continuation_size_allowed as the max_length in the tokenizer, which corresponds to the number of tokens of the largest sample in the entire dataset plus the maximum number of output tokens. This unnecessarily increases evaluation time.
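For illustration, here is a minimal, standalone sketch (not lighteval code) of why a dataset-wide max_length is wasteful: with a Hugging Face tokenizer, padding="max_length" pads every batch out to the global maximum, while padding="longest" pads only to the longest sample in the current batch. The gpt2 tokenizer and the concrete lengths below are placeholders, not the values lighteval uses.

# Minimal sketch: padding every batch to the dataset-wide maximum wastes compute
# compared to padding each batch only to its own longest sample.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works for the illustration
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

batch = ["a short prompt", "another fairly short prompt"]
dataset_wide_max = 27_000  # length of the single longest sample in the whole test set

# Current behaviour: every batch is padded out to the dataset-wide maximum.
padded_to_dataset_max = tokenizer(
    batch, padding="max_length", max_length=dataset_wide_max,
    truncation=True, return_tensors="pt",
)

# Per-batch behaviour: pad only up to the longest sample in this batch.
padded_to_batch_max = tokenizer(batch, padding="longest", return_tensors="pt")

print(padded_to_dataset_max["input_ids"].shape)  # (2, 27000)
print(padded_to_batch_max["input_ids"].shape)    # (2, <longest prompt in this batch>)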
Solution/Feature
Instead of using max_context_continuation_size_allowed when tokenizing the batch contexts, it would be better to use something like this (untested):
largest_sample_in_batch = len(batch[0].tokenized_context)
max_generation_size = batch[0].generation_size if batch[0].generation_size else self.max_length - largest_sample_in_batch
max_length = min(largest_sample_in_batch + max_generation_size, self.max_length)
tokenized = self.tokenizer(
    ...
    max_length=max_length,  # Only this needs to change
    ...
).to(self.device)
The calculations are essentially the same as the ones already done in the code; the only difference is that max_length is determined from the first sample in the batch rather than the first sample in the entire dataset.
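For clarity, here is a self-contained sketch of the per-batch computation above. The Request dataclass and the max_model_length parameter are stand-ins for lighteval's actual request objects and the model's self.max_length; they are illustrative only, not the real API.

# Hedged, self-contained sketch of the proposed per-batch max_length computation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    tokenized_context: list[int]    # token ids of the prompt (hypothetical field name)
    generation_size: Optional[int]  # requested number of new tokens, if any


def batch_max_length(batch: list[Request], max_model_length: int) -> int:
    """max_length for tokenizing this batch, assuming it is sorted longest-first."""
    largest_sample_in_batch = len(batch[0].tokenized_context)
    max_generation_size = (
        batch[0].generation_size
        if batch[0].generation_size
        else max_model_length - largest_sample_in_batch
    )
    # Never exceed the model's context window.
    return min(largest_sample_in_batch + max_generation_size, max_model_length)


# Example: a batch whose longest prompt has 6,000 tokens and asks for 256 new
# tokens only needs a max_length of 6,256, not the dataset-wide 27k+ tokens.
batch = [Request(tokenized_context=list(range(6_000)), generation_size=256),
         Request(tokenized_context=list(range(4_500)), generation_size=256)]
print(batch_max_length(batch, max_model_length=32_768))  # 6256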
If you think this makes sense, I could open a pull request.