Commit
fixing datasets without max sequences
kothasuhas committed Feb 15, 2025
1 parent 5b7ddbb commit 2eaec4f
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions src/levanter/data/text.py
@@ -1284,6 +1284,8 @@ def shuffle_ds(ds, key):
         for name, ds in token_datasets.items():
             if self.max_sequences_dict is not None and name in self.max_sequences_dict:
                 train_token_datasets[name] = ds.slice_dataset(end_index=self.max_sequences_dict[name])
+            else:
+                train_token_datasets[name] = ds
 
         self.validation_token_datasets = {}
         for name, ds in token_datasets.items():
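
The effect of the added else branch can be illustrated with a small standalone sketch. This is not Levanter's code: select_train_datasets and keep_uncapped are hypothetical names, plain lists stand in for token datasets, and list slicing stands in for slice_dataset(end_index=...). It also assumes the training dict starts out empty, so that without the else branch any dataset missing from max_sequences_dict never gets an entry.

```python
from typing import Dict, List, Optional


def select_train_datasets(
    token_datasets: Dict[str, List[int]],
    max_sequences_dict: Optional[Dict[str, int]],
    keep_uncapped: bool = True,
) -> Dict[str, List[int]]:
    """Hypothetical stand-in for the loop in the diff.

    keep_uncapped=False mimics the loop without the else branch;
    keep_uncapped=True mimics the loop with it.
    """
    train: Dict[str, List[int]] = {}  # assumption: starts empty
    for name, ds in token_datasets.items():
        if max_sequences_dict is not None and name in max_sequences_dict:
            # List slicing stands in for ds.slice_dataset(end_index=...)
            train[name] = ds[: max_sequences_dict[name]]
        elif keep_uncapped:
            # The branch this commit adds: pass the dataset through unchanged
            train[name] = ds
    return train


token_datasets = {"wiki": [1, 2, 3, 4, 5], "code": [10, 20, 30]}
caps = {"wiki": 2}

before = select_train_datasets(token_datasets, caps, keep_uncapped=False)
after = select_train_datasets(token_datasets, caps)

print(sorted(before))  # ['wiki']
print(sorted(after))   # ['code', 'wiki']
```

Under these assumptions, a dataset with no cap is missing from the training dict before the change and present in full afterwards. When max_sequences_dict is None, every dataset takes the else branch and is passed through unchanged.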
