A lot of people in the community use HuggingFace Trainer for training, but sometimes it's not flexible enough or is missing certain features (native TP/PP/EP, etc.). Migrating to Megatron-LM comes with a steep learning curve, and while TorchTitan is lighter, it still takes some effort to learn and doesn't yet fully support features like Flash Attention and Liger Kernel (correct me if I'm wrong).
One way to make TorchTitan more accessible could be to let some of its features, such as the different parallelisms, work with existing HuggingFace Trainer code after only minor tweaks. That way, more users might give it a try even if it doesn't yet support every training feature they need.
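
To make the idea concrete, here's a rough sketch (an assumption about what the integration could look like, not a working proposal) of the kind of minor tweak I have in mind: applying the PyTorch-native tensor-parallel APIs that TorchTitan itself builds on to a HuggingFace model before handing it to the stock Trainer. The model name and module paths (`self_attn.q_proj`, etc.) are illustrative for a Llama-style model and would differ per architecture.

```python
# Sketch: wrap a HF model with PyTorch-native tensor parallelism
# (the torch.distributed building blocks TorchTitan uses), then train
# with the unchanged HuggingFace Trainer. Run under torchrun.
import os

import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# One-dimensional mesh over all ranks, used here purely for tensor
# parallelism; WORLD_SIZE is set by torchrun.
tp_size = int(os.environ.get("WORLD_SIZE", "1"))
tp_mesh = init_device_mesh("cuda", (tp_size,))

# Shard the attention projections column-wise and the output projection
# row-wise, layer by layer -- this loop is the "minor tweak".
for layer in model.model.layers:
    parallelize_module(
        layer,
        tp_mesh,
        {
            "self_attn.q_proj": ColwiseParallel(),
            "self_attn.k_proj": ColwiseParallel(),
            "self_attn.v_proj": ColwiseParallel(),
            "self_attn.o_proj": RowwiseParallel(),
        },
    )

# The rest of the user's training code would stay as-is.
trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))
```

If something like this could be smoothed over (checkpointing, loss parallelism, etc.) so that the loop above becomes a one-line helper, Trainer users could adopt TorchTitan's parallelisms incrementally.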