Llama models with custom configurations and uploading to Hugging Face #420

@bkchang

Description

It would be great if torchtitan could support 1) training Llama models with custom configurations (e.g., different numbers of KV heads, numbers of layers, etc.) and 2) directly uploading the trained weights to the HF Hub, so that people can download and run the model simply by referencing the HF model repo ID. These features would greatly help the community investigate the trade-off between size, speed, and accuracy across a range of models.

  1. Currently, torchtitan only allows a fixed set of classic Llama model architectures, which are hard-coded here and here. Enabling custom model parameters in the config files and feeding them to ModelArgs should be straightforward, perhaps with a script or a helper function (see the first sketch after this list).

  2. For uploading to the HF Hub, a script from HF could help convert torchtitan's output weights to the HF format (thanks @tianyu-l for mentioning this), but the script needs a params.json file and a tokenizer.model file. tokenizer.model is already downloaded before running torchtitan, so it only needs to be linked. params.json, on the other hand, can easily be written by inspecting the training config (see the second sketch after this list).
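
For item 1, here is a minimal sketch of what such a helper could look like, assuming ModelArgs is a dataclass; the import path and the exact field names (dim, n_layers, n_kv_heads, ...) are assumptions for illustration, not the final API:

```python
# Sketch only: build a ModelArgs from user-supplied overrides instead of the
# hard-coded config tables. Import path and field names are assumptions.
from dataclasses import fields

from torchtitan.models.llama import ModelArgs


def build_custom_model_args(overrides: dict) -> ModelArgs:
    """Construct a ModelArgs from a dict parsed out of the job config TOML."""
    valid = {f.name for f in fields(ModelArgs)}
    unknown = set(overrides) - valid
    if unknown:
        raise ValueError(f"Unknown ModelArgs fields: {sorted(unknown)}")
    return ModelArgs(**overrides)


# Example: a smaller llama-style variant with fewer layers and KV heads.
model_args = build_custom_model_args(
    {"dim": 2048, "n_layers": 16, "n_heads": 16, "n_kv_heads": 4, "vocab_size": 128256}
)
```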
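
For item 2, a rough sketch of emitting params.json from the in-memory model args; the key set and attribute names are assumptions based on the usual Llama params.json layout, so the exact fields the HF conversion script expects should be double-checked:

```python
# Sketch only: dump a params.json next to the checkpoint so the HF conversion
# script can pick it up. Keys and attribute names are assumptions.
import json
from pathlib import Path


def write_params_json(model_args, out_dir: str) -> None:
    """Write a llama-style params.json describing the trained architecture."""
    params = {
        "dim": model_args.dim,
        "n_layers": model_args.n_layers,
        "n_heads": model_args.n_heads,
        "n_kv_heads": model_args.n_kv_heads,
        "vocab_size": model_args.vocab_size,
        "multiple_of": model_args.multiple_of,
        "ffn_dim_multiplier": model_args.ffn_dim_multiplier,
        "norm_eps": model_args.norm_eps,
    }
    Path(out_dir, "params.json").write_text(json.dumps(params, indent=2))


# e.g. write_params_json(model_args, "./outputs/checkpoint/step-1000")
```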

I can help implement these features, but I'm wondering whether the torchtitan team would be interested in having them in the torchtitan repo.

Thanks.
