Merge pull request #206 from huggingface/xrsrke-patch-3
Update README.md
xrsrke authored Jul 8, 2024
2 parents cb51ed8 + d5cf7c4 commit f1adf52
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions examples/mup/README.md
@@ -32,3 +32,8 @@ We trained a 350m model with spectral µTransfer and standard parametrization us
Please check the directory [[./examples/mup/configs]](/examples/mup/configs) for the configurations we used to reproduce the experiments.

![LLaMA](./assets/llama.png)


#### Thoughts

So far we have only used spectral µTransfer on an MLP [link] and a 300m LLaMA [link] (the experiment configs are linked in the mup README). However, when we tested it on 1B/8B models (iirc), the loss blew up for reasons we have not identified. We therefore recommend trying µTransfer rather than spectral µTransfer.
