Pull requests: huggingface/nanotron

Pull requests list

fix init and init scaling factor
#349 opened Apr 14, 2025 by NouamaneTazi
6 tasks
quicks
#338 opened Apr 4, 2025 by NouamaneTazi Draft
6 tasks
Calculate mean token accuracy metric while training
#337 opened Apr 4, 2025 by kashif
[WIP] Add multilingual evals
#336 opened Apr 2, 2025 by anton-l
6 tasks
Logging outlier batch
#332 opened Apr 1, 2025 by eliebak Draft
Ademamix
#300 opened Mar 23, 2025 by eliebak Draft
6 tasks
Muon
#298 opened Mar 23, 2025 by eliebak Draft
6 tasks
[WIP] Distillation
#290 opened Mar 6, 2025 by Stillerman
2 of 14 tasks
Fix unpacking issue caused by newer Flash Attention
#289 opened Mar 5, 2025 by Stillerman
3 of 6 tasks
Recommend the use of Spack on supercomputers
#282 opened Feb 19, 2025 by thomas-bouvier
Add MLA
#278 opened Feb 5, 2025 by zzhhjjj
Add nanotron performance
#274 opened Jan 23, 2025 by xrsrke
fp8
#266 opened Dec 18, 2024 by xrsrke
Fix wrong initialization of lr scheduler
#256 opened Nov 29, 2024 by kylematoba
[NEW] Llama3.2 weight converters 🦙
#255 opened Nov 28, 2024 by TJ-Solergibert
6 tasks
Fix initial_lr when resuming training
#243 opened Nov 17, 2024 by Lauler