[Prototype] Block interface: initialization, lr scale, peft #354
✨ Description
- `LinearWeightConfig` for standalone linear-like weights, ex. embeddings, lm output (see the sketch below).
- Linear configs support variable defaults, so that each parent layer may define its own, ex. MLP layers 1 and 2 don't have the same default initialization, and the MoE router doesn't have a bias by default.
- `apply_peft`, with sensible defaults set for each of them.
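For illustration, here is a minimal sketch of the kind of per-weight config described above. The field names and the `with_defaults` helper are assumptions made for the example, not the actual API in this PR.

```python
import dataclasses
import typing


# Hypothetical per-weight config; fields and names are illustrative only.
@dataclasses.dataclass
class LinearWeightConfig:
    # None means "use the default chosen by the parent layer".
    init_method_std: float | None = None
    bias: bool | None = None
    lr_scale: float | None = None
    apply_peft: bool | None = None

    def with_defaults(self, **defaults: typing.Any) -> "LinearWeightConfig":
        """Fill unset fields from the parent layer's defaults."""
        updates = {
            field.name: defaults[field.name]
            for field in dataclasses.fields(self)
            if getattr(self, field.name) is None and field.name in defaults
        }
        return dataclasses.replace(self, **updates)


# Each parent layer supplies its own defaults, ex. the MoE router has no bias.
router_weight = LinearWeightConfig().with_defaults(bias=False, apply_peft=False)
```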
Notes:
- The per-weight lr scale combines with `per_layer_lr_scale`. The effect is multiplicative (`combine_lr_scales`), as sketched below.
- `add_linear_biases` and `init_method_std` as shortcuts to setting all linears separately: it would be really convenient but could be harder to manage.
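A sketch of the multiplicative combination; the name `combine_lr_scales` comes from the description above, but its signature here is an assumption:

```python
# Illustrative sketch: combine optional lr scales multiplicatively.
# None means "no scaling requested at that level".
def combine_lr_scales(*lr_scales: float | None) -> float | None:
    combined: float | None = None
    for scale in lr_scales:
        if scale is not None:
            combined = scale if combined is None else combined * scale
    return combined


# A per-weight scale of 2.0 on top of a per-layer scale of 0.5 gives 1.0.
assert combine_lr_scales(2.0, 0.5, None) == 1.0
```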
TODO:
- Concatenated weights: `key_value`, MLP `gate_and_up`, MoE concatenated expert weights. We've so far had ad-hoc solutions for separating key and value for peft, and for separating the lr scale by expert, but I'd like something more generic.
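One possible shape for a more generic solution, purely as a hypothetical sketch (none of these names exist in the codebase): describe a concatenated weight as named slices, so options like peft or lr scale can be set per slice.

```python
import dataclasses


# Hypothetical per-slice config for a concatenated weight
# (ex. key_value, gate_and_up, stacked MoE expert weights).
@dataclasses.dataclass
class WeightSlice:
    name: str
    size: int
    lr_scale: float | None = None   # ex. per-expert lr scale
    apply_peft: bool | None = None  # ex. peft on value but not key


@dataclasses.dataclass
class ConcatenatedWeightConfig:
    slices: list[WeightSlice]

    def ranges(self) -> list[tuple[str, int, int]]:
        """Return (name, begin, end) so per-slice options can be applied to the weight."""
        out, begin = [], 0
        for s in self.slices:
            out.append((s.name, begin, begin + s.size))
            begin += s.size
        return out


key_value = ConcatenatedWeightConfig(
    slices=[
        WeightSlice("key", size=1024, apply_peft=False),
        WeightSlice("value", size=1024, apply_peft=True),
    ]
)
```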