@jlamypoirier commented Sep 3, 2025

✨ Description

Rework LM config (see the config sketch after this list):

  • Extract embedding and output layer configs.
  • Rename tie_word_embeddings -> output_layer.tied_weight
  • Position embeddings are now enabled through embeddings_layer.position_embeddings.enabled; they are always disabled by default, independently of the rotary embedding setting.
  • Rename max_position_embeddings -> embeddings_layer.num_position_embeddings
  • Rename parallel_embeddings -> embeddings_layer.vocab_parallel
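
A minimal sketch of the resulting layout, written as a plain Python dict for illustration; only the field names come from this PR, while the surrounding structure and the example values are assumptions:

```python
# Hypothetical sketch of the reworked LM config layout (illustration only, not the actual schema).
lm_config = {
    "embeddings_layer": {
        "vocab_parallel": True,           # was `parallel_embeddings`
        "num_position_embeddings": 2048,  # was `max_position_embeddings`
        "position_embeddings": {
            "enabled": False,             # always disabled by default, independently of rotary
        },
    },
    "output_layer": {
        "tied_weight": True,              # was `tie_word_embeddings`
    },
}
```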

Rework initialization config (see the sketch after this list):

  • Remove most ad-hoc initialization arguments (leftovers from "Block interface: extract mixer and mlp config" #359).
  • Add a dynamic initialization config scheme so initialization can be configured arbitrarily.
  • Add an optional initialization config to all parameters. If not set, the default set by the parent layer is used, matching the previous behaviour.
  • Mamba: remove dt_init and dt_scale, since the same behaviour can be obtained through the new init config scheme. Replace dt_min, dt_max and dt_init_floor with the mamba_dt_bias initialization type, which takes similar options.
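
A sketch of what per-parameter initialization overrides could look like under the new scheme; the nesting, the parameter names, and the initialization field names (type, std, min, max, floor) are assumptions, with only mamba_dt_bias and the old dt_* options taken from the list above:

```python
# Hypothetical per-parameter initialization overrides (field and parameter names are illustrative).
init_overrides = {
    "output_layer": {
        "weight": {
            # Optional per-parameter initialization; if left unset, the parent layer's default applies.
            "initialization": {"type": "normal", "std": 0.02},
        },
    },
    "mixer": {
        "dt_bias": {
            # Stands in for the removed dt_min / dt_max / dt_init_floor arguments.
            "initialization": {"type": "mamba_dt_bias", "min": 0.001, "max": 0.1, "floor": 1e-4},
        },
    },
}
```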

Rework LR scales (see the example after this list):

  • Add an lr_scale option to all parameters and most layers.
  • LR scales combine multiplicatively, i.e. the effective LR scale for a given parameter is the product of its own lr_scale and those of all its parent layers.
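
As a concrete, purely hypothetical example of the multiplicative rule:

```python
# Illustration only: effective LR scale under the multiplicative rule.
model_lr_scale = 1.0   # set on the model
block_lr_scale = 0.5   # set on a block / layer
param_lr_scale = 0.2   # set on a single parameter inside that block

effective_lr_scale = model_lr_scale * block_lr_scale * param_lr_scale
print(effective_lr_scale)  # 0.1 -> the parameter trains at 10% of the base learning rate
```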

Rework PEFT (LoRA) (see the sketch after this list):

  • Add an apply_peft option to linear layers. If true, PEFT is enabled for that layer (e.g. it is wrapped with LoRA); otherwise the layer is treated as non-PEFT (e.g. frozen or ignored). If left unset, the default set by the parent layer is used instead (false everywhere except for the attention query and value layers).
  • Remove the transformer PEFT config; use the PEFT config directly instead. (It was only there to determine the PEFT layers, which is now handled in the linear config.)
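
A sketch of how this could look, again as an illustrative dict; the layer names and the shape of the PEFT config are assumptions, while apply_peft and the query/value default come from the description above:

```python
# Hypothetical sketch: per-layer `apply_peft` flags with a single top-level PEFT (LoRA) config.
config = {
    "peft": {"type": "lora", "rank": 8},  # used directly; no separate transformer-level peft config
    "mixer": {
        "query": {"apply_peft": True},    # wrapped with LoRA (also the default for query)
        "key": {"apply_peft": False},     # treated as non-PEFT (frozen or ignored)
        "value": {},                      # unset -> falls back to the parent default (true for value)
    },
}
```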

Todo (next PRs):
