@jlamypoirier commented Sep 3, 2025

✨ Description

Rework LM config (see the config sketch after this list):

  • Extract embedding and output layer configs.
  • Rename tie_word_embeddings -> output_layer.tied_weight
  • Position embeddings are now enabled through embeddings_layer.position_embeddings.enabled; they are always disabled by default, independently of the rotary embedding setting.
  • Rename max_position_embeddings -> embeddings_layer.num_position_embeddings
  • Rename parallel_embeddings -> embeddings_layer.vocab_parallel
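
A minimal sketch of the resulting layout, written as a plain Python dict for illustration; only the field names come from this PR, while the surrounding structure and the example values are assumptions:

```python
# Hypothetical sketch of the reworked LM config layout (illustration only, not the actual schema).
lm_config = {
    "embeddings_layer": {
        "vocab_parallel": True,           # was `parallel_embeddings`
        "num_position_embeddings": 2048,  # was `max_position_embeddings`
        "position_embeddings": {
            "enabled": False,             # always disabled by default, independently of rotary
        },
    },
    "output_layer": {
        "tied_weight": True,              # was `tie_word_embeddings`
    },
}
```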

Rework initialization config (see the sketch after this list):

  • Remove most ad-hoc initialization arguments (leftovers from "Block interface: extract mixer and mlp config" #359).
  • Add a dynamic initialization config scheme so initialization can be configured arbitrarily.
  • Add an optional initialization config to all parameters. If not set, the default set by the parent layer is used, matching the previous behaviour.
  • Mamba: remove dt_init and dt_scale, since the same behaviour can be obtained through the new init config scheme. Replace dt_min, dt_max and dt_init_floor with the mamba_dt_bias initialization type, which takes similar options.
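
A sketch of what per-parameter initialization overrides could look like under the new scheme; the nesting, the parameter names, and the initialization field names (type, std, min, max, floor) are assumptions, with only mamba_dt_bias and the old dt_* options taken from the list above:

```python
# Hypothetical per-parameter initialization overrides (field and parameter names are illustrative).
init_overrides = {
    "output_layer": {
        "weight": {
            # Optional per-parameter initialization; if left unset, the parent layer's default applies.
            "initialization": {"type": "normal", "std": 0.02},
        },
    },
    "mixer": {
        "dt_bias": {
            # Stands in for the removed dt_min / dt_max / dt_init_floor arguments.
            "initialization": {"type": "mamba_dt_bias", "min": 0.001, "max": 0.1, "floor": 1e-4},
        },
    },
}
```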

Rework LR scales (see the example after this list):

  • Add an lr_scale option to all parameters and most layers.
  • LR scales combine multiplicatively, i.e. the effective LR scale for a given parameter is the product of its own lr_scale and those of all its parent layers.
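
As a concrete, purely hypothetical example of the multiplicative rule:

```python
# Illustration only: effective LR scale under the multiplicative rule.
model_lr_scale = 1.0   # set on the model
block_lr_scale = 0.5   # set on a block / layer
param_lr_scale = 0.2   # set on a single parameter inside that block

effective_lr_scale = model_lr_scale * block_lr_scale * param_lr_scale
print(effective_lr_scale)  # 0.1 -> the parameter trains at 10% of the base learning rate
```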

Rework PEFT (LoRA) (see the sketch after this list):

  • Add an apply_peft option to linear layers. If true, PEFT is enabled for that layer (e.g. it is wrapped with LoRA); otherwise the layer is treated as non-PEFT (e.g. frozen or ignored). If left unset, the default set by the parent layer is used instead (false everywhere except for the attention query and value layers).
  • Remove the transformer PEFT config; use the PEFT config directly instead. (It was only there to determine the PEFT layers, which is now handled in the linear config.)
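
A sketch of how this could look, again as an illustrative dict; the layer names and the shape of the PEFT config are assumptions, while apply_peft and the query/value default come from the description above:

```python
# Hypothetical sketch: per-layer `apply_peft` flags with a single top-level PEFT (LoRA) config.
config = {
    "peft": {"type": "lora", "rank": 8},  # used directly; no separate transformer-level peft config
    "mixer": {
        "query": {"apply_peft": True},    # wrapped with LoRA (also the default for query)
        "key": {"apply_peft": False},     # treated as non-PEFT (frozen or ignored)
        "value": {},                      # unset -> falls back to the parent default (true for value)
    },
}
```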

Todo (next PRs):
