jlamypoirier
Collaborator

@jlamypoirier jlamypoirier commented Aug 20, 2025

✨ Description

  • Add a dynamic initialization configuration scheme so that every initialization can be configured arbitrarily.
  • Add a standardized linear config that holds the standard properties of linear layers, e.g. initialization, lr scale, and whether a bias is enabled. Also add LinearWeightConfig for standalone linear-like weights, e.g. embeddings and the lm output. Linear configs support variable defaults, so each parent layer can define its own: e.g. MLP layers 1 and 2 don't share the same default initialization, and the MoE router doesn't have a bias by default.
  • Add a linear config to every linear layer, with appropriate defaults, and remove the parameters it replaces.
  • Add an lr scale parameter to normalization layers.
  • Remove the ad-hoc specialization of peft to transformers. Instead, peft behavior is defined in each LinearConfig's apply_peft, with sensible defaults set for each of them.
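
To illustrate the "variable defaults" idea above, here is a minimal, dependency-free sketch. It is not the code in this PR: the class bodies, the `get_initializer` and `resolve_bias` methods, the `"default"` sentinel, and the depth-scaled std for layer 2 are all assumptions made for the example. Only the names `LinearConfig` and the general behavior (the parent layer supplies the defaults, the user config overrides them) come from the description.

```python
import dataclasses
import random
from typing import Callable, List, Optional


@dataclasses.dataclass
class InitializationConfig:
    # Hypothetical: kind="default" means "use whatever the parent layer specifies".
    kind: str = "default"
    std: float = 0.02
    value: float = 0.0

    def get_initializer(self, default: "InitializationConfig") -> Callable[[int], List[float]]:
        # Fall back to the parent-provided default unless the user set something.
        config = default if self.kind == "default" else self
        if config.kind == "normal":
            return lambda size: [random.gauss(0.0, config.std) for _ in range(size)]
        if config.kind == "fill":
            return lambda size: [config.value] * size
        raise ValueError(f"Unresolved or unknown initialization: {config.kind!r}")


@dataclasses.dataclass
class LinearConfig:
    # Hypothetical fields; None means "defer to the parent layer's default".
    weight_initialization: InitializationConfig = dataclasses.field(
        default_factory=InitializationConfig
    )
    bias: Optional[bool] = None
    lr_scale: Optional[float] = None

    def resolve_bias(self, default: bool) -> bool:
        return default if self.bias is None else self.bias


# Each parent layer passes its own defaults (values here are illustrative only).
num_layers = 12
layer_1_default = InitializationConfig(kind="normal", std=0.02)
layer_2_default = InitializationConfig(kind="normal", std=0.02 / (2 * num_layers) ** 0.5)

unset = LinearConfig()  # the user configured nothing, so everything defers
layer_1_init = unset.weight_initialization.get_initializer(layer_1_default)
layer_2_init = unset.weight_initialization.get_initializer(layer_2_default)

# A router-like parent defaults to no bias; an explicit user value still wins.
router_has_bias = LinearConfig().resolve_bias(default=False)
override = LinearConfig(
    weight_initialization=InitializationConfig(kind="fill", value=0.0), bias=True
)
```

The point of resolving at build time rather than in the config's field defaults is that the same `LinearConfig` class can be reused by every parent layer without subclassing per layer.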

Notes:

  • Lr scales can also be defined through per_layer_lr_scale. The two combine multiplicatively (combine_lr_scales).
  • I'm wondering whether to keep other block-level options such as add_linear_biases and init_method_std as shortcuts for setting every linear layer separately. They would be really convenient, but could be harder to manage.
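
As a rough illustration of the multiplicative combination mentioned in the notes above, the sketch below uses a hypothetical helper, `combine_lr_scales_sketch`. Its signature and its treatment of `None` as "no scale set" are assumptions and are not taken from the actual `combine_lr_scales`.

```python
from typing import Optional


def combine_lr_scales_sketch(*scales: Optional[float]) -> Optional[float]:
    """Multiply all scales that are set; return None if none of them is set."""
    result: Optional[float] = None
    for scale in scales:
        if scale is None:
            continue  # an unset scale leaves the learning rate unchanged
        result = scale if result is None else result * scale
    return result


# Hypothetical values: a per-layer scale for one block and a scale set on one linear.
per_layer_lr_scale = [None, 0.5, 0.5]
linear_lr_scale = 0.5

effective = combine_lr_scales_sketch(per_layer_lr_scale[1], linear_lr_scale)
unscaled = combine_lr_scales_sketch(per_layer_lr_scale[0], None)
```

Under these assumptions, `effective` is 0.25 and `unscaled` stays `None`, so a parameter with no scale configured anywhere can be told apart from one explicitly scaled by 1.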

TODO:

  • SSMs
  • Handle "concatenated" weights, e.g. attention key_value, MLP gate_and_up, and MoE concatenated expert weights. So far we've had ad-hoc solutions for separating key and value for peft, and for separating the lr scale by expert, but I'd like something more generic.
  • Determine how much backward compatibility we want.
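
For the concatenated-weight item in the TODO list above, one possible generic shape is to describe each concatenated weight as a list of named parts and derive row slices from it. This is only a sketch of the idea and not a design this PR commits to. `WeightPart`, `part_slices`, and the per-part `lr_scale` and `peft` fields are hypothetical names introduced for the example.

```python
import dataclasses
from typing import Dict, Optional, Sequence


@dataclasses.dataclass(frozen=True)
class WeightPart:
    # Hypothetical description of one part of a concatenated weight.
    name: str
    size: int  # number of rows this part occupies in the concatenated weight
    lr_scale: Optional[float] = None
    peft: bool = True


def part_slices(parts: Sequence[WeightPart]) -> Dict[str, slice]:
    """Map each part name to the row range it occupies, in concatenation order."""
    slices: Dict[str, slice] = {}
    begin = 0
    for part in parts:
        if part.name in slices:
            raise ValueError(f"Duplicate part name: {part.name!r}")
        slices[part.name] = slice(begin, begin + part.size)
        begin += part.size
    return slices


# Example: a key_value weight where only the value part gets peft and an lr scale.
key_value_parts = [
    WeightPart("key", size=4, peft=False),
    WeightPart("value", size=4, lr_scale=0.5),
]
slices = part_slices(key_value_parts)

rows = list(range(8))  # stand-in for the rows of the concatenated weight
value_rows = rows[slices["value"]]
```

The same description could in principle cover gate_and_up or per-expert weights by listing more parts, which is the kind of generality the TODO item asks for.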

@jlamypoirier jlamypoirier changed the title from "Block interface: initialization, lr scale, peft" to "[Prototype] Block interface: initialization, lr scale, peft" Aug 27, 2025
@jlamypoirier jlamypoirier changed the base branch from block_interface to block_interface_weight August 27, 2025 22:09
@jlamypoirier jlamypoirier changed the base branch from block_interface_weight to block_interface August 27, 2025 22:12
@jlamypoirier jlamypoirier deleted the block_interface_linear branch September 19, 2025 01:02