Block interface: rework LM config, fine-grained initialization, lr_scale, peft #360
✨ Description
Rework LM config (see the sketch after this list):
- `tie_word_embeddings` -> `output_layer.tied_weight`
- `embeddings_layer.position_embeddings.enabled`: always disabled by default, independently of rotary embeddings
- `max_position_embeddings` -> `embeddings_layer.num_position_embeddings`
- `parallel_embeddings` -> `embeddings_layer.vocab_parallel`
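To make the new layout concrete, here is a minimal before/after sketch written as plain Python dicts. Only the renamed fields listed above come from this PR; the surrounding structure and the sample values (e.g. 2048) are placeholders, not Fast-LLM's actual schema.

```python
# Old flat LM config fields (before this PR). Values are placeholders.
old_config = {
    "tie_word_embeddings": True,
    "max_position_embeddings": 2048,
    "parallel_embeddings": True,
}

# New nested layout: embedding options move under `embeddings_layer`,
# weight tying moves under `output_layer`. Position embeddings are now
# disabled by default, regardless of the rotary settings.
new_config = {
    "embeddings_layer": {
        "position_embeddings": {"enabled": False},
        "num_position_embeddings": 2048,
        "vocab_parallel": True,
    },
    "output_layer": {"tied_weight": True},
}
```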
Rework initialization config:
- Remove `dt_init` and `dt_scale`, as the same can be obtained through the new init config scheme.
- Replace `dt_min`, `dt_max` and `dt_init_floor` by the `mamba_dt_bias` initialization type with similar options (see the sketch after this list).
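For context, the reference Mamba implementation computes this initialization roughly as below; a `mamba_dt_bias` initialization type with similar options would plausibly follow the same recipe. This is a sketch based on the public Mamba code, not this PR's actual implementation.

```python
import math

import torch


def mamba_dt_bias_init(shape, dt_min=0.001, dt_max=0.1, dt_init_floor=1e-4):
    """Sketch of the standard Mamba dt-bias init: sample dt log-uniformly
    in [dt_min, dt_max], floor it at dt_init_floor, then invert softplus
    so that softplus(bias) recovers dt at the start of training."""
    dt = torch.exp(
        torch.rand(shape) * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
    ).clamp(min=dt_init_floor)
    # Inverse softplus: softplus(dt + log(1 - exp(-dt))) == dt.
    return dt + torch.log(-torch.expm1(-dt))
```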
Rework LR scales:

Rework Peft (lora):
Add an `apply_peft` option to linear layers. If true, PEFT will be enabled for that layer (e.g. wrapped with LoRA); otherwise the layer will be treated as non-PEFT (e.g. frozen or ignored). If left unset, the default set by the parent layer is used instead (False, except for the attention query and value layers); see the sketch below.
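A minimal sketch of the resolution rule described above; the helper name and call sites are hypothetical, only the `apply_peft` semantics and its documented defaults come from this PR.

```python
from typing import Optional


def resolve_apply_peft(layer_setting: Optional[bool], parent_default: bool) -> bool:
    """Hypothetical helper: an explicit per-layer True/False wins, while an
    unset value (None) falls back to the default set by the parent layer."""
    return parent_default if layer_setting is None else layer_setting


# Documented defaults: False everywhere except attention query and value.
assert resolve_apply_peft(None, parent_default=True) is True    # e.g. attn query/value
assert resolve_apply_peft(None, parent_default=False) is False  # e.g. other linear layers
assert resolve_apply_peft(False, parent_default=True) is False  # explicit opt-out wins
```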
Todo (next prs):