feat: Enable LoRA saving only for non-MoE linear layers when training with kernels. (#530)
* save peft
Signed-off-by: Will Johnson <[email protected]>
* post process hf converted dir
Signed-off-by: Will Johnson <[email protected]>
* fix: convert hf converted checkpoint
Signed-off-by: Will Johnson <[email protected]>
* lora config
Signed-off-by: Will Johnson <[email protected]>
* save adapter config
Signed-off-by: Will Johnson <[email protected]>
* fix: add input linear and output linear to target modules
Signed-off-by: Will Johnson <[email protected]>
* fix: extend instead of append
Signed-off-by: Will Johnson <[email protected]>
* fix: if hasattr peft config
Signed-off-by: Will Johnson <[email protected]>
* fix: remove unneeded target modules
Signed-off-by: Will Johnson <[email protected]>
* test: lora for scattermoe
Signed-off-by: Will Johnson <[email protected]>
* explicitly don't support router layer
Signed-off-by: Will Johnson <[email protected]>
* docs: update documentation
Signed-off-by: Will Johnson <[email protected]>
* fix: simplify accelerate launch post processing
Signed-off-by: Will Johnson <[email protected]>
* tests: more target modules + ep_degree
Signed-off-by: Will Johnson <[email protected]>
* fix: only restrict all-linear, raise warning for other modules
Signed-off-by: Will Johnson <[email protected]>
* fix: augmentation test
Signed-off-by: Will Johnson <[email protected]>
* fix: raise error
Signed-off-by: Will Johnson <[email protected]>
* turn off requires grad if using scattermoe with lora
Signed-off-by: Will Johnson <[email protected]>
* fix: freeze scattermoe params
Signed-off-by: Will Johnson <[email protected]>
* fix: safer freezing
Signed-off-by: Will Johnson <[email protected]>
---------
Signed-off-by: Will Johnson <[email protected]>
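The commits above mention turning off `requires_grad` for ScatterMoE parameters when LoRA is enabled ("safer freezing"). Below is a minimal sketch of that idea in plain PyTorch, not the plugin's actual implementation; the name fragments used to match MoE parameters are assumptions for illustration and may not match the module names used by the accelerated-moe plugin.

```python
# Illustrative sketch only: freeze MoE expert/router parameters so that just the
# LoRA adapter weights (injected into the attention projections) receive gradients.
import torch.nn as nn

# Hypothetical name fragments for MoE parameters; the real plugin may use different names.
MOE_PARAM_MARKERS = ("moe", "router", "expert")

def freeze_moe_params(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        is_lora = "lora_" in name  # PEFT names adapter weights lora_A / lora_B
        if not is_lora and any(marker in name.lower() for marker in MOE_PARAM_MARKERS):
            # Freeze per-parameter rather than per-module, mirroring the "safer freezing" idea.
            param.requires_grad = False
```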
README.md: 3 additions & 0 deletions
@@ -855,6 +855,9 @@ Notes:
- When a boolean is passed, the expert parallel degree defaults to 1 and the behaviour is as follows:
    - if True, Scatter MoE Kernels are used with experts sharded based on the top-level sharding protocol (e.g. FSDP).
    - if False, Scatter MoE Kernels are used with complete replication of experts across ranks.
+ - FSDP must be used when LoRA tuning with `--fast_moe`
+ - LoRA tuning with ScatterMoE is supported, but because of inference restrictions on vLLM/vanilla PEFT, the expert layers and the router linear layer should not be trained as `target_modules` for models tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
+     - At this time, only attention layers are trainable when using LoRA with ScatterMoE. Until support for the router linear layer is added, target modules must be specified explicitly (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`) instead of passing `target_modules: ["all-linear"]`.
- `world_size` must be divisible by the `ep_degree`
- `number of experts` in the MoE module must be divisible by the `ep_degree`
- Running fast moe modifies the state dict of the model, so checkpoints must be post-processed; this happens automatically, and the converted checkpoint can be found in the `hf_converted_checkpoint` folder within every saved checkpoint directory. Alternatively, the same conversion can be performed manually through the [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) script.
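Since `all-linear` is not supported with ScatterMoE, the attention projections must be listed explicitly. The sketch below shows one way to express that restriction with the PEFT library's `LoraConfig`; the rank, alpha, and dropout values are placeholders rather than recommendations, and the exact way the tuning stack consumes this config is not shown here.

```python
# A minimal sketch following the note above: list the attention projections
# explicitly instead of using "all-linear", so the MoE expert and router
# layers are never selected as LoRA target modules.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                # placeholder rank
    lora_alpha=16,      # placeholder scaling
    lora_dropout=0.05,  # placeholder dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers only
)
```

The same list can equally be passed through the tuning configuration (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`) when launching with `--fast_moe`.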