feat: Enable LoRA saving only for non MoE linear layers training with kernels. #530
Conversation
Thanks for making a pull request! 😃
Force-pushed from 6b70cc8 to da81f93
    model = self.trainer.model
    if hasattr(model, "module"):
        model = model.module
Is this needed for the non-LoRA use case? After this step it would directly reach the block below:
    else:
        model.config.save_pretrained(hf_converted_output_dir)
It's needed to determine whether there is a PEFT config for the following if statement; in the fine-tuning case this if statement should be false.
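For reference, a minimal sketch of the branching being discussed, assuming a hypothetical helper name `_save_hf_config`; the unwrapping lines and `hf_converted_output_dir` come from the snippets above, while the adapter-name key and the use of `PeftConfig.save_pretrained` are assumptions rather than the PR's exact code:

```python
def _save_hf_config(trainer, hf_converted_output_dir: str) -> None:
    """Hypothetical helper sketching the config-saving branch."""
    model = trainer.model
    if hasattr(model, "module"):  # unwrap a DDP/FSDP-style wrapper, if present
        model = model.module

    if getattr(model, "peft_config", None) is not None:
        # LoRA case: persist the adapter config next to the converted weights
        model.peft_config["default"].save_pretrained(hf_converted_output_dir)
    else:
        # full fine-tuning case: only the base model config is saved
        model.config.save_pretrained(hf_converted_output_dir)
```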
@willmj thanks for diligently taking my inputs. After addressing my comments, please run all the tests related to the MoE kernels on your local setup and paste screenshots here for the record. Thanks, appreciate this PR!
        model, train_args, modifiable_args=(peft_config,)
    )
    # For LoRA ScatterMoE, if expert layers are included, disable grad
    if peft_config is not None:
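The diff is truncated at the guard above; below is a hedged sketch of the kind of freezing the comment describes, assuming the fused MoE parameters can be recognized by a substring of their qualified names (the `"scattermoe"` match is illustrative, not the PR's actual filter):

```python
def _freeze_scattermoe_params(model, peft_config) -> None:
    """Hypothetical helper: when LoRA is combined with ScatterMoE kernels,
    disable gradients on the fused MoE weights so the expert and router
    parameters are neither trained nor exported as adapters."""
    if peft_config is None:
        return
    for name, param in model.named_parameters():
        # the name-based match is illustrative; the real check may differ
        if "scattermoe" in name.lower():
            param.requires_grad = False
```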
I think it's better to just do an instance check using ScatterMoE and then freeze everything inside.
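Something along these lines, with the ScatterMoE module class passed in explicitly since its import path in the kernels plugin is not shown here:

```python
def _freeze_scattermoe_modules(model, scattermoe_cls) -> None:
    """Sketch of the suggested approach: find every module that is an
    instance of the ScatterMoE class and freeze all of its parameters."""
    for module in model.modules():
        if isinstance(module, scattermoe_cls):
            for param in module.parameters():
                param.requires_grad = False
```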
Maybe you should put a comment that this is a workaround, and that in the future, when ScatterMoE LoRAs can be tuned, this code needs to be removed.
fabianlim left a comment
Since @kmehant is not around, I approve it, but TBH I'm not that familiar with the recent updates in this repo. Overall it looks OK. Just one more comment:
Maybe you should put a comment that this is a workaround, and that in the future, when ScatterMoE LoRAs can be tuned, this code needs to be removed.
… kernels. (#530)

* save peft
* post process hf converted dir
* fix: convert hf converted checkpoint
* lora config
* save adapter config
* fix: add input linear and output linear to target modules
* fix: extend instead of append
* fix: if hasattr peft config
* fix: remove unneeded target modules
* test: lora for scattermoe
* explicitly don't support router layer
* docs: update documentation
* fix: simplify accelerate launch post processing
* tests: more target modules + ep_degree
* fix: only restrict all-linear, raise warning for other modules
* fix: augmentation test
* fix: raise error
* turn off requires grad if using scattermoe with lora
* fix: freeze scattermoe params
* fix: safer freezing

Signed-off-by: Will Johnson <[email protected]>
Description of the change
A more limited version of #523 that enables LoRA but explicitly blocks off the router linear and expert layers.
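For illustration, a hedged example of a LoRA configuration that stays within these limits: only the attention projections are targeted, so the ScatterMoE router and expert linears are left untouched. The module names and hyperparameters below are typical defaults, not values taken from this repo; per the commit history, `all-linear` is rejected and other MoE-touching modules trigger a warning or error.

```python
from peft import LoraConfig

# Example only: restrict LoRA to the attention projections so the ScatterMoE
# router and expert layers are excluded. "all-linear" would not be allowed
# here because it would sweep in the MoE linear layers as well.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```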
Related issue number
How to verify the PR
Was the PR tested