
Conversation


@willmj willmj commented Apr 17, 2025

Description of the change

A more limited version of #523 to enable LoRA but explicitly block off router linear and expert layers.
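
For illustration, here is a minimal sketch (not the PR's exact code) of what "LoRA everywhere except the MoE router and expert linears" could look like; the blocked module names ("router", "input_linear", "output_linear") are assumptions used only for this example:

    from peft import LoraConfig

    # Assumed names of the MoE layers that must stay LoRA-free.
    BLOCKED_MOE_MODULES = {"router", "input_linear", "output_linear"}

    def build_lora_config(target_modules):
        # Refuse to attach LoRA adapters to router or expert linears.
        blocked = BLOCKED_MOE_MODULES.intersection(target_modules)
        if blocked:
            raise ValueError(f"LoRA is not supported on MoE layers: {sorted(blocked)}")
        return LoraConfig(r=8, lora_alpha=16, target_modules=list(target_modules))

    # Attention-only targets pass; adding "router" would raise.
    lora_cfg = build_lora_config(["q_proj", "k_proj", "v_proj", "o_proj"])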

Related issue number

How to verify the PR

[screenshots attached]

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

willmj added 19 commits March 24, 2025 13:39
@github-actions

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@willmj willmj changed the title from "Save peft fast moe limited" to "feat: save lora config fast moe -limited" Apr 17, 2025
@github-actions github-actions bot added the feat label Apr 17, 2025
@willmj willmj force-pushed the save-peft-fast-moe-limited branch from 6b70cc8 to da81f93 Compare April 18, 2025 01:35
@kmehant kmehant changed the title from "feat: save lora config fast moe -limited" to "feat: Enable LoRA saving only for non MoE linear layers training with kernels." Apr 18, 2025
Comment on lines +127 to +129
model = self.trainer.model
if hasattr(model, "module"):
model = model.module
Collaborator

Is this needed for the non-LoRA use case? Since after this step it would directly reach the block below:

                        else:
                            model.config.save_pretrained(hf_converted_output_dir)

Collaborator Author

It's needed to determine whether there is a PEFT config for the following if statement, but in the fine-tuning case that if statement should be false.
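
For context, a rough sketch of the unwrap-then-check flow being discussed (the ".module" unwrapping and "peft_config" attribute follow the usual DDP/DeepSpeed and PEFT conventions; the helper name is hypothetical, not the PR's exact code):

    def save_model_config(trainer, hf_converted_output_dir):
        # Distributed wrappers (DDP/DeepSpeed) hide the real model behind .module.
        model = trainer.model
        if hasattr(model, "module"):
            model = model.module
        if getattr(model, "peft_config", None):
            # LoRA path: persist the adapter config next to the converted checkpoint.
            model.peft_config["default"].save_pretrained(hf_converted_output_dir)
        else:
            # Full fine-tuning path: only the base model config is needed.
            model.config.save_pretrained(hf_converted_output_dir)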

@kmehant
Collaborator

kmehant commented Apr 18, 2025

@willmj thanks for diligently taking my inputs. After addressing my comments, I request that you run all tests related to MoE kernels on your local setup and paste screenshots here for the record. Thanks, appreciate this PR!
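
For the record, one way such a local test pass might be run (the keyword expression is illustrative; the repo's actual test selection may differ):

    import pytest

    # Select only the ScatterMoE / MoE-kernel related tests and report verbosely.
    pytest.main(["-v", "-k", "scattermoe or moe"])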

model, train_args, modifiable_args=(peft_config,)
)
# For LoRa ScatterMoE, if expert layers are included, disable grad
if peft_config is not None:
Collaborator

I think it's better to just do an instance check using ScatterMoE and then freeze everything inside.
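
A minimal sketch of that suggestion (the ScatterMoE class would come from the kernels plugin; it is passed in as a parameter here rather than imported, since the exact import path is not shown in this PR):

    def freeze_scattermoe(model, scatter_moe_cls):
        # Freeze every parameter inside a ScatterMoE block so LoRA training
        # cannot update the router or expert weights.
        for module in model.modules():
            if isinstance(module, scatter_moe_cls):
                for param in module.parameters():
                    param.requires_grad = False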

Collaborator

Maybe you should put a comment that this is a workaround, and that in the future, when ScatterMoE LoRAs can be tuned, this code needs to be removed.

fabianlim previously approved these changes Apr 21, 2025
Collaborator

@fabianlim fabianlim left a comment

Since @kmehant is not around, I approve it. But TBH I'm not that familiar with the recent updates in this repo. Overall it looks OK, just one more comment:

model, train_args, modifiable_args=(peft_config,)
)
# For LoRa ScatterMoE, if expert layers are included, disable grad
if peft_config is not None:
Collaborator

Maybe you should put a comment that this is a workaround, and that in the future, when ScatterMoE LoRAs can be tuned, this code needs to be removed.

@willmj willmj merged commit 179da0a into foundation-model-stack:main Apr 21, 2025
9 checks passed
kmehant pushed a commit that referenced this pull request Apr 28, 2025
… kernels. (#530)

* save peft
* post process hf converted dir
* fix: convert hf converted checkpoint
* lora config
* save adapter config
* fix: add input linear and output linear to target modules
* fix: extend instead of append
* fix: if hasattr peft config
* fix: remove unneeded target modules
* test: lora for scattermoe
* explitcitly don't support router layer
* docs: update documentation
* fix: simplify accelerate launch post processing
* tests: more target modules + ep_degree
* fix: only restrict all-linear, raise warning for other modules
* fix: augmentation test
* fix: raise error
* turn off requires grad if using scattermoe with lora
* fix: freeze scattermoe params
* fix: safer freezing

---------

Signed-off-by: Will Johnson <[email protected]>
dushyantbehl pushed a commit to dushyantbehl/fms-hf-tuning that referenced this pull request Jun 23, 2025
… kernels. (foundation-model-stack#530)

