Add Option to use Target Model in LCM-LoRA Scripts #6537


Open · dg845 wants to merge 4 commits into main

Conversation

@dg845 (Contributor) commented Jan 11, 2024

What does this PR do?

This PR enables a target model to be optionally used in the LCM-LoRA distillation scripts via the --use_target_model argument.

Follow-up to #6505.
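(For context, a rough sketch of the idea, not the PR's exact code: alongside the online, LoRA-adapted UNet, a frozen copy is kept as the "target" network and is only ever updated via EMA. The model ID and names below are illustrative.)

import copy
from diffusers import UNet2DConditionModel

# Illustration only: load the online (student) UNet and keep a frozen copy
# as the EMA "target"; the actual construction in the PR may differ.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
target_unet = copy.deepcopy(unet)
target_unet.requires_grad_(False)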

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten
@patil-suraj
@sayakpaul
@jon-chuang
@shuminghu

@@ -1388,6 +1433,8 @@ def compute_embeddings(prompt_batch, proportion_empty_prompts, text_encoder, tok

# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
    # 12. If using a target model, update its parameters via EMA.
    update_ema(target_unet.parameters(), unet.parameters(), args.ema_decay)
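(For readers of the diff: a minimal sketch of what an update_ema helper like the one called above could look like; the helper actually used in the script may differ.)

import torch

@torch.no_grad()
def update_ema(target_params, online_params, decay):
    # Exponential moving average: target <- decay * target + (1 - decay) * online
    for target_param, online_param in zip(target_params, online_params):
        target_param.mul_(decay).add_(online_param, alpha=1.0 - decay)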


Are you sure this works? I had errors with the LoRA parameters.


I have a workaround in the issue


Both are LoRA weights, so they should work?

@dg845 (Contributor, Author) commented Jan 18, 2024

I haven't been able to test this fully yet; it's possible that this runs into the errors mentioned in #6505 (comment).

@dg845 (Contributor, Author)

I believe the current implementation is correctly updating the LoRA parameters.

@@ -657,6 +671,24 @@ def parse_args():
help="Proportion of image prompts to be replaced with empty strings. Defaults to 0 (no prompt replacement).",
)
# ----Latent Consistency Distillation (LCD) Specific Arguments----
parser.add_argument(
"--use_target_model",
action="store_true",


Should this default to false so existing users are not surprised?

@dg845 (Contributor, Author)

The target model will be used only if the --use_target_model flag is specified (action="store_true" defaults to False), so existing script calls will behave exactly as before.


Thanks. My bad.
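(To illustrate the opt-in behavior discussed above: with action="store_true", argparse defaults the flag to False unless it is passed explicitly, so existing invocations are unaffected. A standalone example:)

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--use_target_model", action="store_true")

# Omitting the flag leaves it False; passing it sets it to True.
print(parser.parse_args([]).use_target_model)                       # False
print(parser.parse_args(["--use_target_model"]).use_target_model)   # True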

@patil-suraj (Contributor) left a comment

The code looks good to me. Could you explain the motivation behind this? Do you have any experiments that demonstrate the use of this?

@dg845 (Contributor, Author) commented Jan 18, 2024

#6505 (comment) reports a lot of training instability when using the current LCM-LoRA script. @jon-chuang, would you be willing to share more details about the training instability?

@dg845 dg845 changed the title [WIP] Add Option to use Target Model in LCM-LoRA Scripts Add Option to use Target Model in LCM-LoRA Scripts Jan 21, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jon-chuang

I don’t have conclusive evidence that this fix will mitigate the instability, but what I observed was some divergent results on LCM-LoRA training runs.

Anyway, I think it’s a zero-cost, opt-in feature that may produce better results for some users.

I will definitely try the EMA once it is merged and can report on further results.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Feb 23, 2024
@sayakpaul (Member)

@patil-suraj a gentle ping.

@yiyixuxu yiyixuxu added training and removed stale Issues that haven't received updates labels Feb 23, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 19, 2024
Labels: stale (Issues that haven't received updates), training
7 participants