[Refactor] How attention is set in 3D UNet blocks #6893


Open · wants to merge 3 commits into main

Conversation

@DN6 DN6 (Collaborator) commented Feb 7, 2024

What does this PR do?

There's some confusion around how num_attention_heads is used in the 3D UNet and its blocks. This PR attempts to fix the issue by correcting the values passed as num_attention_heads.

Related:
#6872
#6873


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

# If `num_attention_heads` is not defined (which is the case for most models)
# it will default to `attention_head_dim`. This looks weird upon first reading it and it is.
# The reason for this behavior is to correct for incorrectly named variables that were introduced
# when this library was created. The incorrect naming was only discovered much later in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131
# Changing `attention_head_dim` to `num_attention_heads` for 40,000+ configurations is too backwards breaking
# which is why we correct for the naming here.
num_attention_heads = num_attention_heads or attention_head_dim
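To make the mix-up concrete, here is a minimal sketch (with made-up channel widths and head dimension, not values from any shipped config) of what the fallback does today versus what the PR moves towards: the legacy path reuses the head dimension as the head count, while the corrected path derives a per-block head count from the output channels.

```python
# Hypothetical values for illustration only.
block_out_channels = (320, 640, 1280, 1280)
attention_head_dim = 64
num_attention_heads = None  # most existing configs never set this

# Legacy fallback: the head *dimension* silently becomes the head *count*.
legacy_num_heads = num_attention_heads or attention_head_dim
print(legacy_num_heads)  # 64, regardless of how wide each block is

# Corrected behaviour sketched in this PR: derive a per-block head count so
# each head keeps a dimension of `attention_head_dim`.
corrected_num_heads = [ch // attention_head_dim for ch in block_out_channels]
print(corrected_num_heads)  # [5, 10, 20, 20]
```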
Member

Also remove or update the comment above, assuming num_attention_heads is replicated in the hub configs. Do we know how many models like https://huggingface.co/ali-vilab/i2vgen-xl/blob/6c4e9e70bdcd36eb59d98d2b583adea0813ea8de/unet/config.json#L21 need to be updated?

Will we need to live forever with duplicated property names in the hub?

@yiyixuxu yiyixuxu (Collaborator) Feb 7, 2024

Just this one model, and it came out only a few days ago.
IMO it's an edge case we don't mind breaking; it will only affect people who want to use the local copy, no?

@yiyixuxu yiyixuxu (Collaborator) left a comment

I like this idea! But I think it's not backward-compatible for CrossAttnDownBlock3D, CrossAttnUpBlock3D, UNetMidBlock3DCrossAttn, get_up_block and get_down_block.

Do you have a good idea to get around that?

num_attention_heads,
out_channels // num_attention_heads,
Collaborator

CrossAttnDownBlock3D is part of our public API and this is a breaking change, no?

@DN6 DN6 (Collaborator, Author) Feb 7, 2024

Sorry, why isn't it backwards compatible? None of the args in the class init are being changed, right?

The 3D blocks and the 3D UNet are only used with the Text to Video Synth and I2VGenXL model in the library.

Collaborator Author

Same with get_up_block and get_down_block. We're only changing the number being passed to the num_attention_heads argument.

Collaborator

Changing the meaning of an argument is breaking, no?

for example,

CrossAttnDownBlock3D(... num_attention_heads = 64) is currently expected to create attentions with head_dim=64; with this code change, it will create attentions with 64 heads instead
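Roughly, the concern can be illustrated like this (purely illustrative numbers, not actual block defaults):

```python
out_channels = 1280
value = 64  # what a downstream caller passes as `num_attention_heads`

# Before this PR: the 3D blocks effectively treat the value as the head dimension.
heads_before = out_channels // value   # 20 heads of dim 64
# After this PR: the value is honoured as the head count.
heads_after = value                    # 64 heads of dim 20
print(heads_before, heads_after)
```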

@yiyixuxu yiyixuxu (Collaborator) Feb 7, 2024

The 3D blocks and the 3D UNet are only used with the Text to Video Synth and I2VGenXL model in the library.

Yes, but it is our public API and we have to assume it's been used outside of the library.

@DN6 DN6 (Collaborator, Author) Feb 7, 2024

I think this might be a relatively safe change. Searching GitHub public repos for an import of these blocks from the public API doesn't return any results. I actually don't think you can import them directly from diffusers:

https://github.com/search?q=%22from+diffusers.models+import+CrossAttnDownBlock3D%22+language:Python+&type=code
https://github.com/search?q=%22from+diffusers+import+CrossAttnUpBlock3D%22+language:Python+&type=code
https://github.com/search?q=%22from+diffusers+import+UNetMidBlock3DCrossAttn%22+language:Python+&type=code

It looks like more often than not people redefine the blocks themselves:
https://github.com/search?q=%22CrossAttnDownBlock3D%22+language:Python+&type=code&p=5

@yiyixuxu yiyixuxu (Collaborator) Feb 7, 2024

cc @pcuenca and @patrickvonplaten here.
I would like to hear your thoughts about when we can make breaking changes (other than v1.0.0).

Personally, I think we should only make breaking changes when we don't have another choice, or we know super confidently it is an edge case (e.g., if we just added these blocks yesterday, I would think it's ok to break here).
I think in this case,
(1) we do not have to make these changes: we want to make these changes to make our code more readable and easier for contributors to contribute, but it is not a must and this is not the only way to go
(2) we don't really have a way to find out about its usage outside GitHub

Also, I think a breaking change is somewhat more acceptable if we are able to throw an error. In this case, it would just break silently, so IMO it is worse.

But I'm curious about your thoughts on this and I'm cool with it if you all feel strongly about making this change here :)

@patrickvonplaten patrickvonplaten (Contributor) Feb 9, 2024

python -c "from diffusers import CrossAttnDownBlock3D"

doesn't work, so strictly speaking CrossAttnDownBlock3D is not considered part of the public API. Also, I don't think it's used that much, so IMO it's ok to change it here (while keeping in mind that this might lead to breaking changes depending on how CrossAttnDownBlock3D is imported).

@@ -132,13 +132,19 @@ def __init__(
"At the moment it is not possible to define the number of attention heads via `num_attention_heads` because of a naming issue as described in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131. Passing `num_attention_heads` will only be supported in diffusers v0.19."
)

if isinstance(attention_head_dim, int):
num_attention_heads = [out_channels // attention_head_dim for out_channels in block_out_channels]
@patrickvonplaten patrickvonplaten (Contributor) Feb 9, 2024

That's a good idea.

Are we sure though that this is always correct? Does out_channels always represent the hidden_dim of the attention layer?

Collaborator Author

For the 3D UNets this is safe. There are a limited number of blocks used with this model (`CrossAttnDownBlock3D`, `CrossAttnUpBlock3D` and `UNetMidBlock3DCrossAttn`) and they all configure `num_attention_heads` based on the `out_channels`.
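As a sanity check of that claim, the relationship being relied on is just `out_channels == num_attention_heads * attention_head_dim`; a quick sketch with assumed widths:

```python
# Assumed channel widths and head dimension, for illustration only.
attention_head_dim = 64
for out_channels in (320, 640, 1280, 1280):
    num_attention_heads = out_channels // attention_head_dim
    # Holds as long as out_channels is divisible by attention_head_dim,
    # i.e. as long as out_channels really is the attention hidden dim.
    assert num_attention_heads * attention_head_dim == out_channels
```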

@@ -132,13 +132,19 @@ def __init__(
"At the moment it is not possible to define the number of attention heads via `num_attention_heads` because of a naming issue as described in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131. Passing `num_attention_heads` will only be supported in diffusers v0.19."
Contributor

I would prefer to try to remove this statement.

Comment on lines -497 to +498
out_channels // num_attention_heads,
num_attention_heads,
out_channels // num_attention_heads,
Contributor

Let's use keyword arguments here when correcting it.
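A toy sketch of why keywords help at this call site (the stand-in function and parameter names below are assumed for illustration; the real block factory takes many more arguments):

```python
# Stand-in for the block factory, just to show the readability difference.
def make_block(num_attention_heads, attention_head_dim):
    return f"{num_attention_heads} heads x {attention_head_dim} dims"

out_channels, num_attention_heads = 1280, 20

# Positional call: swapping the two expressions is an easy, silent mistake.
print(make_block(num_attention_heads, out_channels // num_attention_heads))

# Keyword call: the intent of each value is explicit at the call site.
print(make_block(
    num_attention_heads=num_attention_heads,
    attention_head_dim=out_channels // num_attention_heads,
))
```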

github-actions bot commented Mar 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 9, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Mar 9, 2024
@yiyixuxu yiyixuxu (Collaborator) commented Mar 9, 2024

@DN6 let's finish this up soon :)

github-actions bot commented Apr 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Apr 3, 2024
@DN6 DN6 removed the stale Issues that haven't received updates label Apr 8, 2024
@DN6 DN6 marked this pull request as ready for review April 8, 2024 05:35
@DN6 DN6 (Collaborator, Author) commented Apr 10, 2024

@yiyixuxu Anything specific you wanted to address in this current PR?

@DN6 DN6 (Collaborator, Author) commented Apr 10, 2024

Oh wait. Nvm I haven't addressed those initial comments. Will wrap this up.

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label May 5, 2024
@DN6 DN6 removed the stale Issues that haven't received updates label May 6, 2024
github-actions bot commented Sep 14, 2024
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
@yiyixuxu yiyixuxu added refactor and removed stale Issues that haven't received updates labels Sep 17, 2024
github-actions bot commented Oct 12, 2024
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Oct 12, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Oct 15, 2024
github-actions bot commented Nov 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Nov 8, 2024
Labels: refactor, stale (Issues that haven't received updates)
5 participants