[docs] Caching methods #11625
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for looking into it @stevhliu! Just some comments regarding the correctness of the explanations, plus some more technical details (which are fine to skip if you think they place too much burden on the user).
docs/source/en/optimization/cache.md

## FasterCache

[FasterCache](https://huggingface.co/papers/2410.19355) computes and caches attention features at every other timestep instead of directly reusing cached features because it can cause flickering or blurry details in the generated video. The features from the skipped step are calculated from the difference between the adjacent cached features.
Suggested change: [FasterCache](https://huggingface.co/papers/2410.19355) caches and reuses attention features in a similar manner to PAB, since output differences between successive timesteps of the generation process are small. Additionally, when sampling with classifier-free guidance (commonly used in most base models), FasterCache may skip the unconditional branch prediction entirely and estimate it from the conditional branch prediction, if there is significant redundancy in the predicted latent outputs between successive timesteps.
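For readers who want to see what this looks like in practice, here is a minimal sketch of enabling FasterCache on a video pipeline. The CogVideoX checkpoint and the specific skip ranges and weights below are illustrative assumptions, not prescriptive values; they should be tuned per model:

```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    # recompute spatial attention every 2 steps; reuse cached features in between
    spatial_attention_block_skip_range=2,
    # only apply attention caching inside this timestep window
    spatial_attention_timestep_skip_range=(-1, 681),
    current_timestep_callback=lambda: pipe.current_timestep,
    # weight applied when combining cached features (illustrative value)
    attention_weight_callback=lambda _: 0.3,
    # skip the unconditional (CFG) branch and estimate it from the conditional one
    unconditional_batch_skip_range=5,
    unconditional_batch_timestep_skip_range=(-1, 781),
    tensor_format="BFCHW",
)
pipe.transformer.enable_cache(config)

video = pipe("A cat playing with a ball of yarn").frames[0]
```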
Cc @sunovivid, you might be interested.
Looks good to me. Thanks to @a-r-r-o-w for the suggestions as well. I would be in favor of keeping the technical details.
Two things:
- Include a table reporting timing and memory numbers so that users know the trade-offs (can happen in a follow-up PR).
- If we know whether the caching methods are generally model-agnostic, an explicit note about that would be useful.
Thanks for the reviews! Happy to include a table in a follow-up if someone can provide me with the timing and memory numbers (or the code to generate them)!
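In case it helps, a minimal sketch of the kind of measurement code that could generate those numbers; it uses only standard PyTorch timing/memory APIs, and `benchmark` is a hypothetical helper rather than an existing utility:

```python
import time
import torch

def benchmark(pipe, **call_kwargs):
    """Return (wall-clock seconds, peak CUDA memory in GB) for one pipeline call."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe(**call_kwargs)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    return elapsed, peak_gb

# Run once without caching, then enable a cache config and run again,
# and tabulate the two (time, memory) pairs for the docs table.
```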
The comments I brought up can definitely be included in the follow-up.
Make the caching docs more visible and give a little more context behind the methods.