[Hunyuan] add optimization related sections to the hunyuan dit docs. (huggingface#8402)
* optimizations to the hunyuan dit docs.
* Apply suggestions from code review
Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/api/pipelines/hunyuandit.md
Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: Steven Liu <[email protected]>
docs/source/en/api/pipelines/hunyuandit.md (55 additions, 1 deletion)
@@ -28,11 +28,65 @@ HunyuanDiT has the following components:
* It uses a diffusion transformer as the backbone
* It combines two text encoders, a bilingual CLIP and a multilingual T5 encoder

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
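As a concrete illustration of the scheduler trade-off mentioned in the tip, the snippet below is a minimal sketch of swapping in a different scheduler. The checkpoint id `Tencent-Hunyuan/HunyuanDiT-Diffusers` and the choice of `DPMSolverMultistepScheduler` are assumptions added here for illustration; whether a particular scheduler suits this checkpoint is something to verify against the linked guide.

```py
import torch
from diffusers import DPMSolverMultistepScheduler, HunyuanDiTPipeline

# Assumed checkpoint id, used only for illustration.
pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

# Rebuild the scheduler from the checkpoint's existing scheduler config so
# checkpoint-specific settings (betas, timesteps, ...) are preserved.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
```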
## Optimization
You can optimize the pipeline's runtime and memory consumption with torch.compile and feed-forward chunking. To learn about other optimization methods, check out the [Speed up inference](../../optimization/fp16) and [Reduce memory usage](../../optimization/memory) guides.
### Inference
Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
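The compilation step itself is not shown in this diff excerpt; the snippet below is a rough, non-authoritative sketch of the workflow. The checkpoint id, the `max-autotune` mode, and compiling both the transformer and the VAE decoder are assumptions for illustration.

```py
import torch
from diffusers import HunyuanDiTPipeline

pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16  # assumed checkpoint id
).to("cuda")

# Compile the heaviest components. The first call pays the compilation cost;
# subsequent calls with the same input shapes run faster.
pipeline.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)

image = pipeline(prompt="An astronaut riding a horse").images[0]
```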
The [benchmark](https://gist.github.com/sayakpaul/29d3a14905cfcbf611fe71ebd22e9b23) results on an 80GB A100 machine are:

```bash
With torch.compile(): Average inference time: 12.470 seconds.
Without torch.compile(): Average inference time: 20.570 seconds.
```
### Memory optimization
By loading the T5 text encoder in 8 bits, you can run the pipeline in just under 6 GBs of GPU VRAM. Refer to [this script](https://gist.github.com/sayakpaul/3154605f6af05b98a41081aaba5ca43e) for details.
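A minimal sketch of the idea is shown below; the linked script is the authoritative reference. It assumes bitsandbytes is installed, that the checkpoint id is `Tencent-Hunyuan/HunyuanDiT-Diffusers`, and that the multilingual T5 encoder lives in the checkpoint's `text_encoder_2` subfolder.

```py
import torch
from transformers import T5EncoderModel
from diffusers import HunyuanDiTPipeline

# Load only the large multilingual T5 encoder in 8-bit (requires bitsandbytes).
text_encoder_2 = T5EncoderModel.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",  # assumed checkpoint id
    subfolder="text_encoder_2",
    load_in_8bit=True,
    device_map="auto",
)

# Reuse the quantized encoder when assembling the pipeline. See the linked
# script for the full low-VRAM recipe (encode the prompts first, then free the
# text encoders before denoising).
pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.float16,
)
```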
Furthermore, you can use the [`~HunyuanDiT2DModel.enable_forward_chunking`] method to reduce memory usage. Feed-forward chunking runs the feed-forward layers in a transformer block in a loop instead of all at once. This gives you a trade-off between memory consumption and inference runtime.
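For example, a sketch along these lines (the checkpoint id and the `chunk_size`/`dim` values are illustrative assumptions):

```py
import torch
from diffusers import HunyuanDiTPipeline

pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16  # assumed checkpoint id
).to("cuda")

# Run the feed-forward layers over chunks of the sequence dimension instead of
# the full sequence at once; smaller chunks use less memory but add runtime.
pipeline.transformer.enable_forward_chunking(chunk_size=1, dim=1)

image = pipeline(prompt="An astronaut riding a horse").images[0]
```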