**Is your feature request related to a problem? Please describe.**
When a prompt is reused, the text encoder embeds are recomputed from scratch; this can be time-consuming for something like T5-XXL with offloading or on CPU.
Text encoder embeds are relatively small, so keeping them in memory is feasible:
```python
>>> import torch
>>> clip_l = torch.randn([1, 77, 768])
>>> t5_xxl = torch.randn([1, 512, 4096])
>>> clip_l.numel() * clip_l.dtype.itemsize
236544
>>> t5_xxl.numel() * t5_xxl.dtype.itemsize
8388608
```
**Describe the solution you'd like.**
An MVP would be reusing the last text encoder embeds when the prompt hasn't changed; this behaviour is already supported in community UIs. Ideally the cache would support multiple prompts and potentially be serializable.
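A minimal sketch of such a cache, assuming an `encode_fn` that maps a prompt string to a tuple of embed tensors; the class name and interface here are hypothetical, not an existing diffusers API:

```python
from collections import OrderedDict

import torch


class PromptEmbedCache:
    """Small LRU cache mapping prompt strings to precomputed text encoder embeds."""

    def __init__(self, encode_fn, max_entries: int = 16):
        # encode_fn is assumed to take a prompt string and return a tuple of
        # embed tensors, e.g. (prompt_embeds, pooled_prompt_embeds).
        self.encode_fn = encode_fn
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def __call__(self, prompt: str):
        if prompt in self._cache:
            # Cache hit: skip the text encoder forward pass entirely.
            self._cache.move_to_end(prompt)
            return self._cache[prompt]
        if len(self._cache) >= self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used entry
        # Keep embeds on CPU so the text encoder can stay offloaded.
        embeds = tuple(t.detach().cpu() for t in self.encode_fn(prompt))
        self._cache[prompt] = embeds
        return embeds

    def save(self, path: str) -> None:
        # Embeds are plain tensors, so the whole cache serializes with torch.save.
        torch.save(dict(self._cache), path)

    def load(self, path: str) -> None:
        self._cache = OrderedDict(torch.load(path))
```

In a pipeline this could wrap the prompt-encoding step, with the cached tensors passed back in via the existing `prompt_embeds`-style arguments on subsequent calls.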