**Is your feature request related to a problem? Please describe.**
When a prompt is reused, the text encoder embeds are recomputed from scratch; this can be time-consuming for something like T5-XXL with offloading or on CPU.
Text encoder embeds are relatively small, so keeping them in memory is feasible:
```python
>>> import torch
>>> clip_l = torch.randn([1, 77, 768])
>>> t5_xxl = torch.randn([1, 512, 4096])
>>> clip_l.numel() * clip_l.dtype.itemsize
236544
>>> t5_xxl.numel() * t5_xxl.dtype.itemsize
8388608
```
**Describe the solution you'd like.**
An MVP would be reusing the last text encoder embeds when the prompt hasn't changed; this behaviour is already supported in community UIs. Ideally the cache would support multiple prompts and potentially be serializable.
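A minimal sketch of such a cache, assuming an `encode_fn` that maps a prompt string to a tuple of embed tensors; the class name and interface here are hypothetical, not an existing diffusers API:

```python
from collections import OrderedDict

import torch


class PromptEmbedCache:
    """Small LRU cache mapping prompt strings to precomputed text encoder embeds."""

    def __init__(self, encode_fn, max_entries: int = 16):
        # encode_fn is assumed to take a prompt string and return a tuple of
        # embed tensors, e.g. (prompt_embeds, pooled_prompt_embeds).
        self.encode_fn = encode_fn
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def __call__(self, prompt: str):
        if prompt in self._cache:
            # Cache hit: skip the text encoder forward pass entirely.
            self._cache.move_to_end(prompt)
            return self._cache[prompt]
        if len(self._cache) >= self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used entry
        # Keep embeds on CPU so the text encoder can stay offloaded.
        embeds = tuple(t.detach().cpu() for t in self.encode_fn(prompt))
        self._cache[prompt] = embeds
        return embeds

    def save(self, path: str) -> None:
        # Embeds are plain tensors, so the whole cache serializes with torch.save.
        torch.save(dict(self._cache), path)

    def load(self, path: str) -> None:
        self._cache = OrderedDict(torch.load(path))
```

In a pipeline this could wrap the prompt-encoding step, with the cached tensors passed back in via the existing `prompt_embeds`-style arguments on subsequent calls.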