
Cache text encoder embeds in pipelines #10078

@hlky

Description


Is your feature request related to a problem? Please describe.

When a prompt is reused, the text encoder embeds are recomputed; this can be time-consuming for something like T5-XXL with offloading or on CPU.

Text encoder embeds are relatively small, so keeping them in memory is feasible:

>>> import torch
>>> clip_l = torch.randn([1, 77, 768])
>>> t5_xxl = torch.randn([1, 512, 4096])
>>> clip_l.numel() * clip_l.dtype.itemsize
236544
>>> t5_xxl.numel() * t5_xxl.dtype.itemsize
8388608

Describe the solution you'd like.

The MVP would be to reuse the last text encoder embeds when the prompt hasn't changed; this behaviour is already supported in community UIs. Ideally, the cache would support multiple prompts and potentially be serializable, as in the sketch below.
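As a rough illustration rather than a concrete API proposal, a cache keyed by prompt text could look like the following; PromptEmbedCache and compute_embeds are hypothetical names, and the exact set of tensors returned by a pipeline's prompt-encoding step varies per pipeline.

import torch

class PromptEmbedCache:
    """Reuse text encoder embeds for prompts that have been seen before."""

    def __init__(self):
        self._cache = {}

    def get(self, prompt, compute_embeds):
        # Recompute only when the prompt is new; keep cached copies on CPU
        # so they remain available across model offloading.
        if prompt not in self._cache:
            embeds = compute_embeds(prompt)
            self._cache[prompt] = tuple(e.detach().to("cpu") for e in embeds)
        return self._cache[prompt]

    def save(self, path):
        # "Potentially serializable": the cache is just a dict of tensors.
        torch.save(self._cache, path)

    def load(self, path):
        self._cache = torch.load(path)

The cached tensors would then be moved back to the execution device and passed to the pipeline through its prompt_embeds (and, where applicable, pooled_prompt_embeds) arguments instead of prompt.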
