
Freeing GPU memory after torch.compile StableDiffusionXLPipeline UNet #9530

Open

Description

@To-jak

While exploring the optimizations listed in the documentation, I found myself unable to free GPU memory after applying torch.compile to a StableDiffusionXLPipeline UNet.

import gc

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
).to('cuda')

# Compile the UNet
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

generator = torch.Generator(device="cuda").manual_seed(42)
prompt = "a photo of an astronaut riding a horse on mars"

image = pipe(prompt=prompt, num_inference_steps=20, generator=generator).images[0]

# Attempt to release everything
del pipe

gc.collect()
torch._dynamo.reset()
torch.cuda.empty_cache()
torch.cuda.synchronize()

# GPU memory is still in use; this does not happen when the UNet is not compiled.
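To quantify what is left behind after the cleanup, here is a minimal check using the standard torch.cuda memory statistics (the printout is just for illustration):

# Memory held by live tensors vs. memory held by the caching allocator.
allocated_mb = torch.cuda.memory_allocated() / 1024**2
reserved_mb = torch.cuda.memory_reserved() / 1024**2
print(f"allocated: {allocated_mb:.1f} MiB, reserved: {reserved_mb:.1f} MiB")

# torch.cuda.memory_summary() gives a per-pool breakdown if needed.
# print(torch.cuda.memory_summary())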

Being able to free GPU memory matters in practice, especially when you want to load and compile another pipeline checkpoint to run another large batch of generations, as in the sketch below.
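For context, this is the kind of workflow I have in mind; run_batch is a hypothetical helper and the second checkpoint id is a placeholder:

import gc

import torch
from diffusers import StableDiffusionXLPipeline

def run_batch(checkpoint: str, prompts: list[str]):
    # Hypothetical helper: load, compile, generate, then tear down.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        checkpoint,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to('cuda')
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    images = [pipe(prompt=p, num_inference_steps=20).images[0] for p in prompts]

    # Teardown: this is what does not fully release memory when the UNet is compiled.
    del pipe
    gc.collect()
    torch._dynamo.reset()
    torch.cuda.empty_cache()
    return images

# The second call can OOM because memory from the first pipeline is never released:
# run_batch('stabilityai/stable-diffusion-xl-base-1.0', prompts_a)
# run_batch('another/sdxl-checkpoint', prompts_b)  # placeholder checkpoint id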

I made a code reproduction in Colab for testing.

Am I missing something? Could this be a memory leak on the compilation backend side, in which case it might be better to take this to PyTorch to discuss?
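One variable I have not isolated: mode="reduce-overhead" enables CUDA graphs, which capture their own memory pools, so part of the retained memory may belong to the graph pool rather than to dynamo itself. A quick diagnostic (my guess, not a confirmed fix) would be to compile with the default mode and check whether the memory is then released:

# Default mode: no CUDA graphs. Useful to check whether the graph pool
# is what keeps the memory alive after deletion.
pipe.unet = torch.compile(pipe.unet, fullgraph=True)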

System Info

python: 3.10.12
diffusers: 0.30.3
torch: 2.4.1+cu121
Running on Google Colab?: Yes


Labels

    stale, torch-compile
