
Reusing the same pipeline (FluxPipeline) increases the inference duration #10705

Closed
@nitinmukesh

Description

Describe the bug

I create the pipeline once and reuse it to generate multiple images with the same settings. The first inference takes about 8 minutes; the next takes about 30 minutes. VRAM usage remains the same throughout.

Tested on 8 GB VRAM + 8 GB shared GPU memory.

P.S. I have used the AuraFlow, Sana, Hunyuan, LTX, Cog, and several other pipelines, but didn't encounter this issue with any of them.
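To quantify the per-call slowdown, each pipe(...) call can be wrapped in a wall-clock timer, as in the minimal sketch below (timed_generation is a hypothetical diagnostic helper, not part of the repro script):

import time

def timed_generation(pipe, label, **kwargs):
    # Hypothetical helper: wall-clock timing around a single pipeline call.
    start = time.perf_counter()
    image = pipe(**kwargs).images[0]
    print(f"{label}: {time.perf_counter() - start:.1f} s")
    return image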

Reproduction

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
from huggingface_hub import hf_hub_download
from transformers import T5EncoderModel

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
# Shared 4-bit NF4 quantization config for the transformer and T5 text encoder
quantization_config = DiffusersBitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)

# Flux transformer quantized to 4-bit NF4
transformer_4bit = FluxTransformer2DModel.from_pretrained(
    bfl_repo,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
# T5 text encoder, also quantized to 4-bit
text_encoder_2 = T5EncoderModel.from_pretrained(
    bfl_repo, 
    subfolder="text_encoder_2",
    quantization_config=quantization_config,
    torch_dtype=dtype
)
# Assemble the pipeline without the two large components, then attach the quantized ones
pipe = FluxPipeline.from_pretrained(
    bfl_repo, 
    transformer=None, 
    text_encoder_2=None, 
    torch_dtype=dtype
)
pipe.transformer = transformer_4bit
pipe.text_encoder_2 = text_encoder_2

# https://civitai.com/models/1111989/majicflus-beauty
pipe.load_lora_weights(
    "./models/lora/flux_dev/majicbeauty1.safetensors", 
    adapter_name="majicbeauty1"
)

pipe.set_adapters("majicbeauty1", adapter_weights=0.8)
# CPU offload plus VAE tiling/slicing to fit in 8 GB of VRAM
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

prompt = "Photograph capturing a woman seated in a car, looking straight ahead. Her face is partially obscured, making her expression hard to read, adding an air of mystery. Natural light filters through the car window, casting subtle reflections and shadows on her face and the interior. The colors are muted yet realistic, with a slight grain that evokes a 1970s film quality. The scene feels intimate and contemplative, capturing a quiet, introspective moment, mj"
# First generation (40 steps): runs at ~12 s/it per the logs below
image = pipe(
    prompt=prompt,
    width=1072,
    height=1920,
    max_sequence_length=512,
    num_inference_steps=40,
    guidance_scale=50,
    generator=torch.Generator().manual_seed(1349562290),
).images[0]
image.save("out_majicbeauty5.png")
torch.cuda.empty_cache()

# Second generation with the same pipeline: drops to ~54 s/it per the logs below
image = pipe(
    prompt=prompt,
    width=1072,
    height=1920,
    max_sequence_length=512,
    num_inference_steps=50,
    guidance_scale=40,
    generator=torch.Generator().manual_seed(1349562290),
).images[0]
image.save("out_majicbeauty6.png")

Logs

Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 440.05it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:27<00:00, 13.90s/it]
Loading pipeline components...:   0%|                                       | 0/5 [00:00<?, ?it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:00<00:00,  5.12it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (95 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['. the scene feels intimate and contemplative, capturing a quiet, introspective moment, mj']
100%|█████████████████████████████████████████████████████████████| 40/40 [08:10<00:00, 12.25s/it]
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['. the scene feels intimate and contemplative, capturing a quiet, introspective moment, mj']
  4%|██▍                                                           | 2/50 [01:52<43:27, 54.32s/it]
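For scale: the first run's 40 steps at 12.25 s/it come to about 8.2 minutes, matching the reported 8 min, while the second run's 54.32 s/it projects to roughly 45 minutes over 50 steps, consistent with the reported slowdown.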

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Windows-10-10.0.26100-SP0
  • Running on Google Colab?: No
  • Python version: 3.10.11
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.27.1
  • Transformers version: 4.48.1
  • Accelerate version: 1.4.0.dev0
  • PEFT version: 0.14.1.dev0
  • Bitsandbytes version: 0.45.1
  • Safetensors version: 0.5.2
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@yiyixuxu @DN6
