Hello,
I'm currently trying to create image previews with SDXL. This works! However, the image outputs are very noisy. A long time ago I found a solution to this for SD 1.5, but unfortunately it has been lost to time.
How would I go about denoising these previews so they are a little more coherent to a human viewer? I know the first few iterations are always going to be very noisy, but eventually it should be possible to turn that noise into a blurry image a human can understand. Here is what I have so far:
```python
import os
import time

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Make sure the output directory exists before the callback tries to write to it.
os.makedirs("imgs", exist_ok=True)


def callback(pipe, step_index, timestep, callback_kwargs):
    latents = callback_kwargs.get("latents")
    start_time = time.time()
    with torch.no_grad():
        # The SDXL VAE overflows in fp16, so decode in fp32.
        pipe.upcast_vae()
        latents = latents.to(next(iter(pipe.vae.post_quant_conv.parameters())).dtype)
        images = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
        images = pipe.image_processor.postprocess(images, output_type="pil")
        images[0].save(f"./imgs/{step_index}.png")
    end_time = time.time()
    print(f"Time taken to generate image: {end_time - start_time} seconds")
    return callback_kwargs


pipe(prompt=prompt, callback_on_step_end=callback)
```
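
From what I can tell, the noise comes from decoding the intermediate latents directly: at each step they are still a mix of the image and the remaining noise. What I'd like instead is to decode the scheduler's one-step denoised estimate (`pred_original_sample`), which should be blurry but coherent from early on. Below is a rough sketch of how I imagine that working, not a confirmed API: I'm assuming the default `EulerDiscreteScheduler`, whose `step()` output carries a `pred_original_sample` field when called with `return_dict=True`; the `step_and_stash` wrapper and the `stash` dict are just names I made up. Is this the right idea?

```python
# Rough sketch, continuing from the snippet above (pipe and prompt already defined).
# Idea: wrap scheduler.step() to stash its one-step denoised estimate
# (pred_original_sample) so the callback can decode that instead of the
# still-noisy latents. Assumes the default EulerDiscreteScheduler;
# step_and_stash and stash are hypothetical names for this sketch.
stash = {}
original_step = pipe.scheduler.step


def step_and_stash(model_output, timestep, sample, **kwargs):
    kwargs["return_dict"] = True  # force the dataclass output so pred_original_sample is readable
    out = original_step(model_output, timestep, sample, **kwargs)
    stash["x0"] = out.pred_original_sample
    return (out.prev_sample,)  # the pipeline indexes the step result with [0]


pipe.scheduler.step = step_and_stash


def preview_callback(pipe, step_index, timestep, callback_kwargs):
    with torch.no_grad():
        pipe.upcast_vae()
        x0 = stash["x0"].to(next(iter(pipe.vae.post_quant_conv.parameters())).dtype)
        images = pipe.vae.decode(x0 / pipe.vae.config.scaling_factor, return_dict=False)[0]
        images = pipe.image_processor.postprocess(images, output_type="pil")
        images[0].save(f"./imgs/{step_index}_x0.png")
    return callback_kwargs


pipe(prompt=prompt, callback_on_step_end=preview_callback)
```

If `pred_original_sample` turns out not to be populated for a given scheduler, my understanding is that the same estimate could be computed by hand inside the wrapper for epsilon prediction as `sample - sigma * model_output`, using the sigma matching `timestep`, but I haven't verified that.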