rockerBOO
Contributor

@rockerBOO rockerBOO commented May 8, 2025

A partitioned VAE lets the VAE handle higher-resolution images and also lets us adjust the packing to use larger patches. This may help when processing high-resolution (2k-4k) images.

https://github.com/zhang0jhon/diffusion-4k
https://arxiv.org/abs/2503.18352

config toml:

partitioned_vae = true

CLI:

--partitioned_vae

The authors also use a wavelet loss in conjunction with it to capture finer detail. This PR covers only the partitioned VAE aspects.

This also applies to sampling, which I refactored so that it first makes a noisy latent-like tensor and then makes the appropriate patches, as we do in training, instead of going straight to the patched dimensions. That makes it a little clearer how it works, while allowing us to handle the two cases differently.
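The sampling flow described above (noisy latent first, then patches) can be sketched roughly as below. This is only an illustration and not the code in this PR: the function names, the use of NumPy instead of torch, the 16-channel latent, and the `patch_size` parameter are all assumptions made for the example.

```python
import numpy as np


def make_noisy_latent(batch, channels, height, width, seed=0):
    """Create a noise tensor with the same layout as a VAE latent: (B, C, H, W)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((batch, channels, height, width)).astype(np.float32)


def pack_latents(latents, patch_size=2):
    """Pack (B, C, H, W) latents into (B, num_patches, C * patch_size**2) tokens.

    patch_size is a hypothetical knob standing in for the "larger patches" idea.
    """
    b, c, h, w = latents.shape
    if h % patch_size or w % patch_size:
        raise ValueError("latent height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Split each spatial axis into (grid, patch), then move the patch axes next to channels.
    x = latents.reshape(b, c, gh, patch_size, gw, patch_size)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(b, gh * gw, c * patch_size * patch_size)


def unpack_latents(packed, channels, height, width, patch_size=2):
    """Inverse of pack_latents: recover the (B, C, H, W) latent layout."""
    b = packed.shape[0]
    gh, gw = height // patch_size, width // patch_size
    x = packed.reshape(b, gh, gw, channels, patch_size, patch_size)
    x = x.transpose(0, 3, 1, 4, 2, 5)
    return x.reshape(b, channels, height, width)


# Sampling starts from a latent-shaped noise tensor and only then packs it,
# the same order the training path uses.
noise = make_noisy_latent(batch=1, channels=16, height=64, width=64)
tokens = pack_latents(noise, patch_size=2)
restored = unpack_latents(tokens, channels=16, height=64, width=64, patch_size=2)
```

Keeping the noise in latent layout until the packing step means the patch size only has to change in one place, which is presumably what makes it easier to treat the partitioned and regular cases differently.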

Note: Only the Flux model is supported at this time. This requires modifying the AutoencoderKL Decoder, but it could work with any model. It only really makes sense for larger models that can support 2k-4k images.

@rockerBOO rockerBOO marked this pull request as draft May 9, 2025 02:51
@rockerBOO
Contributor Author

rockerBOO commented May 12, 2025

This is probably still very experimental, as this implementation has a few parts and results in an image that seems a bit more "zoomed in", which I don't think is ideal and suggests it may not be implemented correctly yet.

It also requires a similar implementation for inference, so it is hard to have others test it without implementing modified inference tools.
