comfyui gguf model load for diffusers #200

Open
zhangn77 opened this issue Jan 13, 2025 · 1 comment
@zhangn77

For some reason, I need to run the diffusers Flux pipeline inside ComfyUI, which means I need to create nodes for the Flux VAE/transformer/text encoder... and a sampler node to generate the image.
I can get this pipeline working with a normal Flux model. However, when I switch the transformer to a GGUF version, I get this error:

`RuntimeError: mat1 and mat2 must have the same dtype, but got BFloat16 and Float`

It looks like the GGUF model is not being processed properly. How can I use it in the pipeline? Here is the code:
```python
import torch
from diffusers import FluxPipeline


def load_flux_gguf(file_path, transformer_config, dtype, device):
    from accelerate import init_empty_weights
    from diffusers.models.transformers.transformer_flux import FluxTransformer2DModel_Omini_sep_4_cat

    # Build the transformer with empty (meta) weights, then fill in the GGUF tensors.
    with init_empty_weights():
        config = FluxTransformer2DModel_Omini_sep_4_cat.load_config(transformer_config)
        transformer = FluxTransformer2DModel_Omini_sep_4_cat.from_config(config).to(dtype)

    expected_state_dict_keys = list(transformer.state_dict().keys())
    # `ggml` is a helper module (not shown here) that reads the .gguf file into a state dict.
    state_dict, stats = ggml.load_gguf_state_dict(file_path, dtype)

    applied, skipped = 0, 0
    for param_name, param in state_dict.items():
        if param_name not in expected_state_dict_keys:
            skipped += 1
            continue
        applied += 1
        hijack_set_module_tensor_simple(transformer, tensor_name=param_name, value=param, device=device)
        state_dict[param_name] = None
    return transformer, None


def hijack_set_module_tensor_simple(module, tensor_name, device, value):
    # Walk down to the submodule that actually owns the tensor.
    if "." in tensor_name:
        splits = tensor_name.split(".")
        for split in splits[:-1]:
            module = getattr(module, split)
        tensor_name = splits[-1]
    with torch.no_grad():
        if tensor_name in module._buffers:
            module._buffers[tensor_name] = value.to(device, non_blocking=True)
        elif value is not None:
            param_cls = type(module._parameters[tensor_name])
            module._parameters[tensor_name] = param_cls(value, requires_grad=False).to(device, non_blocking=True)


unet_path = 'flux1-q4_0.gguf'
transformer_config = 'flux/transformer'
dtype = torch.bfloat16
device = 'cuda'

gguf_transformer, _ = load_flux_gguf(unet_path, transformer_config, dtype, device)

pipeline = FluxPipeline.from_pretrained(
    flux_dir,  # local directory holding the Flux diffusers checkpoint
    transformer=gguf_transformer,
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    torch_dtype=torch.bfloat16,
)
```
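As a sanity check (just a minimal sketch against the `gguf_transformer` built above), I can list which dtypes the loaded parameters actually ended up with, since the error suggests some weights stayed in float32 while the inputs are bfloat16:

```python
from collections import Counter

# Count parameters per dtype after loading the GGUF weights.
dtype_counts = Counter(p.dtype for p in gguf_transformer.parameters())
print(dtype_counts)  # e.g. Counter({torch.bfloat16: ..., torch.float32: ...})

# A blunt workaround would be to cast everything to one compute dtype,
# at the cost of holding fully dequantized weights in memory:
# gguf_transformer.to(torch.bfloat16)
```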

@al-swaiti

al-swaiti commented Feb 11, 2025

diffusers started supporting llama.cpp-style GGUF quantization for the Flux model a while ago. To load GGUF models you have to use this code:

```python
import torch

from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

ckpt_path = (
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
)
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
image.save("flux-gguf.png")
```
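Note that this needs a fairly recent diffusers (if I remember right, GGUF support landed around v0.32.0) plus the `gguf` Python package installed (`pip install -U diffusers gguf`), and `ckpt_path` can also point to a local `.gguf` file instead of the Hugging Face URL.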
