koboldcpp fails to load draft models for Gemma4
Attempting to load draft model for speculative decoding. It will be fully offloaded if possible. Vocab must match the main model.
llama_model_loader: loaded meta data with 50 key-value pairs and 49 tensors from C:/Users/kat/AI/Models/Gemma4/gemma-4-26b-A4B-it-assistant-QAT-Q4_0.gguf (version GGUF V3 (latest))
print_info: file format = GGUF V3 (latest)
print_info: file size = 291.21 MiB (5.82 BPW)
llama_model_load: error loading model: unknown model architecture: 'gemma4-assistant'
llama_model_load_from_file_impl: failed to load model
llama_init_from_model: model cannot be NULL
Error: failed to load speculative decoding draft model 'C:/Users/kat/AI/Models/Gemma4/gemma-4-26b-A4B-it-assistant-QAT-Q4_0.gguf'
Speculative Decoding will not be used!
Starting model warm up, please wait a moment...
Load Text Model OK: True
Additional Information:
OS: Windows 11
CPU: Ryzen 7 2700X
GPU: NVIDIA RTX 4060 Ti
KoboldCPP version: 1.114.1
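The log above rejects the draft model because its GGUF metadata declares the architecture 'gemma4-assistant'. To confirm what a given file declares before loading it, here is a minimal sketch that reads the `general.architecture` key straight from the GGUF header. It assumes a little-endian GGUF version 2 or 3 file (the log reports V3); the function name `read_gguf_architecture` and its helpers are hypothetical and are not part of koboldcpp or llama.cpp.

```python
import struct
from typing import BinaryIO, Optional

# GGUF metadata value type IDs mapped to struct formats for fixed-size scalars.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one fixed-size little-endian value described by fmt."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    """Read (and thereby skip past) one metadata value of the given type."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(f: BinaryIO) -> Optional[str]:
    """Return the 'general.architecture' string, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        # Assumption: only v2+ headers (uint64 counts) are handled here.
        raise ValueError(f"unsupported GGUF version {version}")
    _read(f, "<Q")             # tensor count, not needed here
    kv_count = _read(f, "<Q")  # number of metadata key-value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

To use it, open the draft model in binary mode, for example `with open(path, "rb") as f: print(read_gguf_architecture(f))`. If the result is an architecture string that the installed build does not recognize, the load will fail in the same way as shown in the log, regardless of offload settings.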