
Commit fa6dedf

danielhanchen, NinoRisteski, Erland366, versipellis, and gjyotin305 authored
GGUF saving (#2017)
* Update rl_replacements.py
* Update rl.py (×4)
* fix an import error (#1767)
* fix an import error
* Delete .gitignore
* Update loader.py
* Update save.py
---------
Co-authored-by: Daniel Han <[email protected]>
* SamplingParams
* Convert mask to float (#1762)
* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)
* Add latest xformers
* Add a couple of lines to docs
* vLLMSamplingParams
* Update __init__.py
* default num_chunks == -1
* Versioning
* Update llama.py (×5)
* Update _utils.py
* Update rl_replacements.py (×2)
* Update pyproject.toml (×2)
* Export Model to ollama.com (#1648)
* Ollama Export Model to ollama.com
Signed-off-by: Jyotin Goel <[email protected]>
* Check for model_name
Signed-off-by: Jyotin Goel <[email protected]>
* subprocess use instead of requests | added check for ollama server
Signed-off-by: Jyotin Goel <[email protected]>
* create_ollama_model
Signed-off-by: Jyotin Goel <[email protected]>
* create_ollama_model | fix
Signed-off-by: Jyotin Goel <[email protected]>
* Push to Ollama
Signed-off-by: Jyotin Goel <[email protected]>
---------
Signed-off-by: Jyotin Goel <[email protected]>
* Update cross_entropy_loss.py
* torch_cuda_device
* Update utils.py (×3)
* device (×2)
* Update loader.py
* Update llama.py
* Update README.md
* Update llama.py (×2)
* Update _utils.py
* Update utils.py (×5)
* Update llama.py (×5)
* Update utils.py (×4)
* __version__
* Update rl.py
* Bug fixes (×2)
* Update llama.py
* Update _utils.py
* _wrap_fast_inference
* Update llama.py (×11)
* Update _utils.py
* SFT dataset prepare
* Update pyproject.toml
* Update rl_replacements.py (×3)
* Update rl.py
* Update llama.py (×2)
* Update utils.py
* bug fix
* Update llama.py (×5)
* Update __init__.py
* Update _utils.py (×5)
* Update rl.py (×3)
* Update _utils.py
* Update __init__.py
* Update _utils.py
* Version
* versioning
* Update _utils.py
* Update llama.py (×2)
* Bug fixes
* FastModel
* __doc__
* Update vision.py
* Update loader.py (×3)
* version
* move use_modelscope to _utils (#1938)
* move use_modelscope to _utils
* Update _utils.py
* Update loader.py
---------
Co-authored-by: Daniel Han <[email protected]>
* Don't use revision when loading model_config and is_peft=True (#1949)
* More syntax warnings (#1944)
* move use_modelscope to _utils
* fix
* Update _utils.py
* Update loader.py
---------
Co-authored-by: Daniel Han <[email protected]>
* Update loader.py
* Full finetuning and other fixes
* UNSLOTH_ENABLE_FULL_FINETUNING
* Update loader.py (×3)
* Update vision.py (×2)
* full finetuning
* Update loader.py (×3)
* Update _utils.py
* max_seq_length
* Update rl.py (×3)
* Update pyproject.toml
* AutoModelForImageTextToText
* Update mapper.py
* Update pyproject.toml
* Update _utils.py (×3)
* Batch samples
* Update loader.py (×4)
* Update _utils.py
* Update loader.py
* Update vision.py
* Update loader.py
* Update vision.py (×3)
* Update mapper.py
* Update vision.py
* Temporary patches
* Update loader.py
* model names
* Gemma 3 chat template
* Bug fixes
* Update vision.py (×5)
* Update llama.py (×2)
* Update rl.py
* Update chat_templates.py (×2)
* Update vision.py (×3)
* Update loader.py
* Update vision.py (×2)
* Revert
* Update _utils.py
* forced precision
* Autocast
* Update vision.py (×2)
* Update rl.py
* Update vision.py (×5)
* Update rl.py
* vLLM fixes
* constexpr
* Update vision.py (×3)
* Update rl.py
* Update llama.py (×8)
* Update _utils.py (×4)
* Update save.py
* New models
* Triton windows update (#1976)
* Update pyproject.toml
* Update README.md
* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)
* Update RMS LayerNorm implementation with optimizations and testing suite
* perf: optimize list comprehension in get_ollama_eos_tokens
* Update Zoo
* Update llama.py (×2)
* Update vision.py (×12)
* Update rl_replacements.py
* Update vision.py
* grpo fix
* Update rl_replacements.py
* Update vision.py
* Update rl_replacements.py
* Update vision.py
* Update mapper.py
* Update vision.py (×2)
* Update loader.py
* Update vision.py
* Update save.py (×3)
---------
Signed-off-by: Jyotin Goel <[email protected]>
Co-authored-by: Nino Risteski <[email protected]>
Co-authored-by: Edd <[email protected]>
Co-authored-by: Ben <[email protected]>
Co-authored-by: Jyotin Goel <[email protected]>
Co-authored-by: Kareem <[email protected]>
Co-authored-by: Wilson Wu <[email protected]>
Co-authored-by: Akshay Behl <[email protected]>
1 parent 029461a commit fa6dedf


2 files changed: +53 −5 lines


unsloth/models/vision.py

+1 −1

@@ -485,7 +485,7 @@ def post_patch_model(
     full_finetuning = os.environ.get("UNSLOTH_ENABLE_FULL_FINETUNING", "0") == "1"
 
     float32_mixed_precision = True
-    if _get_dtype(model.config.torch_dtype) == torch.bfloat16:
+    if _get_dtype(model.config.torch_dtype) == torch.bfloat16 and full_finetuning:
         # Use bfloat16 precision for full finetuning
         float32_mixed_precision = False
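In effect, this one-line change only disables float32 mixed precision when the model is bfloat16 AND full finetuning is enabled, rather than for any bfloat16 model. A minimal sketch of the resulting gating logic (the helper name choose_float32_mixed_precision is hypothetical, not part of the commit):

import os
import torch

def choose_float32_mixed_precision(torch_dtype) -> bool:
    # Mirrors the patched post_patch_model branch: default to float32
    # mixed precision, and only train natively in bfloat16 when the user
    # opted into full finetuning via the environment flag.
    full_finetuning = os.environ.get("UNSLOTH_ENABLE_FULL_FINETUNING", "0") == "1"
    float32_mixed_precision = True
    if torch_dtype == torch.bfloat16 and full_finetuning:
        # Use bfloat16 precision for full finetuning
        float32_mixed_precision = False
    return float32_mixed_precision

# Before the fix, any bfloat16 model skipped float32 mixed precision;
# after it, only bfloat16 + full finetuning does:
os.environ["UNSLOTH_ENABLE_FULL_FINETUNING"] = "0"
assert choose_float32_mixed_precision(torch.bfloat16) is True
os.environ["UNSLOTH_ENABLE_FULL_FINETUNING"] = "1"
assert choose_float32_mixed_precision(torch.bfloat16) is False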
unsloth/save.py

+52 −4

@@ -2218,12 +2218,60 @@ def unsloth_convert_lora_to_ggml_and_save_locally(
 
 
 from .models.loader_utils import get_model_name
-from unsloth_zoo.saving_utils import merge_and_overwrite_lora
+from unsloth_zoo.saving_utils import (
+    merge_and_overwrite_lora,
+    prepare_saving,
+)
 from unsloth_zoo.llama_cpp import (
     install_llama_cpp,
-    convert_to_gguf,
+    convert_to_gguf as _convert_to_gguf,
 )
 
+@torch.inference_mode
+def save_to_gguf_generic(
+    model,
+    save_directory,
+    quantization_type = "Q8_0",
+    repo_id = None,
+    token = None,
+):
+    if token is None and repo_id is not None: token = get_token()
+    if repo_id is not None and token is None:
+        raise RuntimeError("Unsloth: Please specify a token for uploading!")
+
+    if not os.path.exists(os.path.join("llama.cpp", "unsloth_convert_hf_to_gguf.py")):
+        install_llama_cpp(just_clone_repo = True)
+    pass
+
+    metadata = _convert_to_gguf(
+        save_directory,
+        print_output = True,
+        quantization_type = quantization_type,
+    )
+    if repo_id is not None:
+        prepare_saving(
+            model,
+            repo_id,
+            push_to_hub = True,
+            max_shard_size = "50GB",
+            private = True,
+            token = token,
+        )
+
+        from huggingface_hub import HfApi
+        api = HfApi(token = token)
+        api.upload_folder(
+            folder_path = save_directory,
+            repo_id = repo_id,
+            repo_type = "model",
+            allow_patterns = ["*.gguf"],
+            private = True,
+        )
+    pass
+    return metadata
+pass
+
+
 @torch.inference_mode
 def unsloth_generic_save(
     model,

@@ -2467,8 +2515,8 @@ def patch_saving_functions(model, vision = False):
         # Vision only 1 option
         model.push_to_hub_merged = types.MethodType(unsloth_generic_push_to_hub_merged, model)
         model.save_pretrained_merged = types.MethodType(unsloth_generic_save_pretrained_merged, model)
-        model.push_to_hub_gguf = types.MethodType(not_implemented_save, model)
-        model.save_pretrained_gguf = types.MethodType(not_implemented_save, model)
+        model.push_to_hub_gguf = types.MethodType(save_to_gguf_generic, model)
+        model.save_pretrained_gguf = types.MethodType(save_to_gguf_generic, model)
     pass
     return model
 pass
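Because patch_saving_functions now binds save_to_gguf_generic via types.MethodType, the bound call supplies model implicitly as the first argument. A hedged usage sketch of the new GGUF paths (the checkpoint name, directory, and repo id below are made up, and the loading/merging calls follow Unsloth's usual API rather than anything in this diff):

from unsloth import FastModel

# Hypothetical checkpoint; any model loaded through FastModel will do.
model, tokenizer = FastModel.from_pretrained("unsloth/gemma-3-4b-it")

# ... finetune, then write merged HF-format weights into a directory ...
model.save_pretrained_merged("gemma-3-finetune", tokenizer)

# Local GGUF conversion: clones llama.cpp if the converter script is
# missing, then runs the conversion on the directory.
model.save_pretrained_gguf("gemma-3-finetune", quantization_type = "Q8_0")

# Hub upload: prepare_saving pushes the merged weights, then HfApi
# uploads only the *.gguf files. A write token is required.
model.push_to_hub_gguf(
    "gemma-3-finetune",
    quantization_type = "Q8_0",
    repo_id = "your-username/gemma-3-finetune-gguf",  # hypothetical repo
    token = "hf_...",  # your Hugging Face write token
)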
