
Commit 2b5d81d

danielhanchen, everythingisc00l, SethHWeidman, NinoRisteski, and Erland366 authored
Bug fixes (#1951)
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. * add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <[email protected]> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <[email protected]> * Check for model_name Signed-off-by: Jyotin Goel <[email protected]> * subprocess use instead of requests | added check for ollama server Signed-off-by: Jyotin Goel <[email protected]> * create_ollama_model Signed-off-by: Jyotin Goel <[email protected]> * create_ollama_model | fix Signed-off-by: Jyotin Goel <[email protected]> * Push to Ollama Signed-off-by: Jyotin Goel <[email protected]> --------- Signed-off-by: Jyotin Goel <[email protected]> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version --------- Signed-off-by: Jyotin Goel <[email protected]> Co-authored-by: Gennadii Manzhos <[email protected]> Co-authored-by: Seth Weidman <[email protected]> Co-authored-by: Nino Risteski <[email protected]> Co-authored-by: Edd <[email protected]> Co-authored-by: Ben <[email protected]> Co-authored-by: Jyotin Goel <[email protected]>
1 parent 8d7662e commit 2b5d81d

File tree

8 files changed: +196 -134 lines changed


pyproject.toml

+2-2
@@ -40,7 +40,7 @@ triton = [
 ]

 huggingface = [
-    "unsloth_zoo>=2025.3.7",
+    "unsloth_zoo>=2025.3.8",
     "packaging",
     "tyro",
     "transformers>=4.46.1,!=4.47.0",
@@ -354,7 +354,7 @@ colab-ampere-torch220 = [
     "flash-attn>=2.6.3",
 ]
 colab-new = [
-    "unsloth_zoo>=2025.3.7",
+    "unsloth_zoo>=2025.3.8",
     "packaging",
     "tyro",
     "transformers>=4.46.1,!=4.47.0",

unsloth/__init__.py

+1-1
@@ -198,7 +198,7 @@ def is_bf16_supported(): return SUPPORTS_BFLOAT16
 # Check for unsloth_zoo
 try:
     unsloth_zoo_version = importlib_version("unsloth_zoo")
-    if Version(unsloth_zoo_version) < Version("2025.3.7"):
+    if Version(unsloth_zoo_version) < Version("2025.3.8"):
         try:
             os.system("pip install --upgrade --no-cache-dir --no-deps unsloth_zoo")
         except:
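The check above relies on packaging's Version ordering to decide whether to auto-upgrade unsloth_zoo. A minimal sketch of that comparison (the version strings below are illustrative):

from packaging.version import Version

installed = Version("2025.3.7")   # e.g. an older unsloth_zoo install
required  = Version("2025.3.8")   # new minimum pinned by this commit

# Calendar-style versions compare component by component, so this prints True
# and the `pip install --upgrade ... unsloth_zoo` branch would run.
print(installed < required)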

unsloth/models/__init__.py

+1-1
@@ -13,7 +13,7 @@
 # limitations under the License.

 from .llama import FastLlamaModel
-from .loader import FastLanguageModel, FastVisionModel
+from .loader import FastLanguageModel, FastVisionModel, FastTextModel, FastModel
 from .mistral import FastMistralModel
 from .qwen2 import FastQwen2Model
 from .granite import FastGraniteModel
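With the widened export list, the new entry points can be imported alongside the existing ones; a minimal sketch assuming this version of unsloth is installed:

# FastModel is the new generic loader; FastTextModel and FastVisionModel
# are re-exported from the same module (see the loader.py diff below).
from unsloth.models import FastLanguageModel, FastModel, FastTextModel, FastVisionModel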

unsloth/models/_utils.py

+1-1
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-__version__ = "2025.3.8"
+__version__ = "2025.3.9"

 __all__ = [
     "SUPPORTS_BFLOAT16",

unsloth/models/llama.py

+15-3
@@ -91,7 +91,7 @@ def original_apply_o(self, X):
 pass

 from math import sqrt as math_sqrt
-KV_CACHE_INCREMENT = 256 # KV Cache update size
+KV_CACHE_INCREMENT = 512 # KV Cache update size
 torch_nn_functional_softmax = torch.nn.functional.softmax
 # SDPA has GQA internally
 SDPA_HAS_GQA = "enable_gqa" in scaled_dot_product_attention.__doc__
@@ -1656,6 +1656,13 @@ def from_pretrained(
                 "Are you certain you want to do remote code execution?"
             )
         pass
+        if fast_inference:
+            import platform
+            if platform.system().lower() == 'windows':
+                print("Unsloth: vLLM does not work in Windows! Will use Unsloth inference!")
+                fast_inference = False
+            pass
+
         if token is None: token = get_token()
         if model_patcher is None: model_patcher = FastLlamaModel
         SUPPORTS_BFLOAT16 = is_bfloat16_supported()
@@ -1966,12 +1973,17 @@ def from_pretrained(
         for layer in model.model.layers:
             layer.self_attn.rotary_emb = rotary_emb
         pass
-
+
+        # Add for_inference and for_training
+        model.for_training = functools.partial(FastLlamaModel.for_training, model)
+        model.for_inference = functools.partial(FastLlamaModel.for_inference, model)
+
         # Patch generate
         if model.generate.__name__ != "unsloth_fast_generate":
             model._old_generate = model.generate
             unsloth_fast_generate.__doc__ = model._old_generate.__doc__
             model.generate = types.MethodType(unsloth_fast_generate, model)
+        pass
         return model, tokenizer
     pass

@@ -2404,7 +2416,7 @@ def get_peft_model(
         # Add for_inference and for_training
         model.for_training = functools.partial(FastLlamaModel.for_training, model)
         model.for_inference = functools.partial(FastLlamaModel.for_inference, model)
-
+
         # Patch generate
         if model.generate.__name__ != "unsloth_fast_generate":
             model._old_generate = model.generate
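The hunks above attach for_inference/for_training to the model returned by from_pretrained (previously only get_peft_model added them) and disable fast_inference on Windows, where vLLM is unsupported. A hedged usage sketch; the checkpoint name and settings are illustrative:

from unsloth import FastLanguageModel

# On Windows, fast_inference = True now falls back to Unsloth's own
# inference path instead of trying (and failing) to use vLLM.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length = 2048,
    load_in_4bit   = True,
    fast_inference = True,
)

model.for_inference()  # now attached directly in from_pretrained
model.for_training()   # switch back before fine-tuning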

unsloth/models/loader.py

+28-11
@@ -383,10 +383,13 @@ def from_pretrained(
     patch_loss_functions,
     post_patch_loss_function,
 )
-from .vision import FastBaseVisionModel
-
+from .vision import FastBaseModel
+from transformers import (
+    AutoModelForVision2Seq,
+    AutoModelForCausalLM,
+)

-class FastVisionModel(FastBaseVisionModel):
+class FastModel(FastBaseModel):
     @staticmethod
     def from_pretrained(
         model_name = "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
@@ -413,7 +416,7 @@ def from_pretrained(
         patch_compiling_bitsandbytes()
         if use_gradient_checkpointing == "unsloth":
             patch_unsloth_smart_gradient_checkpointing(dtype = dtype)
-
+
         old_model_name = model_name
         if not use_exact_model_name:
             model_name = get_model_name(model_name, load_in_4bit)
@@ -427,7 +430,7 @@ def from_pretrained(
         from huggingface_hub.utils import disable_progress_bars, enable_progress_bars, are_progress_bars_disabled
         was_disabled = are_progress_bars_disabled()
         disable_progress_bars()
-
+
         autoconfig_error = None
         peft_error = None
         try:
@@ -458,7 +461,7 @@ def from_pretrained(

         # Old transformers versions check
         both_exist = (is_model and is_peft) and not SUPPORTS_LLAMA32
-
+
         # New transformers need to check manually.
         if SUPPORTS_LLAMA32:
             # Check if folder exists locally
@@ -515,9 +518,12 @@ def from_pretrained(
         if not was_disabled: enable_progress_bars()

         do_logging = os.environ.get("UNSLOTH_ENABLE_LOGGING", "0") == "1"
-        redirector = sys.stdout if do_logging else open(os.devnull, "w")
+        if do_logging:
+            redirector = contextlib.nullcontext()
+        else:
+            redirector = contextlib.redirect_stdout(open(os.devnull, "w"))

-        with contextlib.redirect_stdout(redirector):
+        with redirector:
             patch_loss_functions(torch_compile = False)
             model_types = unsloth_compile_transformers(
                 model_name = model_name,
@@ -547,7 +553,6 @@ def from_pretrained(
                 return_logits = return_logits,
             )
         pass
-        if do_logging: redirector.close()

         # Check if this is local model since the tokenizer gets overwritten
         if os.path.exists(os.path.join(old_model_name, "tokenizer_config.json")) and \
@@ -559,7 +564,12 @@ def from_pretrained(
             tokenizer_name = None
         pass

-        model, tokenizer = FastBaseVisionModel.from_pretrained(
+        # Check if VLM
+        is_vlm = (x.endswith("ForConditionalGeneration") for x in model_config.architectures)
+        is_vlm = is_vlm or hasattr(model_config, "vision_config")
+        auto_model = AutoModelForVision2Seq if is_vlm else AutoModelForCausalLM
+
+        model, tokenizer = FastBaseModel.from_pretrained(
             model_name = model_name,
             max_seq_length = max_seq_length,
             dtype = _get_dtype(dtype),
@@ -570,6 +580,7 @@ def from_pretrained(
             revision = revision if not is_peft else None,
             model_types = model_types,
             tokenizer_name = tokenizer_name,
+            auto_model = auto_model,
             *args, **kwargs,
         )

@@ -617,8 +628,14 @@ def from_pretrained(
                 trust_remote_code = trust_remote_code,
             )
             # Patch it as well!
-            model = FastBaseVisionModel.patch_peft_model(model, use_gradient_checkpointing)
+            model = FastBaseModel.patch_peft_model(model, use_gradient_checkpointing)
         pass
         return model, tokenizer
     pass
 pass
+
+class FastVisionModel(FastModel):
+    pass
+
+class FastTextModel(FastModel):
+    pass
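FastModel now inspects the checkpoint config to choose between AutoModelForVision2Seq and AutoModelForCausalLM, and FastVisionModel / FastTextModel become thin aliases of it. A hedged sketch of the intended call pattern; the arguments are illustrative:

from unsloth.models import FastModel

# Vision-capable checkpoints route to AutoModelForVision2Seq; text-only
# checkpoints route to AutoModelForCausalLM, all through the same API.
model, tokenizer = FastModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",  # default from the diff above
    load_in_4bit = True,
)

Existing FastVisionModel callers keep working unchanged, since FastVisionModel simply subclasses FastModel.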

unsloth/models/mapper.py

+15
@@ -611,6 +611,21 @@
         "open-thoughts/OpenThinker-7B",
         "unsloth/OpenThinker-7B-bnb-4bit",
     ),
+    "unsloth/granite-3.2-2b-instruct-unsloth-bnb-4bit" : (
+        "unsloth/granite-3.2-2b-instruct",
+        "ibm-granite/granite-3.2-2b-instruct",
+        "unsloth/granite-3.2-2b-instruct-bnb-4bit",
+    ),
+    "unsloth/granite-3.2-8b-instruct-unsloth-bnb-4bit" : (
+        "unsloth/granite-3.2-8b-instruct",
+        "ibm-granite/granite-3.2-8b-instruct",
+        "unsloth/granite-3.2-8b-instruct-bnb-4bit",
+    ),
+    "unsloth/QwQ-32B-unsloth-bnb-4bit" : (
+        "unsloth/QwQ-32B",
+        "Qwen/QwQ-32B",
+        "unsloth/QwQ-32B-bnb-4bit",
+    ),
 }

 INT_TO_FLOAT_MAPPER = {}
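Each new mapper entry groups an Unsloth dynamic 4-bit repo with its original checkpoint and the plain bnb-4bit variant, so the newly added Granite 3.2 and QwQ-32B names resolve at load time. A hedged sketch; the sequence length and options are illustrative:

from unsloth import FastLanguageModel

# "unsloth/QwQ-32B-unsloth-bnb-4bit" is one of the entries added above;
# the granite-3.2 instruct models resolve the same way.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = 4096,
    load_in_4bit   = True,
)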
