
Conversation

@BenjaminBossan (Member) commented on Nov 4, 2025

Resolves #2889

Description

The reported bug is this: when the base model is quantized with 4-bit bitsandbytes, the adapter weights were cast to float32 even if autocast_adapter_dtype=False was passed. This happened because the dtype of the base layer was not determined correctly in that case. This PR now determines the dtype correctly.
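
To illustrate the underlying problem, here is a minimal sketch (not the actual PEFT code; the helper name and structure are my own) of how the effective dtype of a bnb 4-bit base layer can be resolved: the packed weight reports torch.uint8, so the quant state or compute dtype has to be consulted instead.

```python
def get_base_layer_dtype(base_layer):
    # For a bitsandbytes 4-bit layer, weight.dtype is the packed storage dtype
    # (torch.uint8), so it cannot be used directly as the adapter dtype.
    weight = base_layer.weight
    quant_state = getattr(weight, "quant_state", None)
    if quant_state is not None and getattr(quant_state, "dtype", None) is not None:
        return quant_state.dtype  # dtype of the original, unquantized weight
    compute_dtype = getattr(base_layer, "compute_dtype", None)
    if compute_dtype is not None:
        return compute_dtype
    # non-quantized layers: the weight dtype itself is the answer
    return weight.dtype
```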

While working on this, I noticed that the peft_model.add_adapter method lacked an option to disable autocasting. This option has now been added and is covered by tests. I also updated some of the corresponding docstrings.
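
For reference, a minimal usage sketch (model id and LoRA config are just placeholders; the autocast_adapter_dtype argument on add_adapter is the one added in this PR, so it requires these changes):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=bnb_config)

# keep the adapter weights in the compute dtype instead of upcasting to float32
peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)

# with this PR, add_adapter accepts the same flag for additional adapters
peft_model.add_adapter("other", LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)
```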

Tangential changes

An unrelated issue I noticed while debugging: at one point, OSF calls if not hasattr(module, "osf_svd_params"). This would error when the module was a ModulesToSaveWrapper, because ModulesToSaveWrapper._hasattr_wrapped did not account for the case where there is no active adapter. This is now fixed too.
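
As a rough illustration of the fixed behavior (a simplified stand-in, not PEFT's actual ModulesToSaveWrapper implementation), attribute lookup on the wrapper needs a fallback when no adapter is active, so that hasattr() returns a result instead of raising:

```python
import torch.nn as nn

class WrapperSketch(nn.Module):
    """Simplified stand-in for a module wrapper holding per-adapter copies."""

    def __init__(self, original: nn.Module):
        super().__init__()
        self.original_module = original
        self.modules_to_save = nn.ModuleDict()
        self._active_adapter = None  # may be unset, e.g. right after wrapping

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            if self._active_adapter is None:
                # previously unhandled: without an active adapter, delegate to
                # the original module so hasattr(wrapper, ...) does not raise
                return getattr(self.original_module, name)
            return getattr(self.modules_to_save[self._active_adapter], name)
```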

Moreover, OSF implemented its own _cast_adapter_dtype. This effectively bypassed the upcasting of the OSF weights to float32 when the base model is loaded in lower precision. However, unless the user explicitly passes autocast_adapter_dtype=False, the default in PEFT is to upcast the adapter weights to float32. With the changes in this PR, this upcasting is now performed. To make this work in the forward pass, x is cast to the dtype of the weight. We assume that the output dtype should be the same as the original dtype of x.
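
The casting pattern in the forward pass roughly looks like this (a minimal sketch, not the actual OSF code; the function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def forward_with_dtype_cast(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # The adapter weight may have been upcast to float32 while x arrives in the
    # base model's lower precision; cast x up for the computation, then cast the
    # result back so the output dtype matches the original dtype of x.
    orig_dtype = x.dtype
    out = F.linear(x.to(weight.dtype), weight)
    return out.to(orig_dtype)
```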

TODOs

There is still an issue left with 8-bit bnb weights. They don't have a compute dtype, so at the layer level it is not possible to determine what the dtype of the PEFT adapter should be (of course, it cannot be int8). Therefore, the corresponding tests for 8-bit bnb are x-failing for now. One possible solution would be to pass down the dtype of the base model (if any) and use that as a fallback. This could be implemented in a later PR.
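
A sketch of that fallback idea (hypothetical helper and parameter, not part of this PR): if the layer only exposes int8 storage, use a dtype passed down from the model level, otherwise fall back to float32.

```python
import torch

def resolve_adapter_dtype(base_layer, model_dtype=None):
    # 8-bit bnb layers store int8 weights and expose no compute dtype, so the
    # layer alone cannot tell us which dtype the adapter should use.
    weight_dtype = base_layer.weight.dtype
    if weight_dtype in (torch.int8, torch.uint8):
        return model_dtype if model_dtype is not None else torch.float32
    return weight_dtype
```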

Resolves huggingface#2889

The reported bug is this: When the base model is quantized with 4bit
bitsandbytes, the adapter weights would be cast to float32, even if
autocast_adapter_dtype=False was passed. This is because the dtype of
the base layer was not correctly determined in that case. This PR now
correctly determines the dtype.

While working on this, I noticed that the peft_model.add_adapter method
was lacking the option to disable autocasting. This was added now and
the tests cover it as well.

TODOs

For LNTuning and OSF, I found that the dtype is not correctly being
applied when it is float16 or bfloat16. As I didn't want to blow up this
PR even more, I decided to skip those methods for now and leave the fix
for those for another time.

Moreover, there is still an issue left with 8bit bnb weights. They don't
have a compute dtype, so at a layer level, it is not possible to
determine what the dtype of the PEFT adapter should be (of course, it
cannot be int8). Therefore, the corresponding tests for 8bit bnb are
x-failing for now. One possible solution could be to pass down the dtype
of the base model (if any) and use that as a fallback. This could be
implemented in a later PR.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member, Author) commented

@NikhilNayak-debug this PR contains some updates to OSF. Could you please check whether they make sense? See the PR description above for the reasoning behind the changes.
