The confusion surrounding bnb_4bit_compute_dtype, torch_dtype, and prepare_model_for_kbit_training. #1516
xiaobingbuhuitou asked this question in Q&A (unanswered)
I want to implement QLoRA fine-tuning, based on PEFT, for a model whose dtype is `float32`. When I load the base model with

```python
from_pretrained(
    "PATH",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
```

without setting `torch_dtype`, the model's dtype changes to `float16`, and the returned `last_hidden_state` also becomes `float16`. However, when I set `torch_dtype=torch.float32`, both the model's dtype and `last_hidden_state` remain `float32`. But when I wrap the quantized model with `prepare_model_for_kbit_training()`, everything changes back to `float32`.

I would like to know whether calling `prepare_model_for_kbit_training()` makes `bnb_4bit_compute_dtype` and `torch_dtype` ineffective, and when it is necessary to call `prepare_model_for_kbit_training()`. Furthermore, what determines the data types of the base model and of `last_hidden_state`? Thank you.
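One way to investigate which setting wins is to inspect the parameter dtypes directly, before and after each step. Below is a small torch-only sketch; `dtype_histogram` is a hypothetical helper of my own (not a library function), demonstrated here on a toy module standing in for the real base model. The same helper can be applied to the quantized model before and after `prepare_model_for_kbit_training()` to see exactly which parameters were upcast.

```python
from collections import Counter

import torch
import torch.nn as nn


def dtype_histogram(model: nn.Module) -> Counter:
    """Count parameters by dtype -- useful for checking what
    torch_dtype or prepare_model_for_kbit_training actually did."""
    hist = Counter()
    for p in model.parameters():
        hist[p.dtype] += p.numel()
    return hist


# Toy model standing in for the real base model, loaded in half precision.
demo = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8)).to(torch.float16)
print(dict(dtype_histogram(demo)))  # every parameter is float16

# prepare_model_for_kbit_training upcasts the non-quantized parameters
# (layer norms, embeddings, lm_head) to float32 for training stability;
# on this toy model the effect is equivalent to a plain cast:
demo = demo.to(torch.float32)
print(dict(dtype_histogram(demo)))  # every parameter is now float32
```

Note that in a real 4-bit model the quantized weights are stored in a packed integer dtype, so the histogram would show a mix: packed quantized weights plus whatever dtype the non-quantized parameters were cast to.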