Can't infer with an "exclude_lm_head" model #1166
Comments
An alternative approach is to have both the last hidden states and the logits as outputs in the ONNX model. You can achieve that by passing include_hidden_states=1 in --extra_options when running the model builder.
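For reference, a quick way to check whether an exported model exposes both outputs is to list them with plain onnxruntime. This is a minimal sketch, not taken from the thread; the model path and the exact output names are assumptions.

import onnxruntime as ort

# Path is an assumption based on the -o folder used later in this thread
sess = ort.InferenceSession("row_llama3.2-3b-onnx-int4/model.onnx")

# With include_hidden_states=1 the output list should contain both a logits
# output and a hidden-states output (exact names may differ per builder version)
print("inputs :", [i.name for i in sess.get_inputs()])
print("outputs:", [o.name for o in sess.get_outputs()])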
Thank you for your reply. But there is no include_hidden_states option in the extra_options of the model builder that ships with the released package.
You can run the model builder from source to access that option. Here's how you can do that using your provided command:

# Clone the repo
$ git clone https://github.com/microsoft/onnxruntime-genai

# Navigate to the model builder
$ cd onnxruntime-genai/src/python/py/models/

# Run your command with `include_hidden_states`
$ python3 builder.py -m llama3.2-3b -o row_llama3.2-3b-onnx-int4 -p int4 -e cpu --extra_options int4_block_size=128 int4_accuracy_level=4 int4_op_types_to_quantize=MatMul/Gather include_hidden_states=1

Alternatively, you can wait for ONNX Runtime GenAI v0.6.0 to be released, since it is scheduled to come out this month.
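Since include_hidden_states=1 keeps the logits output (unlike exclude_lm_head, which removes it), a normal generation loop should still run against the rebuilt folder. The following is a sketch based on the v0.5.x Python examples, not output from this thread; the model folder and prompt are placeholders.

import onnxruntime_genai as og

model = og.Model("row_llama3.2-3b-onnx-int4")   # folder produced by builder.py above
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=64)
params.input_ids = tokenizer.encode("Hello, my name is")   # placeholder prompt

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()        # on an exclude_lm_head-only model there are no logits to read here
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)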
Describe the bug
Can't run inference with a model built with the "exclude_lm_head" option.
To Reproduce
Then you will see the bug:
onnxruntime-genai version: 0.5.2
OS: linux