For example, model weights that contain a mix of I2, I3, and I4 quantization types.
I checked the documentation and scripts, and it seems this is not supported yet?
Thanks!
Both the inference kernel and the convert script already support mixed-precision quantization by detecting the bits of each layer. However, I don't know of any tools that generate a mixed-precision GPTQ model. If such a model exists, T-MAC can support it.
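For illustration, here is a minimal sketch (not T-MAC's actual convert script) of how per-layer bit-widths could be detected from an AutoGPTQ-style checkpoint, assuming the usual packing where `qzeros` has shape `(groups, out_features * bits / 32)` and `scales` has shape `(groups, out_features)`; the function name and file path are hypothetical:

```python
# Sketch: infer the bit-width of each GPTQ-quantized layer from packed
# tensor shapes (AutoGPTQ-style layout assumed, not T-MAC's real code).
import torch

def detect_layer_bits(state_dict):
    """Return {layer_name: bits} for every quantized layer in the checkpoint."""
    bits_per_layer = {}
    for name, tensor in state_dict.items():
        if name.endswith(".qzeros"):
            prefix = name[: -len(".qzeros")]
            scales = state_dict[prefix + ".scales"]
            # Each int32 column of qzeros packs 32 // bits zero-points,
            # so bits = qzeros_cols * 32 / out_features.
            bits_per_layer[prefix] = tensor.shape[1] * 32 // scales.shape[1]
    return bits_per_layer

state_dict = torch.load("gptq_model.bin", map_location="cpu")  # hypothetical path
for layer, bits in detect_layer_bits(state_dict).items():
    print(f"{layer}: I{bits}")
```

A convert script that works this way handles mixed precision for free: each layer reports its own bit-width, so layers with different bits can coexist in one model.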
I have seen that T-MAC tunes kernels per shape, bit-width, and so on, and that a compiled llama.cpp kernel only supports one bit-width and one network. How can a mixed setup be supported? (For example, a llama.cpp tuned and compiled for a 2-bit BitNet can run it, but running other bit-widths or networks, such as a 4-bit BitNet or a 2-bit Llama-2, results in errors.) My understanding of why it fails is sketched below.
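A rough sketch of my understanding (hypothetical, not T-MAC's real dispatch code): tuned kernels are looked up by a `(shape, bits)` key, so any configuration that was never tuned has no kernel to dispatch to and fails at runtime:

```python
# Sketch: per-configuration kernel dispatch. A build tuned only for 2-bit
# BitNet shapes has no entries for 4-bit or Llama-2 shapes, hence the error.
from typing import Callable, Dict, Tuple

KernelKey = Tuple[int, int, int]          # (M, K, bits)
_KERNELS: Dict[KernelKey, Callable] = {}  # populated at tune/compile time

def register_kernel(m: int, k: int, bits: int, fn: Callable) -> None:
    _KERNELS[(m, k, bits)] = fn

def dispatch(m: int, k: int, bits: int) -> Callable:
    try:
        return _KERNELS[(m, k, bits)]
    except KeyError:
        raise RuntimeError(
            f"no tuned kernel for shape ({m}, {k}) at {bits} bits; "
            "re-run tuning/compilation for this configuration"
        )
```

If that picture is right, supporting another bit-width or network means re-running the tuning and compilation step for each new `(shape, bits)` combination.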