
[Question] Whether T-MAC supports mixed-precision LLM? #38

Open
AndreaChiChengdu opened this issue Sep 3, 2024 · 2 comments

Comments

@AndreaChiChengdu commented Sep 3, 2024

Does T-MAC support models whose weights mix several quantization types, e.g. I2, I3, and I4? I checked the documentation and scripts, and it seems this is not supported yet.
Thanks!

@kaleid-liner (Collaborator)

Both the inference kernel and the convert script already support mixed-precision quantization by detecting the bit-width of each layer. However, I don't know of any tools that generate a mixed-precision GPTQ model. If such a model exists, T-MAC can support it.
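For illustration only (this is not T-MAC's actual convert script): per-layer bit-width can be inferred directly from the packed tensor shapes of a GPTQ checkpoint, assuming the common AutoGPTQ layout where `qweight` is int32-packed along the input dimension and `g_idx` holds one entry per input channel. The function name `detect_layer_bits` and the checkpoint path below are hypothetical.

```python
import torch

def detect_layer_bits(state_dict):
    """Return {layer_name: bits} for every GPTQ-packed linear layer.

    Sketch only: assumes the AutoGPTQ layout where
    qweight.shape == (in_features * bits // 32, out_features).
    """
    bits_per_layer = {}
    for name, qweight in state_dict.items():
        if not name.endswith(".qweight"):
            continue
        layer = name[: -len(".qweight")]
        g_idx = state_dict.get(layer + ".g_idx")
        if g_idx is None:
            continue  # not a GPTQ-packed layer in the layout assumed here
        # g_idx maps every input channel to its quantization group, so its
        # length recovers the original (unpacked) number of input features.
        in_features = g_idx.numel()
        # Each qweight row holds packed int32 words: rows = in_features * bits / 32.
        bits_per_layer[layer] = qweight.shape[0] * 32 // in_features
    return bits_per_layer

if __name__ == "__main__":
    # Hypothetical checkpoint path. A mixed-precision model would simply
    # report different bit-widths for different layers, e.g.
    # {"model.layers.0.self_attn.q_proj": 2, "model.layers.0.mlp.gate_proj": 4, ...}
    sd = torch.load("gptq_model.bin", map_location="cpu")
    print(detect_layer_bits(sd))
```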

@qw1319 commented Sep 26, 2024

> Both the inference kernel and the convert script already support mixed-precision quantization by detecting the bit-width of each layer. However, I don't know of any tools that generate a mixed-precision GPTQ model. If such a model exists, T-MAC can support it.

I see that T-MAC tunes kernels for specific shapes, bit-widths, and so on, but the compiled llama.cpp kernels only support one bit-width and one network. How can a mixed network be supported? For example, a llama.cpp build tuned and compiled for a 2-bit BitNet runs fine, but running other bit-widths or networks (4-bit BitNet, 2-bit Llama-2) produces errors.
