
[Question] Whether T-MAC supports mixed-precision LLM? #38

Open
AndreaChiChengdu opened this issue Sep 3, 2024 · 2 comments

Comments

@AndreaChiChengdu commented Sep 3, 2024

Does T-MAC support models whose weights mix several quantization types, e.g. I2, I3, and I4? I checked the documentation and scripts, and it seems this is not supported yet.
Thanks!

@kaleid-liner (Collaborator)

Both the inference kernel and the convert script already support mixed-precision quantization by detecting the bit-width of each layer. However, I don't know of any tools that generate a mixed-precision GPTQ model. If such a model exists, T-MAC can support it.
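For illustration only (this is not T-MAC's actual convert script): per-layer bit-width can be inferred directly from the packed tensor shapes of a GPTQ checkpoint, assuming the common AutoGPTQ layout where `qweight` is int32-packed along the input dimension and `g_idx` holds one entry per input channel. The function name `detect_layer_bits` and the checkpoint path below are hypothetical.

```python
import torch

def detect_layer_bits(state_dict):
    """Return {layer_name: bits} for every GPTQ-packed linear layer.

    Sketch only: assumes the AutoGPTQ layout where
    qweight.shape == (in_features * bits // 32, out_features).
    """
    bits_per_layer = {}
    for name, qweight in state_dict.items():
        if not name.endswith(".qweight"):
            continue
        layer = name[: -len(".qweight")]
        g_idx = state_dict.get(layer + ".g_idx")
        if g_idx is None:
            continue  # not a GPTQ-packed layer in the layout assumed here
        # g_idx maps every input channel to its quantization group, so its
        # length recovers the original (unpacked) number of input features.
        in_features = g_idx.numel()
        # Each qweight row holds packed int32 words: rows = in_features * bits / 32.
        bits_per_layer[layer] = qweight.shape[0] * 32 // in_features
    return bits_per_layer

if __name__ == "__main__":
    # Hypothetical checkpoint path. A mixed-precision model would simply
    # report different bit-widths for different layers, e.g.
    # {"model.layers.0.self_attn.q_proj": 2, "model.layers.0.mlp.gate_proj": 4, ...}
    sd = torch.load("gptq_model.bin", map_location="cpu")
    print(detect_layer_bits(sd))
```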

@qw1319 commented Sep 26, 2024

> Both the inference kernel and the convert script already support mixed-precision quantization by detecting the bit-width of each layer. However, I don't know of any tools that generate a mixed-precision GPTQ model. If such a model exists, T-MAC can support it.

I see that T-MAC tunes kernels for specific shapes, bit-widths, and so on, but the compiled llama.cpp kernels only support one bit-width and one network. How can a mixed network be supported? For example, a llama.cpp build tuned and compiled for a 2-bit BitNet runs fine, but running other bit-widths or networks (4-bit BitNet, 2-bit Llama-2) produces errors.
