
Request for Documentation on LUT Quantization Theory and Generation Methods for LUT_Biases and LUT_Scales #67

Open
zhouexellent opened this issue Oct 29, 2024 · 1 comment
Labels
question Further information is requested

Comments

@zhouexellent

Hello, thank you for your outstanding work!

We couldn’t locate any theoretical insights or perspectives related to LUT quantization in the referenced paper, and while reading through the source code, we also couldn’t find specific details on the generation method for the LUT_Biases matrix. Could you please provide relevant literature references or suggest keywords for further research?

Additionally, I believe there may be an issue in the method used to generate the LUT_Scales matrix. There are two candidate formulas:

(1) Taking the absolute value of each element first and then summing.
(2) Summing first and then taking the absolute value (what the current code below does).

Approach (1) seems slightly more appropriate than approach (2): it upper-bounds the magnitude of every signed combination of activations that can appear in the LUT, whereas (2) only accounts for the plain sum, so other LUT entries can overflow after scaling. The sketch after this list illustrates the difference.
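A minimal NumPy comparison (illustrative only; `g` and `maxv` stand in for `self.g` and `self.maxv` from the code below):

```python
import numpy as np

# One activation group of size g; maxv plays the role of the int8 bound.
g, maxv = 4, 127
b = np.random.randn(g)

# Approach (1): sum of absolute values. This upper-bounds the magnitude of
# every signed combination +/-b[0] +/-b[1] ... +/-b[g-1] that a LUT entry
# can take, so no entry overflows after dividing by this scale.
scale_1 = np.abs(b).sum() / maxv

# Approach (2): absolute value of the plain sum -- what the te.compute
# below does. It only covers the all-plus combination, so other LUT
# entries can exceed maxv after scaling and get clipped.
scale_2 = np.abs(b.sum()) / maxv

assert scale_1 >= scale_2  # (1) never underestimates the dynamic range
```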

Source code (located at ./python/t_mac/ops/qgemm.py):
```python
LUT_Scales = te.compute(
    (N, K // self.act_group_size),
    lambda n, kk: te.max(
        # NOTE: sums first, then takes the absolute value (approach (2)).
        te.abs(sum(B[n, kk * self.act_group_size + sk * self.g + g] for g in range(self.g))) / self.maxv,
        axis=sk,  # sk: reduce axis over sub-groups, defined elsewhere in qgemm.py
    ),
    name="LUT_Scales",
)

LUT_Biases = te.placeholder((N, K // self.act_group_size), dtype=self.out_dtype, name="LUT_Biases")

if self.has_lut_scale:
    LUT_Scales = te.placeholder((N, K // self.act_group_size), dtype=self.out_dtype, name="LUT_Scales")
    LUT_Biases = te.placeholder((N, K // self.act_group_size), dtype=self.out_dtype, name="LUT_Biases")

    def _lut_scale(n, k, val):
        return (val * LUT_Scales[n, k * self.g // self.act_group_size]
                + LUT_Biases[n, k * self.g // self.act_group_size] * alphas[0])

Scales = te.placeholder(scales_shape, dtype=self.out_dtype, name="Scales")

if self.m_groups == -1:
    if K % self.group_size != 0:
        raise TVMError("K({}) must be divisible by group_size({})".format(K, self.group_size))
    if self.zero_point:
        scales_shape = (M // bm, K // self.group_size, bm // self.bits * 2)

        def _get_scale(m, k):
            # Fake _get_scale, should be tensorized
            return (Scales[m // bm, k * self.g // self.group_size, (m % bm) // self.bits * 2]
                    - Scales[m // bm, k * self.g // self.group_size, (m % bm) // self.bits * 2 + 1])
```

Thank you for your assistance!

@kaleid-liner added the question label Oct 29, 2024
@kaleid-liner
Collaborator

  • We compared the accuracy of INT8 LUT quantization against llama.cpp Q8_0 group-wise activation quantization, at both the kernel level and the model level, in Sec. 5.6 of our paper.
  • The LUT quantization in the Python code is just a placeholder computation; the actual implementation lives in lut_ctor.cc. A rough illustrative sketch of the idea follows below.
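For readers looking for the underlying scheme, here is a hypothetical Python sketch of group-wise INT8 LUT quantization. This is an assumption about how such a table is typically quantized, not the lut_ctor.cc implementation; `build_lut`, `weight_states`, `g`, and `maxv` are illustrative names.

```python
import itertools
import numpy as np

def build_lut(b, weight_states=(-1, 1)):
    """Table of partial sums over one activation group for every
    weight-state combination (2**g entries for 1-bit weights).
    Illustrative only -- the real table construction is in lut_ctor.cc."""
    return np.array([np.dot(w, b)
                     for w in itertools.product(weight_states, repeat=len(b))])

def quantize_lut_int8(lut, maxv=127):
    """Quantize one LUT group to int8 with a per-group scale and bias.
    The bias centers the table so that asymmetric weight states
    (e.g. {0, 1}) still use the full signed int8 range."""
    bias = (lut.max() + lut.min()) / 2        # plays the role of LUT_Biases
    scale = np.abs(lut - bias).max() / maxv   # plays the role of LUT_Scales
    q = np.round((lut - bias) / scale).astype(np.int8)
    return q, scale, bias

b = np.random.randn(4)                        # one activation group, g = 4
q, scale, bias = quantize_lut_int8(build_lut(b))
# Dequantization recovers the table up to rounding error:
assert np.allclose(q * scale + bias, build_lut(b), atol=scale / 2 + 1e-9)
```

Note that for symmetric weight states {-1, 1} the largest-magnitude LUT entry equals the sum of the absolute activations, which is exactly the bound that approach (1) in the question computes.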
