Commit 99f10df

Merge branch 'k_quant' of https://github.com/jiafatom/neural-compressor into k_quant
2 parents de4f7f0 + d91b4e5 commit 99f10df

1 file changed, +2 -0 lines changed

neural_compressor/adaptor/ox_utils/weight_only.py

@@ -249,6 +249,7 @@ def quant_tensor(data, num_bits=4, group_size=32, scheme="asym", dtype="int", ra
 
 def quant_tensor_k_quant_cpu(data, num_bits=4, group_size=32):
     """Quantize tensor per group based on k quant.
+
     Ref: https://github.com/ggml-org/llama.cpp/blob/64eda5deb9859e87a020e56bab5d2f9ca956f1de/ggml/src/ggml-quants.c
 
     Args:
@@ -321,6 +322,7 @@ def quant_tensor_k_quant_cpu(data, num_bits=4, group_size=32):
 
 def quant_tensor_k_quant_cuda(data, num_bits=4, group_size=32):
     """Quantize tensor per group based on k quant.
+
     Ref: https://github.com/ggml-org/llama.cpp/blob/64eda5deb9859e87a020e56bab5d2f9ca956f1de/ggml/src/ggml-quants.c
 
     Args:
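
Both hunks add a single blank line between the docstring's summary line and the `Ref:` line. This matches the PEP 257 layout for multi-line docstrings (summary line, blank line, then details), though the commit itself does not state its motivation. The sketch below shows why the blank line matters to tools that split a docstring into summary and body. `quant_stub` and `summary_and_body` are hypothetical names for illustration, not part of neural-compressor, and the stub's `Args:` text is invented.

```python
import inspect


def quant_stub(data, num_bits=4, group_size=32):
    """Quantize tensor per group based on k quant.

    Ref: (URL omitted in this sketch)

    Args:
        data: placeholder description, hypothetical
    """
    # Hypothetical stub: returns the input unchanged; only the docstring matters here.
    return data


def summary_and_body(func):
    """Split a docstring at the first blank line into (summary, body)."""
    doc = inspect.getdoc(func) or ""  # getdoc normalizes indentation
    summary, _, body = doc.partition("\n\n")
    return summary, body


summary, body = summary_and_body(quant_stub)
print(summary)
```

Without the blank line, the summary and the `Ref:` line would form one paragraph, and a splitter like this one would return both as the summary.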
