Dear @cccclai
I’m reviewing the code while following the guide you provided (Export with Spinquant) for converting the Llama3.2-3B-Instruct model with Qualcomm SpinQuant. When I execute the `_export_llama` function in the `export_llama_lib.py` file, the `pt2e_quantize(quantizers)` function is called. Within this function, the `pt2e_calibrate` function is executed before the `convert_pt2e` function. Why is `pt2e_calibrate` performed before `convert_pt2e` here?
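To make the order I am asking about concrete, here is a simplified sketch of the call sequence as I understand it. Every function below is a hypothetical stand-in with a `_stub` suffix, not the actual ExecuTorch code, and I am assuming that a preparation step (something like `prepare_pt2e`) runs before calibration.

```python
# Simplified sketch of the call order as I read it.
# All *_stub functions are hypothetical stand-ins, not the real ExecuTorch code.

call_order = []  # records the order in which the steps run


def prepare_pt2e_stub(model):
    # Assumption: a preparation step runs first (not shown in my question above).
    call_order.append("prepare")
    return model


def pt2e_calibrate_stub(model, calibration_data):
    # Runs the prepared model on sample inputs.
    call_order.append("calibrate")
    for sample in calibration_data:
        model(sample)


def convert_pt2e_stub(model):
    # Produces the final quantized model.
    call_order.append("convert")
    return model


def pt2e_quantize_stub(model, calibration_data):
    # Mirrors the order I observe: calibration happens before conversion.
    prepared = prepare_pt2e_stub(model)
    pt2e_calibrate_stub(prepared, calibration_data)
    return convert_pt2e_stub(prepared)


pt2e_quantize_stub(lambda x: x, [1.0, 2.0])
print(call_order)
```

The part I do not understand is why the calibrate step sits between the other two rather than after the convert step.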
Generally, wouldn't it make more sense to perform calibration after quantization?
Thank you