**Describe the bug**

I am trying to quantize GLM 4.7 to NVFP4 using ModelOpt, but (1) the MTP layer does not seem to be quantized, and (2) there are issues serving the resulting checkpoint with SGLang when MTP is enabled. This report is specific to (1).

**Steps/Code to reproduce bug**

```bash
git clone git@github.com:NVIDIA/Model-Optimizer.git
cd Model-Optimizer
pip install -e .[dev]
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path /models/glm-fp16 \
    --qformat nvfp4_mlp_only \
    --export_path /models/glm-nvfp4_mlp_only \
    --kv_cache_qformat none \
    --calib_size 64 \
    --trust_remote_code \
    --dataset cnn_dailymail
```

The generated `hf_quant_config.json` does not exclude the MTP layer (`model.layers.92*`). I also tried manually adding `"*mtp*": {"enable": False}` to the quantization config `mtq.NVFP4_MLP_ONLY_CFG`, but the MTP layer is still not excluded from quantization.
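For reference, the manual override attempt looked roughly like this (a minimal sketch, assuming the ModelOpt convention of mapping wildcard patterns in `quant_cfg` to `{"enable": False}`; the `model.layers.92*` pattern is an extra variant shown for illustration, not something tried in the original report):

```python
import copy

import modelopt.torch.quantization as mtq

# Start from the MLP-only NVFP4 recipe and try to disable quantization
# for anything matching the MTP layer (layer 92 in this checkpoint).
cfg = copy.deepcopy(mtq.NVFP4_MLP_ONLY_CFG)
cfg["quant_cfg"]["*mtp*"] = {"enable": False}             # pattern actually tried
cfg["quant_cfg"]["model.layers.92*"] = {"enable": False}  # illustrative variant
```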
Also, the `mtp.safetensors` file does not seem to be carried over to the quantized checkpoint. When running the ModelOpt script I see the following message, which suggests the MTP layer is not being loaded at all:
```
Some weights of the model checkpoint at /home/prithudasgupta_google_com/models/glm-fp16 were not used when initializing Glm4MoeForCausalLM: ['model.layers.92.eh_proj.weight', 'model.layers.92.embed_tokens.weight', 'model.layers.92.enorm.weight', 'model.layers.92.hnorm.weight', 'model.layers.92.input_layernorm.weight', 'model.layers.92.mlp.experts.0.down_proj.weight', 'model.layers.92.mlp.experts.0.gate_proj.weight', 'model.layers.92.mlp.experts.0.up_proj.weight', 'model.layers.92.mlp.experts.1.down_proj.weight', 'model.layers.92.mlp.experts.1.gate_proj.weight', 'model.layers.92.mlp.experts.1.up_proj.weight', 'model.layers.92.mlp.experts.10.down_proj.weight', 'model.layers.92.mlp.experts.10.gate_proj.weight', 'model.layers.92.mlp.experts.10.up_proj.weight', 'model.layers.92.mlp.experts.100.down_proj.weight', 'model.layers.92.mlp.experts.100.gate_proj.weight',
```
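To confirm that the unused tensors are exactly the MTP weights, the source checkpoint's safetensors index can be inspected (a minimal sketch, assuming the standard Hugging Face `model.safetensors.index.json` layout):

```python
import json

# Standard HF index file mapping tensor names to their safetensors shards.
with open("/models/glm-fp16/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Layer 92 is the MTP layer in this checkpoint.
mtp_keys = [k for k in weight_map if k.startswith("model.layers.92.")]
print(f"{len(mtp_keys)} MTP tensors, stored in: {sorted({weight_map[k] for k in mtp_keys})}")
```

Running the same check against the export directory shows whether the MTP tensors survive the export.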
The exported `hf_quant_config.json`:

```json
{
  "producer": {
    "name": "modelopt",
    "version": "0.41.0.dev45+g381ac9dc5"
  },
  "quantization": {
    "quant_algo": "NVFP4",
    "kv_cache_quant_algo": null,
    "group_size": 16,
    "exclude_modules": [
      "lm_head",
      "model.layers.0.self_attn*",
      "model.layers.1.self_attn*",
      "model.layers.10.self_attn*",
      "model.layers.11.self_attn*",
      "model.layers.12.self_attn*",
      "model.layers.13.self_attn*",
      "model.layers.14.self_attn*",
      "model.layers.15.self_attn*",
      "model.layers.16.self_attn*",
      "model.layers.17.self_attn*",
      "model.layers.18.self_attn*",
      "model.layers.19.self_attn*",
      "model.layers.2.self_attn*",
      "model.layers.20.self_attn*",
      "model.layers.21.self_attn*",
      "model.layers.22.self_attn*",
      "model.layers.23.self_attn*",
      "model.layers.24.self_attn*",
      "model.layers.25.self_attn*",
      "model.layers.26.self_attn*",
      "model.layers.27.self_attn*",
      "model.layers.28.self_attn*",
      "model.layers.29.self_attn*",
      "model.layers.3.self_attn*",
      "model.layers.30.self_attn*",
      "model.layers.31.self_attn*",
      "model.layers.32.self_attn*",
      "model.layers.33.self_attn*",
      "model.layers.34.self_attn*",
      "model.layers.35.self_attn*",
      "model.layers.36.self_attn*",
      "model.layers.37.self_attn*",
      "model.layers.38.self_attn*",
      "model.layers.39.self_attn*",
      "model.layers.4.self_attn*",
      "model.layers.40.self_attn*",
      "model.layers.41.self_attn*",
      "model.layers.42.self_attn*",
      "model.layers.43.self_attn*",
      "model.layers.44.self_attn*",
      "model.layers.45.self_attn*",
      "model.layers.46.self_attn*",
      "model.layers.47.self_attn*",
      "model.layers.48.self_attn*",
      "model.layers.49.self_attn*",
      "model.layers.5.self_attn*",
      "model.layers.50.self_attn*",
      "model.layers.51.self_attn*",
      "model.layers.52.self_attn*",
      "model.layers.53.self_attn*",
      "model.layers.54.self_attn*",
      "model.layers.55.self_attn*",
      "model.layers.56.self_attn*",
      "model.layers.57.self_attn*",
      "model.layers.58.self_attn*",
      "model.layers.59.self_attn*",
      "model.layers.6.self_attn*",
      "model.layers.60.self_attn*",
      "model.layers.61.self_attn*",
      "model.layers.62.self_attn*",
      "model.layers.63.self_attn*",
      "model.layers.64.self_attn*",
      "model.layers.65.self_attn*",
      "model.layers.66.self_attn*",
      "model.layers.67.self_attn*",
      "model.layers.68.self_attn*",
      "model.layers.69.self_attn*",
      "model.layers.7.self_attn*",
      "model.layers.70.self_attn*",
      "model.layers.71.self_attn*",
      "model.layers.72.self_attn*",
      "model.layers.73.self_attn*",
      "model.layers.74.self_attn*",
      "model.layers.75.self_attn*",
      "model.layers.76.self_attn*",
      "model.layers.77.self_attn*",
      "model.layers.78.self_attn*",
      "model.layers.79.self_attn*",
      "model.layers.8.self_attn*",
      "model.layers.80.self_attn*",
      "model.layers.81.self_attn*",
      "model.layers.82.self_attn*",
      "model.layers.83.self_attn*",
      "model.layers.84.self_attn*",
      "model.layers.85.self_attn*",
      "model.layers.86.self_attn*",
      "model.layers.87.self_attn*",
      "model.layers.88.self_attn*",
      "model.layers.89.self_attn*",
      "model.layers.9.self_attn*",
      "model.layers.90.self_attn*",
      "model.layers.91.self_attn*"
    ]
  }
}
```
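Note that `exclude_modules` stops at layer 91; no pattern covers the MTP layer. A quick programmatic check (a sketch using the export path from the repro command above):

```python
import json

with open("/models/glm-nvfp4_mlp_only/hf_quant_config.json") as f:
    excluded = json.load(f)["quantization"]["exclude_modules"]

# No pattern covering the MTP layer (model.layers.92) is present.
print(any("layers.92" in p or "mtp" in p for p in excluded))  # -> False
```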
**Expected behavior**

The MTP layer (`model.layers.92*`) should be listed in `exclude_modules` in the exported `hf_quant_config.json`, and the MTP weights (`mtp.safetensors`) should be loaded and carried over to the quantized checkpoint.
**System information**

- ModelOpt version: `0.41.0.dev45+g381ac9dc5` (from the `producer` field above)