🚀 The feature, motivation and pitch
Hi, I use the following bash script to export a SpinQuant Qwen2.5-0.5B model, but I run into an error. I would like to know if anyone has hit the same problem or can help me resolve it.
```bash
export QWEN_QUANTIZED_CHECKPOINT=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/consolidated.00.pth
export QWEN_PARAMS=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/params.json

python -m examples.models.llama.export_llama \
    --model "qwen2_5" \
    --checkpoint "${QWEN_QUANTIZED_CHECKPOINT:?}" \
    --params "${QWEN_PARAMS:?}" \
    --use_sdpa_with_kv_cache \
    -X \
    --xnnpack-extended-ops \
    --preq_mode 8da4w_output_8da8w \
    --preq_group_size 64 \
    --max_seq_length 2048 \
    --max_context_length 2048 \
    --output_name "qwen2_5_0_5b_quant_from_source.pte" \
    -kv \
    -d fp32 \
    --preq_embedding_quantize 8,0 \
    --use_spin_quant native \
    --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}'
```
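Since `--preq_mode 8da4w_output_8da8w` tells the exporter the checkpoint already holds 4-bit (group-wise) weights, a quick way to check whether the checkpoint actually matches that expectation is to inspect the value range of its integer tensors. This is a hypothetical diagnostic snippet, not part of the ExecuTorch tooling; the helper name `weight_ranges` and the direct `torch.load` of the consolidated checkpoint are my assumptions about the file layout:

```python
import torch

def weight_ranges(state_dict):
    """Return {name: (min, max)} for every integer tensor in a state dict.

    For a genuinely 4-bit checkpoint, quantized weight tensors should stay
    within [-8, 7]; values spanning [-128, 127] indicate 8-bit quantization.
    """
    return {
        name: (int(t.min()), int(t.max()))
        for name, t in state_dict.items()
        if torch.is_tensor(t) and not t.is_floating_point() and t.numel() > 0
    }

# Usage (path taken from the export script above):
# ckpt = torch.load("consolidated.00.pth", map_location="cpu")
# for name, (lo, hi) in weight_ranges(ckpt).items():
#     print(name, lo, hi)
```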
```
[INFO 2025-03-18 11:44:05,137 builder.py:211] Exporting with:
[INFO 2025-03-18 11:44:05,138 builder.py:212] inputs: (tensor([[2, 3, 4]]), {'input_pos': tensor([0])})
[INFO 2025-03-18 11:44:05,138 builder.py:213] kwargs: None
[INFO 2025-03-18 11:44:05,138 builder.py:214] dynamic shapes: ({1: <class 'executorch.extension.llm.export.builder.token_dim'>}, {'input_pos': {0: 1}})
[INFO 2025-03-18 11:44:29,659 builder.py:235] Running canonical pass: RemoveRedundantTransposes
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:701] Lowering model using following partitioner(s):
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackDynamicallyQuantizedPartitioner
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackPartitioner
[INFO 2025-03-18 11:44:29,853 builder.py:321] Using pt2e [] to quantizing the model...
[INFO 2025-03-18 11:44:29,853 builder.py:372] No quantizer provided, passing...
Traceback (most recent call last):
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 34, in <module>
    main()  # pragma: no cover
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 30, in main
    export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 543, in export_llama
    builder = _export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 879, in _export_llama
    builder = _to_edge_and_lower_llama_xnnpack(
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 711, in _to_edge_and_lower_llama_xnnpack
    builder = builder_exported.pt2e_quantize(quantizers).to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 445, in to_edge_transform_and_lower
    self.edge_manager = to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1171, in to_edge_transform_and_lower
    edge_manager = edge_manager.to_backend({name: curr_partitioner})
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1429, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner[name])
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 397, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 320, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 250, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 114, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/xnnpack_preprocess.py", line 171, in preprocess
    node_visitors[node.target.__name__].define_node(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/op_linear.py", line 57, in define_node
    self.define_tensor(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 398, in define_tensor
    buffer_idx = self.get_serialized_buffer_index(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 574, in get_serialized_buffer_index
    const_val = self.convert_to_qc4w(const_val)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 470, in convert_to_qc4w
    assert (
AssertionError: convert_to_qc4w: [min,max] out of [-8, 7] range, got [-128, 127]
```
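For context on what the final assertion is guarding, here is a minimal sketch (my own illustration, not ExecuTorch's actual `convert_to_qc4w` implementation) of a qc4w conversion: each signed 4-bit weight in [-8, 7] is shifted by +8 and two of them are packed into one byte. Weights quantized to 8 bits span [-128, 127] and cannot be packed this way, which appears to be exactly what the assertion reports:

```python
import torch

def convert_to_qc4w_sketch(w: torch.Tensor) -> torch.Tensor:
    """Pack signed 4-bit weights (stored in a wider int tensor) into uint8.

    Assumes the last dimension has even length. The nibble order here is an
    illustrative choice and may differ from the real backend.
    """
    lo, hi = int(w.min()), int(w.max())
    assert lo >= -8 and hi <= 7, (
        f"convert_to_qc4w: [min,max] out of [-8, 7] range, got [{lo}, {hi}]"
    )
    shifted = (w + 8).to(torch.uint8)  # map [-8, 7] -> [0, 15]
    # Pack adjacent pairs along the last dim: low nibble first.
    return shifted[..., ::2] | (shifted[..., 1::2] << 4)
```

Given this, the failure suggests the checkpoint's weights are 8-bit rather than the 4-bit layout that `--preq_mode 8da4w_output_8da8w` declares.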
By the way, I followed QwenSpinQuant to obtain the Qwen2.5-0.5B SpinQuant model.