Qwen2.5-0.5b SpinQuant export .pte fail #9353

Open
@Francis235

Description

🚀 The feature, motivation and pitch

Hi, I used the following bash script to export a SpinQuant-quantized Qwen2.5-0.5B model, but I ran into the error below. Has anyone hit the same problem, or could someone help me figure it out?

export QWEN_QUANTIZED_CHECKPOINT=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/consolidated.00.pth
export QWEN_PARAMS=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/params.json

python -m examples.models.llama.export_llama \
   --model "qwen2_5" \
   --checkpoint "${QWEN_QUANTIZED_CHECKPOINT:?}" \
   --params "${QWEN_PARAMS:?}" \
   --use_sdpa_with_kv_cache \
   -X \
   --xnnpack-extended-ops \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 64 \
   --max_seq_length 2048 \
   --max_context_length 2048 \
   --output_name "qwen2_5_0_5b_quant_from_source.pte" \
   -kv \
   -d fp32 \
   --preq_embedding_quantize 8,0 \
   --use_spin_quant native \
   --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}'
[INFO 2025-03-18 11:44:05,137 builder.py:211] Exporting with:
[INFO 2025-03-18 11:44:05,138 builder.py:212] inputs: (tensor([[2, 3, 4]]), {'input_pos': tensor([0])})
[INFO 2025-03-18 11:44:05,138 builder.py:213] kwargs: None
[INFO 2025-03-18 11:44:05,138 builder.py:214] dynamic shapes: ({1: <class 'executorch.extension.llm.export.builder.token_dim'>}, {'input_pos': {0: 1}})
[INFO 2025-03-18 11:44:29,659 builder.py:235] Running canonical pass: RemoveRedundantTransposes
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:701] Lowering model using following partitioner(s): 
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackDynamicallyQuantizedPartitioner
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackPartitioner
[INFO 2025-03-18 11:44:29,853 builder.py:321] Using pt2e [] to quantizing the model...
[INFO 2025-03-18 11:44:29,853 builder.py:372] No quantizer provided, passing...
Traceback (most recent call last):
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 34, in <module>
    main()  # pragma: no cover
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 30, in main
    export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 543, in export_llama
    builder = _export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 879, in _export_llama
    builder = _to_edge_and_lower_llama_xnnpack(
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 711, in _to_edge_and_lower_llama_xnnpack
    builder = builder_exported.pt2e_quantize(quantizers).to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 445, in to_edge_transform_and_lower
    self.edge_manager = to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1171, in to_edge_transform_and_lower
    edge_manager = edge_manager.to_backend({name: curr_partitioner})
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1429, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner[name])
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 397, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 320, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 250, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 114, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/xnnpack_preprocess.py", line 171, in preprocess
    node_visitors[node.target.__name__].define_node(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/op_linear.py", line 57, in define_node
    self.define_tensor(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 398, in define_tensor
    buffer_idx = self.get_serialized_buffer_index(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 574, in get_serialized_buffer_index
    const_val = self.convert_to_qc4w(const_val)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 470, in convert_to_qc4w
    assert (
AssertionError: convert_to_qc4w: [min,max] out of [-8, 7] range, got [-128, 127]

By the way, I followed QwenSpinQuant to produce the Qwen2.5-0.5B SpinQuant checkpoint.
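
For reference, the failing assertion seems to say that the XNNPACK qc4w path selected by --preq_mode 8da4w_output_8da8w expects 4-bit weights in [-8, 7], while my checkpoint apparently holds values in the 8-bit range [-128, 127]. A quick way to double-check the checkpoint (just a sketch; it assumes consolidated.00.pth is a flat state dict of tensors, which may not match the actual SpinQuant layout) would be something like:

import torch

# Print the value range and dtype of every weight tensor in the pre-quantized
# checkpoint, to see whether the values fall in the 4-bit range [-8, 7]
# expected by 8da4w or in the 8-bit range [-128, 127].
ckpt = torch.load(
    "Qwen2.5-0.5B-Instruct-SpinQuant/consolidated.00.pth",
    map_location="cpu",
)
for name, t in ckpt.items():
    if torch.is_tensor(t) and name.endswith("weight"):
        print(f"{name}: dtype={t.dtype}, min={t.min().item()}, max={t.max().item()}")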

Alternatives

No response

Additional context

No response

RFC (Optional)

No response
