llama2 '8da4w-gptq' quantization fails #3632

Open
@Liming-Wang

I can export llama2 with -qmode 8da4w without any problem, but the export fails when I use -qmode 8da4w-gptq.

  • installed packages
    executorch 0.3.0a0+aaa2f2e
    torch 2.4.0.dev20240507+cpu
    torchao 0.1
    torchtune 0.1.1

  • command to reproduce

$ python -m examples.models.llama2.export_llama --checkpoint ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b/snapshots/69656aac4cb47911a639f5890ff35b41ceb82e98/consolidated.00.pth --params ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b/snapshots/69656aac4cb47911a639f5890ff35b41ceb82e98/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w-gptq --calibration_limit 128 --calibration_seq_length 2048 --group_size 128 -d fp32 --max_seq_length 4096
  • error log
[INFO 2024-05-16 13:12:11,098 builder.py:84] Loading model with checkpoint=~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b/snapshots/69656aac4cb47911a639f5890ff35b41ceb82e98/consolidated.00.pth, params=~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-05-16 13:12:11,173 builder.py:105] Loaded model with dtype=torch.bfloat16
[INFO 2024-05-16 13:12:11,445 config.py:58] PyTorch version 2.4.0.dev20240507+cpu available.
[INFO 2024-05-16 13:12:13,489 huggingface.py:162] Using device 'cpu'
~/anaconda3/envs/executorch/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
[WARNING 2024-05-16 13:12:19,098 task.py:763] [Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[WARNING 2024-05-16 13:12:19,098 task.py:775] [Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[WARNING 2024-05-16 13:12:19,098 task.py:763] [Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[WARNING 2024-05-16 13:12:19,098 task.py:775] [Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[WARNING 2024-05-16 13:12:19,098 task.py:763] [Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
[WARNING 2024-05-16 13:12:19,098 task.py:775] [Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Repo card metadata block was not found. Setting CardData to empty.
[WARNING 2024-05-16 13:12:22,935 repocard.py:107] Repo card metadata block was not found. Setting CardData to empty.
Obtaining GPTQ calibration inputs on:  ['wikitext']
[INFO 2024-05-16 13:12:23,026 task.py:395] Building contexts for wikitext on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 62/62 [00:00<00:00, 544.89it/s]
[INFO 2024-05-16 13:12:23,144 evaluator.py:362] Running loglikelihood_rolling requests
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 62/62 [05:20<00:00,  5.16s/it]
Tracing model for GPTQ
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0] failed while attempting to run meta for aten.view.default
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0] Traceback (most recent call last):
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1624, in _dispatch_impl
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     r = func(*args, **kwargs)
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_ops.py", line 630, in __call__
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     return self_._op(*args, **kwargs)
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_refs/__init__.py", line 4550, in view
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     return _reshape_view_helper(a, *shape, allow_copy=False)
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_refs/__init__.py", line 3623, in _reshape_view_helper
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     shape = utils.infer_size(shape, a.numel())
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_prims_common/__init__.py", line 900, in infer_size
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     torch._check(
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/__init__.py", line 1149, in _check
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     _check_with(RuntimeError, cond, message)
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/__init__.py", line 1132, in _check_with
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0]     raise error_type(message_evaluated)
E0516 13:17:43.372245 123454950823744 torch/_subclasses/fake_tensor.py:1628] [0/0] RuntimeError: shape '[1, 2048, 32, 64]' is invalid for input of size 8192
Traceback (most recent call last):
  File "~/anaconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "~/anaconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/export_llama_lib.py", line 303, in export_llama
    return _export_llama(modelname, args)
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/export_llama_lib.py", line 381, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/export_llama_lib.py", line 366, in _prepare_for_llama_export
    .source_transform(transforms)
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/builder.py", line 203, in source_transform
    self.model = transform(self.model)
  File "/data0/limingw/workspace/llm/executorch/examples/models/llama2/source_transformation/quantize.py", line 129, in quantize
    model = gptq_quantizer.quantize(model, inputs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1368, in quantize
    state_dict = self._create_quantized_state_dict(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 731, in _create_quantized_state_dict
    GPTQ_runner = GenericGPTQRunner(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 323, in __init__
    exported_model = torch._dynamo.export(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1264, in inner
    result_traced = opt_f(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 416, in _fn
    return fn(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 981, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 410, in _convert_frame_assert
    return _compile(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_utils_internal.py", line 70, in wrapper_function
    return function(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 703, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 273, in time_wrapper
    r = func(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 570, in compile_inner
    out_code = transform_code_object(code, transform)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1167, in transform_code_object
    transformations(instructions, code_options)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 172, in _fn
    return fn(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 517, in transform
    tracer.run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2234, in run
    super().run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 884, in run
    while self.step():
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 799, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 494, in wrapper
    return inner_fn(self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1253, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 737, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/nn_module.py", line 353, in call_function
    return tx.inline_user_function_return(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2448, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2564, in inline_call_
    tracer.run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 884, in run
    while self.step():
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 799, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 494, in wrapper
    return inner_fn(self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1294, in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 737, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
    return super().call_function(tx, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
    return super().call_function(tx, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2448, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2564, in inline_call_
    tracer.run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 884, in run
    while self.step():
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 799, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 494, in wrapper
    return inner_fn(self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1253, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 737, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
    return super().call_function(tx, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
    return super().call_function(tx, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2448, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2564, in inline_call_
    tracer.run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 884, in run
    while self.step():
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 799, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 494, in wrapper
    return inner_fn(self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1253, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 737, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
    return super().call_function(tx, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2448, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2564, in inline_call_
    tracer.run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 884, in run
    while self.step():
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 799, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 494, in wrapper
    return inner_fn(self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1253, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 737, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
    return super().call_function(tx, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2448, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2564, in inline_call_
    tracer.run()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 884, in run
    while self.step():
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 799, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 494, in wrapper
    return inner_fn(self, inst)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1253, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 737, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/misc.py", line 655, in call_function
    return self.obj.call_method(tx, self.name, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/tensor.py", line 457, in call_method
    return wrap_fx_proxy(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 1519, in wrap_fx_proxy
    return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 1604, in wrap_fx_proxy_cls
    example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1830, in get_fake_value
    raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1762, in get_fake_value
    ret_val = wrap_fake_exception(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1277, in wrap_fake_exception
    return fn()
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1763, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1898, in run_node
    raise RuntimeError(make_error_message(e)).with_traceback(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1882, in run_node
    return getattr(args[0], node.target)(*args[1:], **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 956, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1318, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1039, in _cached_dispatch_impl
    output = self._dispatch_impl(func, types, args, kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1624, in _dispatch_impl
    r = func(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_ops.py", line 630, in __call__
    return self_._op(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_refs/__init__.py", line 4550, in view
    return _reshape_view_helper(a, *shape, allow_copy=False)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_refs/__init__.py", line 3623, in _reshape_view_helper
    shape = utils.infer_size(shape, a.numel())
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/_prims_common/__init__.py", line 900, in infer_size
    torch._check(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/__init__.py", line 1149, in _check
    _check_with(RuntimeError, cond, message)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/__init__.py", line 1132, in _check_with
    raise error_type(message_evaluated)
torch._dynamo.exc.TorchRuntimeError: Failed running call_method view(*(FakeTensor(..., size=(1, 128, 64), dtype=torch.bfloat16), [1, 2048, 32, 64]), **{}):
shape '[1, 2048, 32, 64]' is invalid for input of size 8192

from user code:
   File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 487, in forward
    h = layer(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 424, in forward
    h = self.attention.forward(
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 321, in forward
    q, k = apply_rotary_emb(q, k, freqs_cos, freqs_sin)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 169, in apply_rotary_emb
    freqs_cos = reshape_for_broadcast(freqs_cos, xq_r)
  File "~/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 160, in reshape_for_broadcast
    return freqs_cis.view(shape)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
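
For reference, the arithmetic behind the final error can be checked without PyTorch. The sketch below only restates the two shapes from the error message and compares their element counts; `numel` and `can_view` are hypothetical helper names used for illustration, not torch or executorch functions, and the check says nothing about why the tensor ended up with that size.

```python
from math import prod


def numel(shape):
    """Return the number of elements in a tensor of the given shape."""
    return prod(shape)


def can_view(src_shape, dst_shape):
    """Necessary condition for view(): element counts must match."""
    return numel(src_shape) == numel(dst_shape)


# Shapes copied from the error message above.
src = (1, 128, 64)        # size of the FakeTensor that view() was called on
dst = (1, 2048, 32, 64)   # shape passed to view()

print(numel(src))          # 8192, the "input of size 8192" in the error
print(numel(dst))          # 4194304
print(can_view(src, dst))  # False
```

So the tensor being reshaped holds 128 * 64 = 8192 elements, while the requested shape needs 2048 * 32 * 64 elements, which is why the view is rejected.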

Has anyone gotten this to work? Any pointers would be much appreciated.

cc @kimishpatel @jerryzh168

Labels

module: quantization (issues related to quantization)
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
