T-MAC reports an error when using the qwen2.5 3-bit model #76

Open
xdd130 opened this issue Dec 4, 2024 · 4 comments

Comments

xdd130 commented Dec 4, 2024

I tried using the AutoGPTQ tool to quantize the Qwen2.5-3B-Instruct model to 3 bits. I successfully obtained the model in GPTQ format, but when I ran the T-MAC compile script:

python compile.py -fa -o tuned -da -nt 8 -tb -gc -gs 128 -ags 64 -m gptq-auto -md /home/phytium/yq/Qwen2.5-3B-yq/Qwen2.5-3B-Quant-3b -t

I got the following error:

Traceback (most recent call last):
  File "/home/tmac/tmac/T-MAC/deploy/compile.py", line 244, in <module>
    main()
  File "/home/tmac/tmac/T-MAC/deploy/compile.py", line 234, in main
    compile(**device_kwargs)
  File "/home/tmac/tmac/T-MAC/deploy/compile.py", line 130, in compile
    qgemm_mod = qgemm_lut.compile(
                ^^^^^^^^^^^^^^^^^^
  File "/home/tmac/tmac/T-MAC/python/t_mac/ops/base.py", line 255, in compile
    self.tuning(*args, n_trial=n_trial, thread_affinity=thread_affinity, **eval_kwargs)
  File "/home/tmac/tmac/T-MAC/python/t_mac/ops/base.py", line 95, in tuning
    task = autotvm.task.create(template_name, args=args, target=self.target)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tmac/T-MAC/3rdparty/tvm/python/tvm/autotvm/task/task.py", line 480, in create
    sch, _ = ret.func(*args)
             ^^^^^^^^^^^^^^^
  File "/home/tmac/tmac/T-MAC/3rdparty/tvm/python/tvm/autotvm/task/task.py", line 240, in __call__
    return self.fcustomized(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tmac/tmac/T-MAC/python/t_mac/ops/base.py", line 71, in _func
    tensors = self._compute(*args)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/tmac/tmac/T-MAC/python/t_mac/ops/qgemm.py", line 138, in _compute
    raise TVMError("K({}) must be devisible by group_size({})".format(K, self.group_size))
tvm._ffi.base.TVMError: K(10304) must be devisible by group_size(128)

Does this indicate a problem with my quantization step, or does T-MAC not support using 3-bit models directly? What additional steps do I need to take?
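
(For reference, a quick way to see which K dimensions the quantized checkpoint actually contains is to list its weight shapes. This is a minimal sketch, assuming AutoGPTQ saved the weights as a single model.safetensors file; adjust the path and filename to match your checkpoint.)

# Minimal sketch: print every tensor name and shape in the quantized checkpoint,
# so you can see which K dimensions occur and which group sizes divide them.
# Assumption: the checkpoint is a single "model.safetensors" file; adjust as needed.
from safetensors import safe_open

path = "/home/phytium/yq/Qwen2.5-3B-yq/Qwen2.5-3B-Quant-3b/model.safetensors"

with safe_open(path, framework="pt") as f:
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())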

@QingtaoLi1
Contributor

tvm._ffi.base.TVMError: K(10304) must be devisible by group_size(128)

@xdd130 Does your model have any weight of shape 10304? Since 10304=64*161, you should set -gs=64 in the command.
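
(A quick arithmetic check of this suggestion, as a minimal sketch rather than anything from T-MAC itself: 10304 = 2**6 * 7 * 23, so 64 is the largest power of two dividing it and 128 is not a divisor, which is why -gs 128 fails for this weight.)

K = 10304
for gs in (32, 64, 128, 256):
    print(gs, K % gs == 0)   # 32 True, 64 True, 128 False, 256 False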


xdd130 commented Dec 9, 2024

@QingtaoLi1
Thanks for your reply.
When I set -gs=64, I encountered a new shape problem:
tvm._ffi.base.TVMError: K(10320) must be devisible by act_group_size(64)
So, I set -ags=16.

python compile.py -fa -o tuned -da -nt 8 -tb -gc -gs 64 -ags 16 -m gptq-auto -md /home/phytium/yq/Qwen2.5-3B-yq/Qwen2.5-3B-Quant-3b -t

The error is as follows:
tvm._ffi.base.TVMError: K(10320) must be devisible by group_size(128)
No matter what value I set -gs to, I get the same error as above.

@QingtaoLi1
Contributor

@xdd130 -ags is the activation group size (it should not be smaller than -gs), so you need to set -gs to match the weight shape as well. And currently, neither should be smaller than 32, so 10320 may not be supported for now.
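
(To make this constraint concrete, a minimal sketch assuming group sizes are powers of two, as in the commands above: the largest power of two dividing 10320 is 16, which is below the stated minimum of 32, so no valid -gs/-ags setting can satisfy both the divisibility and the minimum-size requirements for this K.)

K = 10320               # 10320 = 2**4 * 3 * 5 * 43
gs = 1
while K % (gs * 2) == 0:
    gs *= 2
print(gs)               # 16, smaller than the minimum group size of 32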


xdd130 commented Dec 12, 2024

That's right, thanks for your reply!
