Description
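After some tinkering, the int4 model now loads successfully. When I submit a prompt from the browser, the backend raises the TypeError shown below.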
Ubuntu 22.04
NVIDIA-SMI 530.41.03
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
`~/MOSS$ python moss_gui_demo.py
Waiting for all devices to be ready, it may take a few minutes...
Running on local URL: http://0.0.0.0:6006
Running on public URL: https://7b29d06f6fba682b.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "moss_gui_demo.py", line 122, in predict
outputs = model.generate(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/transformers/generation/utils.py", line 1571, in generate
return self.sample(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/transformers/generation/utils.py", line 2534, in sample
outputs = self(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/good/MOSS/models/modeling_moss.py", line 678, in forward
transformer_outputs = self.transformer(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/good/MOSS/models/modeling_moss.py", line 545, in forward
outputs = block(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/good/MOSS/models/modeling_moss.py", line 270, in forward
attn_outputs = self.attn(
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/good/MOSS/models/modeling_moss.py", line 164, in forward
qkv = self.qkv_proj(hidden_states)
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/good/MOSS/models/quantization.py", line 371, in forward
out = QuantLinearFunction.apply(x.reshape(-1, x.shape[-1]), self.qweight, self.scales,
File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 94, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/good/MOSS/models/quantization.py", line 283, in forward
output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
File "/home/good/MOSS/models/quantization.py", line 254, in matmul248
matmul_248_kernel[grid](input, qweight, output,
File "/home/good/MOSS/models/custom_autotune.py", line 93, in run
self.cache[key] = builtins.min(timings, key=timings.get)
TypeError: '<' not supported between instances of 'tuple' and 'float'`
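The last frame suggests that the `timings` dict passed to `builtins.min` holds a mix of tuple and float values, which Python cannot order against each other. Below is a minimal sketch that reproduces the same message outside of MOSS and shows one possible way to normalize the values before comparing. The config names, the timing numbers, and the helpers `reproduce`, `as_scalar` and `pick_best` are all hypothetical; why the real benchmark step would return mixed types is an assumption that the log above does not confirm.

```python
import builtins

# Hypothetical timings: one entry is a plain float (e.g. a sentinel for a
# failed config), the other is a tuple of measurements. Assumed for illustration.
timings = {
    "config_a": float("inf"),
    "config_b": (0.42, 0.40, 0.45),
}


def reproduce(timings):
    """Run the same expression as the failing line and return the error text."""
    try:
        builtins.min(timings, key=timings.get)
    except TypeError as exc:
        return str(exc)
    return None


def as_scalar(value):
    """Collapse a timing to a single float: first element of a tuple, else the value."""
    if isinstance(value, (tuple, list)):
        return float(value[0])
    return float(value)


def pick_best(timings):
    """Pick the key with the smallest timing after normalizing every value."""
    return builtins.min(timings, key=lambda k: as_scalar(timings[k]))


if __name__ == "__main__":
    print(reproduce(timings))
    print(pick_best(timings))
```

If the mixed types really do come from the benchmark step, applying a normalization like `as_scalar` where the timings are collected (rather than at the `min` call) would keep the cached values consistent. This is a sketch of the idea, not a verified patch for `custom_autotune.py`.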