Thanks for the excellent work. What happens when the CPU and GPU (or another accelerator) run inference at the same time? #58

aoom · 2024-08-19T21:57:51Z

It would be a great enhancement if the kernel supported both CPU- and GPU-accelerated inference. It could also support distributed computing. That would instantly make this one of the faster and more useful inference engines!

kaleid-liner (Collaborator) · 2024-08-20T08:13:09Z

Thanks for your suggestions. In our experience, GPUs are not well suited to LUT-based kernels because of their limited on-chip memory per core. Placing a LUT in shared memory can lead to slow random access due to bank conflicts. However, using the CPU, GPU, and NPU in concert is still a viable approach, with the GPU/NPU using a dequantization-based method and the CPU using T-MAC. We are exploring the possibility.
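To make the two approaches concrete, here is a toy sketch in plain Python. It is not T-MAC's actual kernel or API: every function name, the group size `G = 4`, the 1-bit weight encoding (bit 1 means +1, bit 0 means -1), and the row-wise split are assumptions chosen for illustration. It shows that a table-lookup dot product and a dequantization-based dot product give the same result, so rows of one weight matrix could in principle be divided between a LUT path (standing in for the CPU) and a dequant path (standing in for the GPU/NPU).

```python
import random

G = 4  # activations covered by one lookup table (illustrative assumption)


def dequant_dot(weight_bits, activations):
    """Dequant-style path: expand each 1-bit weight to +1/-1, then multiply-accumulate."""
    return sum((1 if bit else -1) * a for bit, a in zip(weight_bits, activations))


def build_luts(activations):
    """For each group of G activations, precompute the partial sum for all 2**G sign patterns."""
    luts = []
    for start in range(0, len(activations), G):
        group = activations[start:start + G]
        table = [
            sum(a if (pattern >> i) & 1 else -a for i, a in enumerate(group))
            for pattern in range(2 ** G)
        ]
        luts.append(table)
    return luts


def pack_row(weight_bits):
    """Pack each group of G weight bits into one integer used as a table index."""
    return [
        sum(bit << i for i, bit in enumerate(weight_bits[start:start + G]))
        for start in range(0, len(weight_bits), G)
    ]


def lut_dot(packed_row, luts):
    """LUT-style path: one table lookup per group replaces G multiply-adds."""
    return sum(table[index] for index, table in zip(packed_row, luts))


def hybrid_matvec(weights, activations, split):
    """Rows [0, split) use the LUT path; the remaining rows use the dequant path."""
    luts = build_luts(activations)  # built once, reused by every LUT row
    lut_part = [lut_dot(pack_row(row), luts) for row in weights[:split]]
    dequant_part = [dequant_dot(row, activations) for row in weights[split:]]
    return lut_part + dequant_part


if __name__ == "__main__":
    random.seed(0)
    acts = [random.randint(-128, 127) for _ in range(16)]
    w = [[random.randint(0, 1) for _ in range(16)] for _ in range(6)]
    # Both paths agree, so the split point only changes who does the work.
    print(hybrid_matvec(w, acts, split=3) == [dequant_dot(r, acts) for r in w])
```

The lookups in `lut_dot` are data-dependent: which table entry is read depends on the weight bits. That is the random-access pattern described above as prone to bank conflicts when the table sits in GPU shared memory. A real heterogeneous setup would also need to choose the split point according to each device's throughput, which this sketch does not model.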
