Popular repositories
- AutoAWQ (Python; forked from casper-hansen/AutoAWQ): implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (a usage sketch follows this list). Documentation:
- GPTQModel (Python; forked from ModelCloud/GPTQModel): production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
- llm-compressor (Python; forked from vllm-project/llm-compressor): Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM.
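As a rough orientation for the AutoAWQ entry above, here is a minimal quantization sketch following the AutoAWQ quickstart pattern. The model and output paths are placeholders, and the exact API surface may differ between versions:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder source model
quant_path = "mistral-7b-awq"                      # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```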