Popular repositories
- AutoAWQ (Python; forked from casper-hansen/AutoAWQ): implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (a usage sketch follows this list). Documentation:
- GPTQModel (Python; forked from ModelCloud/GPTQModel): production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
- llm-compressor (Python; forked from vllm-project/llm-compressor): Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM.
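As a rough orientation for the AutoAWQ entry above, here is a minimal quantization sketch following the AutoAWQ quickstart pattern. The model and output paths are placeholders, and the exact API surface may differ between versions:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder source model
quant_path = "mistral-7b-awq"                      # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```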