GPTQModel v1.7.4

@Qubitium Qubitium released this 26 Jan 07:02
b623b96

What's Changed

⚡ Faster weight packing when saving a model after quantization.
Triton kernel is now validated for Intel/XPU when the Intel Triton package is installed.
⚡ New compile() API that lets torch improve tokens per second (tps) by ~4-8%. You may need to disable flash_attention for some kernels.
🐛 Fix HF Transformers bug that downcasts the fast tokenizer class on save.
🐛 Fix inaccurate bits-per-weight (bpw) calculations.
🐛 Fix ROCm compile with setup.py.
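
For context on the bpw fix, here is a rough sketch of how effective bits per weight is commonly estimated for group-wise quantization. This is not GPTQModel's actual calculation: the function name bits_per_weight and its parameters are hypothetical. The sketch assumes one scale (fp16, 16 bits) and one zero point (stored at the weight bit width) per group, and ignores other metadata such as group indices.

```python
def bits_per_weight(bits, group_size, scale_bits=16, zero_bits=None):
    """Estimate effective bits per weight for group-wise quantization.

    Hypothetical helper for illustration only. Assumes each group of
    `group_size` weights shares one scale and one zero point.
    """
    if group_size <= 0:
        raise ValueError("group_size must be a positive integer")
    if zero_bits is None:
        # Assumption: the zero point is stored at the same width as a weight.
        zero_bits = bits
    # Per-group metadata is amortized across all weights in the group.
    overhead = (scale_bits + zero_bits) / group_size
    return bits + overhead


if __name__ == "__main__":
    for bits, group_size in [(4, 128), (4, 32), (8, 128)]:
        bpw = bits_per_weight(bits, group_size)
        print(f"bits={bits} group_size={group_size} -> bpw={bpw}")
```

Under these assumptions, a 4-bit model with group size 128 comes out at 4.15625 bpw rather than a flat 4, which is why a smaller group size raises the effective size of the saved model.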

Full Changelog: v1.7.3...v1.7.4