
GPTQModel v1.6.0

@Qubitium released this 06 Jan 08:00
· 132 commits to main since this release
c5c2677

What's Changed

⚡ 25% faster quantization and 35% lower VRAM usage vs v1.5. 👀
🎉 AMD ROCm (6.2+) support added and validated for 7900XT+ GPUs.
💫 Auto-tokenizer loading via the load() API. For most models you no longer need to manually initialize a tokenizer for either inference or quantization.
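The auto-tokenizer change can be sketched as below. This is a minimal illustration, not an excerpt from the project's docs: the model id is a placeholder, and the `tokenizer` attribute on the loaded model is an assumption based on the release note's description.

```python
from gptqmodel import GPTQModel

# load() now resolves a tokenizer automatically for most models.
# The model id here is illustrative only.
model = GPTQModel.load("ModelCloud/some-gptq-4bit-model")

# Before v1.6.0, a tokenizer had to be constructed separately
# (e.g. via transformers' AutoTokenizer.from_pretrained);
# now one is assumed to be attached to the loaded model.
inputs = model.tokenizer("Hello, world", return_tensors="pt")
print(model.generate(**inputs))
```

The same auto-loaded tokenizer is described as applying during quantization, so a separate tokenizer init should no longer be needed there either.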

Full Changelog: v1.5.1...v1.6.0