GPTQModel v1.6.0
What's Changed
⚡ 25% faster quantization and 35% lower VRAM usage vs v1.5. 👀
🎉 AMD ROCm (6.2+) support added and validated on 7900XT+ GPUs.
💫 Auto-tokenizer loading via the `load()` API. For most models you no longer need to manually initialize a tokenizer for inference or quantization.
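A minimal sketch of the new behavior, assuming the auto-loaded tokenizer is attached as `model.tokenizer` and the model exposes a `device` attribute (the model id below is only an example):

```python
from gptqmodel import GPTQModel

# load() now also resolves and attaches the tokenizer for most models,
# so a separate AutoTokenizer.from_pretrained() call is not required.
model = GPTQModel.load("ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit")  # example quantized model id

prompt = "The capital of France is"
# Assumes the auto-loaded tokenizer is exposed as model.tokenizer.
inputs = model.tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
print(model.tokenizer.decode(output[0], skip_special_tokens=True))
```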
- note about `batch_size` to speed up quant by @Qubitium in #992 (see the quantization sketch after this list)
- Add ROCm support by @CSY-ModelCloud in #993
- Add bits test by @ZX-ModelCloud in #995
- note about rocm support by @Qubitium in #998
- [FIX] wrong variable name by @ZX-ModelCloud in #997
- update rocm version tag by @CSY-ModelCloud in #999
- Auto-tokenizer will be called within `load()` by @LRL-ModelCloud in #996
- update transformers by @Qubitium in #1001
- [FIX] torch qlinear forward by @ZX-ModelCloud in #1002
- cleanup marlin info by @Qubitium in #1004
- Use custom forward hook by @LRL-ModelCloud in #1003
- fix hooked linear init by @LRL-ModelCloud in #1011
- add HookedConv1D by @LRL-ModelCloud in #1012
- record fwd time by @LRL-ModelCloud in #1013
- add PYTORCH_CUDA_ALLOC_CONF for global & do ruff by @CSY-ModelCloud in #1015
- [FIX] quantize_config could not read from config.json by @ZX-ModelCloud in #1022
- Fix quant time by @LRL-ModelCloud in #1025
- fix forward hook by @LRL-ModelCloud in #1027
- Fix hooked conv2d by @LRL-ModelCloud in #1030
- clean cache by @CL-ModelCloud in #1032
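As a rough sketch of the `batch_size` note referenced in #992 above, assuming `quantize()` accepts a `batch_size` argument and a plain list of calibration texts:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Tiny illustrative calibration set; real runs typically use hundreds of samples.
calibration_dataset = [
    "GPTQModel quantizes large language models to low-bit weights.",
    "Calibration text is passed through the model to collect activation statistics.",
    "The capital of France is Paris.",
    "ROCm and CUDA backends are both supported for quantized inference.",
]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)  # example base model id

# A larger batch_size processes more calibration samples per forward pass,
# trading extra VRAM for shorter overall quantization time.
model.quantize(calibration_dataset, batch_size=4)

model.save("Llama-3.2-1B-Instruct-gptqmodel-4bit")
```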
Full Changelog: v1.5.1...v1.6.0