From 55dc91d7c8aed4be3ed8c89213f59f45485a9a10 Mon Sep 17 00:00:00 2001
From: Qubitium-ModelCloud
Date: Fri, 17 Jan 2025 09:27:46 +0800
Subject: [PATCH] 1.7.0 release (#1085)

* prepare for v1.7.0 release
* Update version.py
* Update README.md
---
 README.md            | 1 +
 gptqmodel/version.py | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1c9224f32..2ecc4af2d 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@

 ## News
+* 01/17/2025 [1.7.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.7.0): 🎉🎉 `backend.MLX` added for runtime conversion and execution of GPTQ models via Apple's `MLX` framework on Apple Silicon. Export of `gptq` models to `mlx` is also now possible. We have added `mlx`-exported models to [huggingface.co/ModelCloud](https://huggingface.co/collections/ModelCloud/vortex-673743382af0a52b2a8b9fe2). `lm_head` quantization is now fully supported by GPTQModel without any external pkg dependency.
 * 01/07/2025 [1.6.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.6.1): 🎉 New OpenAI api compatible end-point via `model.serve(host, port)`. Auto-enable flash-attention2 for inference. Fixed `sym=False` loading regression.
 * 01/06/2025 [1.6.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.6.0): ⚡ 25% faster quantization. 35% reduction in vram usage vs v1.5. 👀 AMD ROCm (6.2+) support added and validated for 7900XT+ GPU. Auto-tokenizer loader via `load()` api. For most models you no longer need to manually init a tokenizer for both inference and quantization.
 * 01/01/2025 [1.5.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.5.1): 🎉 2025! Added `QuantizeConfig.device` to clearly define which device is used for quantization: default = `auto`. Non-quantized models are always loaded on cpu by-default and each layer is moved to `QuantizeConfig.device` during quantization to minimize vram usage. Compatibility fixes for `attn_implementation_autoset` in latest transformers.

diff --git a/gptqmodel/version.py b/gptqmodel/version.py
index c2e8226cb..3eb3d3b4d 100644
--- a/gptqmodel/version.py
+++ b/gptqmodel/version.py
@@ -13,4 +13,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = "1.7.0-dev"
+__version__ = "1.7.0"
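
The changelog entries above announce the new `backend.MLX` path and the auto-tokenizer `load()` API. A minimal usage sketch of how these might combine, assuming `gptqmodel` >= 1.7.0 is installed on an Apple Silicon machine with `mlx` available; the model id is a placeholder, and the exact call shapes (`GPTQModel.load`, `BACKEND.MLX`, `model.serve`) are taken from the release notes rather than verified against the API:

```python
# Sketch only: requires `pip install gptqmodel mlx` on Apple Silicon.
from gptqmodel import GPTQModel, BACKEND

# Since v1.6.0, load() auto-loads the matching tokenizer, so no separate
# tokenizer init is needed. BACKEND.MLX (new in v1.7.0) runtime-converts
# the GPTQ weights for execution on Apple's MLX framework.
model = GPTQModel.load(
    "ModelCloud/SomeModel-gptq-4bit",  # placeholder model id
    backend=BACKEND.MLX,
)

# v1.6.1 added an OpenAI-compatible endpoint directly on the model object.
model.serve(host="127.0.0.1", port=8000)
```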