Releases: vllm-project/llm-compressor

v0.6.0.1

28 Jul 19:05
0461bf9

What's Changed

Full Changelog: 0.6.0...0.6.0.1

v0.6.0

24 Jun 15:22
c052d2c

What's Changed

New Contributors

Full Changelog: 0.5.2...0.6.0

v0.5.2

24 Jun 01:47
c1c8541

What's Changed

v0.5.1

29 Apr 01:34
ef175d7

What's Changed

New Contributors

Full Changelog: 0.5.0...0.5.1

v0.5.0

03 Apr 13:23
25b1138

What's Changed

New Contributors

Full Changelog: 0.4.1...0.5.0

v0.4.1

20 Feb 13:21
6a1ba3c

What's Changed

New Contributors

Full Changelog: 0.4.0...0.4.1

v0.4.0

16 Jan 03:12
829af5b

What's Changed

New Contributors

Full Changelog: 0.3.1...0.4.0

v0.3.1

12 Dec 13:25
c3608a0

What's Changed

Full Changelog: 0.3.0...0.3.1

v0.3.0

13 Nov 05:22
93832a6

What's New in v0.3.0

Key Features and Improvements

  • GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
  • Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers mappings based on model architecture, making SmoothQuant easier to apply across various models.
  • Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
  • Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, enabling better support and integration for Hugging Face models that are not based on AutoModelForCausalLM.
  • Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
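
To make the idea behind auto-inferred SmoothQuant mappings (#119) concrete, here is a minimal, self-contained sketch of how a mapping could be looked up by architecture name and then resolved against a model's module names. It is not the library's implementation: the registry, the function names (infer_mappings, resolve), and the regex patterns are all hypothetical and assume Llama-style module naming.

```python
import re

# Hypothetical mapping format: (balance-layer patterns, smooth-layer pattern).
# The patterns are illustrative and assume Llama-style module names.
LLAMA_STYLE_MAPPINGS = [
    ([r".*q_proj", r".*k_proj", r".*v_proj"], r".*input_layernorm"),
    ([r".*gate_proj", r".*up_proj"], r".*post_attention_layernorm"),
]

# Hypothetical registry keyed by architecture name.
MAPPINGS_BY_ARCHITECTURE = {
    "LlamaForCausalLM": LLAMA_STYLE_MAPPINGS,
    "MistralForCausalLM": LLAMA_STYLE_MAPPINGS,
}


def infer_mappings(architecture, default=LLAMA_STYLE_MAPPINGS):
    """Return the mappings registered for an architecture, else a default."""
    return MAPPINGS_BY_ARCHITECTURE.get(architecture, default)


def resolve(mappings, module_names):
    """Expand regex mappings into concrete (smooth_layer, balance_layers) pairs."""
    resolved = []
    for balance_patterns, smooth_pattern in mappings:
        for smooth_name in module_names:
            if not re.fullmatch(smooth_pattern, smooth_name):
                continue
            # Only pair layers that sit in the same decoder block.
            parent = smooth_name.rsplit(".", 1)[0]
            balance = [
                name
                for name in module_names
                if name.startswith(parent + ".")
                and any(re.fullmatch(p, name) for p in balance_patterns)
            ]
            resolved.append((smooth_name, balance))
    return resolved


module_names = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.post_attention_layernorm",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
]

pairs = resolve(infer_mappings("LlamaForCausalLM"), module_names)
for smooth_layer, balance_layers in pairs:
    print(smooth_layer, "->", balance_layers)
```

In this sketch an unregistered architecture falls back to the default mappings, so the caller only has to supply mappings explicitly when the model's module names differ from the assumed pattern.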

Bug Fixes

  • Fix Tied Tensors Bug (#659)
  • Observer Initialization in GPTQ Wrapper (#883)
  • Sparsity Reload Testing (#882)

Documentation

  • Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.

What's Changed

New Contributors

Full Changelog: 0.2.0...0.3.0

v0.2.0

23 Sep 22:24
2e0035f
Compare
Choose a tag to compare

What's Changed

New Contributors

Full Changelog: 0.1.0...0.2.0