Releases · vllm-project/llm-compressor

[Experimental] Mistral-format FP8 quantization by @mgoin in #1359
[Examples] [Bugfix] skip sparsity stats when saving checkpoints by @kylesayrs in #1528
[Examples] [Bugfix] Fix debug message by @kylesayrs in #1529
[Tests][NVFP4] No longer skip NVFP4A16 e2e test by @dsikka in #1538
[AWQ] Support for Calibration Datasets of varying feature dimension by @brian-dellabetta in #1536
fix qwen 2.5 VL multimodal example by @brian-dellabetta in #1541
[Example] [Bugfix] Fix Gemma ignore list by @kylesayrs in #1531
[Tests][NVFP4] Add e2e nvfp4 test by @dsikka in #1543
[Examples] Use more robust splits by @kylesayrs in #1544
[Bugfix] [Autowrapper] Fix visit_Delete by @kylesayrs in #1532
[Example] Fix Qwen VL ignore list by @arunmadhusud in #1545
[Tests] Fix Qwen2.5-VL-7B-Instruct Recipe by @dsikka in #1548
[Bugfix] Fix gemma2 generation by @kylesayrs in #1552
fix skipif check on tests involving gated HF models by @brian-dellabetta in #1553
[NVFP4] Fix global scale update when dealing with offloaded layers by @dsikka in #1554
oneshot entrypoint update by @ved1beta in #1445
LM Eval tests -- ignore vision tower for VL fp8 test by @brian-dellabetta in #1562
[Performance] Sequential onloading by @kylesayrs in #1263
[BugFix] Explicitly set gpu_memory_utilization by @rahul-tuli in #1560
Add Axolotl blog link by @rahul-tuli in #1563
[Bugfix] Fix multigpu dispatch_for_generation by @kylesayrs in #1567
[Testing] Set VLLM_WORKER_MULTIPROC_METHOD for e2e testing by @dsikka in #1569
[BugFix] Fix quantization_2of4_sparse_w4a16 example by @shanjiaz in #1565
[Pipelines] infer model device with optional override by @kylesayrs in #1572
bump up requirement for compressed-tensors to 0.10.2 by @dhuangnm in #1581

New Contributors

@arunmadhusud made their first contribution in #1545

Full Changelog: 0.5.2...0.6.0

Contributors

mgoin, brian-dellabetta, and 7 other contributors


24 Jun 01:47

dhuangnm

0.5.2

c1c8541

v0.5.2

What's Changed

Exclude images from package by @kylesayrs in #1397
[Tracing] Skip non-ancestors of sequential targets by @kylesayrs in #1389
Consolidate build config by @dbarbuzzi in #1398
[Tests] Disable silently failing kv cache test by @kylesayrs in #1371
Drop flash_attn skip for quantizing_moe example tests by @dbarbuzzi in #1396
[VLM] Fix mllama targets by @kylesayrs in #1402
[Tests] Use requires_gpu, fix missing gpu test skip, add explicit test for gpu from gha by @kylesayrs in #1264
Implement QuantizationMixin by @kylesayrs in #1351
Add new-features section by @rahul-tuli in #1408
[Tracing] Support tracing of Gemma3 [#1248] by @kelkelcheng in #1373
bugfix kv cache quantization with ignored layers by @brian-dellabetta in #1312
AWQ sanitize_kwargs minor cleanup by @brian-dellabetta in #1405
[Tracing][Testing] Add tracing tests by @kylesayrs in #1335
fix lm eval test reproducibility issues by @brian-dellabetta in #1260
Pipeline Extraction by @kylesayrs in #1279
Add pull_request trigger to base tests workflow by @dbarbuzzi in #1417
removing RecipeMetadata and references by @shanjiaz in #1414
Update examples to only load required number of samples from dataset by @kylesayrs in #1118
[Tracing] Reinstate ignore functionality by @kylesayrs in #1423
[Typo] overriden by @kylesayrs in #1420
Rename SparsityModifierMixin to SparsityModifierBase by @kylesayrs in #1416
Remove RecipeArgs class & its references by @shanjiaz in #1429
[Examples] Standardize AWQ example by @kylesayrs in #1412
[Logging] Support logging once by @kylesayrs in #1431
Add: deepseekv2 smoothquant mappings by @rahul-tuli in #1433
AWQ QuantizationMixin + SequentialPipeline by @brian-dellabetta in #1426
patch awq tests/readme after QuantizationMixin refactor by @brian-dellabetta in #1439
Added more tests for Quantization24SparseW4A16 by @shanjiaz in #1434
[GPTQ] Add actorder option to modifier by @kylesayrs in #1424
[Bugfix][Tracing] Fix qwen2_5_vl by @kylesayrs in #1448
[Tests] Use proper offloading utils in test_compress_tensor_utils by @kylesayrs in #1449
[Tracing] Fix Traceable Imports by @kylesayrs in #1452
[NVFP4] Enable FP4 Weight-Only Quantization by @dsikka in #1309
Pin transformers to <4.52.0 by @brian-dellabetta in #1459
AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length by @brian-dellabetta in #1451
Fix #1344 Extend e2e tests to add asym support for W8A8-Int8 by @ved1beta in #1345
[Tests] Fix activation recipe for w8a8 asym by @dsikka in #1461
AWQ Qwen and Phi mappings by @brian-dellabetta in #1440
[Observer] Optimize mse observer by @shanjiaz in #1450
Fix: Improve SmoothQuant Support for Mixture of Experts (MoE) Models by @rahul-tuli in #1455
[Tests] Add nvfp4a16 e2e test case by @dsikka in #1463
[Docs] Update README to list fp4 by @dsikka in #1462
Remove duplicate model id var from awq example recipe by @AndrewMead10 in #1467
Added observer type for test_min_max by @shanjiaz in #1466
Disable kernels during calibration (and tracing) by @kylesayrs in #1454
[GPTQ] Fix actorder resolution, add sentinel by @kylesayrs in #1453
Set show_progress to True by @dsikka in #1471
Remove compress by @dsikka in #1470
raise error if block quantization is used, as it is not yet supported by @brian-dellabetta in #1476
[Tests] Increase max seq length for tracing tests by @kylesayrs in #1478
[Tests] Fix dynamic field to be a bool, not string by @dsikka in #1480
[Examples] Fix qwen vision examples by @kylesayrs in #1481
[NVFP4] Update to use tensor_group strategy; update observers by @dsikka in #1484
loosen lmeval assertions to upper or lower bound by @brian-dellabetta in #1477
Revert "expand observers to calculate gparams, add example for activa… by @dsikka in #1486
fix rest of the minmax tests by @shanjiaz in #1469
Add warning for non-divisible group quantization by @kylesayrs in #1401
[AWQ] Support accumulation for reduced memory usage by @kylesayrs in #1435
[Tracing] Code AutoWrapper by @kylesayrs in #1411
Removed RecipeTuple & RecipeContainer class by @shanjiaz in #1460
Unpin to support transformers==4.52.3 by @kylesayrs in #1479
[Tests] GPTQ Actorder Resolution Tests by @kylesayrs in #1468
[Testing] Skip FP4 Test by @dsikka in #1499
[Bugfix] Remove tracing imports from tests by @kylesayrs in #1498
[Testing] Use a slightly larger model that works with group_size 128 by @dsikka in #1502
skip tracing tests if token unavailable by @brian-dellabetta in #1493
Fix missing logs when calling oneshot by @kelkelcheng in #1446
[NVFP4] Expand observers to calculate gparam, support NVFP4 Activations by @dsikka in #1487
[Tests] Remove duplicate test by @kylesayrs in #1500
[Model] Mistral3 example and test by @kylesayrs in #1490
[NVFP4] Use observers to generate global weight scales by @dsikka in #1504
Revert "[NVFP4] Use observers to generate global weight scales " by @dsikka in #1507
[NVFP4] Update global scale generation by @dsikka in #1508
[NVFP4] Fix onloading of fused layers by @dsikka in #1512
Pin pandas to <2.3 by @dbarbuzzi in #1515
AWQModifier fast resolve mappings, better logging, MoE support by @brian-dellabetta in #1444
Update setup.py by @dsikka in #1516
Use model compression pathways by @kylesayrs in #1419
[Example] [Bugfix] Fix Gemma3 Generation by @kylesayrs in #1517
[Docs] Update ReadME details for FP4 by @dsikka in #1519
[Examples] [Bugfix] Perform sample generation before saving as compressed by @kylesayrs in #1530
Add citation information both in README as well as native GitHub file support by @markurtz in #1527
update compress...

Contributors

dbarbuzzi, brian-dellabetta, and 9 other contributors


29 Apr 01:34

dbarbuzzi

0.5.1

ef175d7

v0.5.1

What's Changed

Update nm-actions/changed-files to v1.16.0 by @dbarbuzzi in #1311
docs: fix missing git clone command and repo name typos in DEVELOPING.md by @gattshjott in #1325
Update e2e/lm-eval test infrastructure by @dbarbuzzi in #1323
fix(logger): normalize log_file_level input for consistency by @gattshjott in #1324
[Utils] Replace preserve_attr with patch_attr by @kylesayrs in #1187
Fix cut off log in entrypoints/utils.py post_process() by @mgoin in #1336
[Tests] Update condition for sparsity check to be more robust by @dsikka in #1337
[Utils] Add skip_weights_download for developers and testing by @kylesayrs in #1334
replace custom version handling with setuptools-scm by @dhellmann in #1322
[Compression] Update sparsity calculation lifecycle when fetching the compressor by @dsikka in #1332
[Sequential] Support models with nested _no_split_modules by @kylesayrs in #1329
[Tracing] Remove TraceableWhisperForConditionalGeneration by @kylesayrs in #1310
Add torch device to list of offloadable types by @kylesayrs in #1348
Reduce SmoothQuant Repr by @kylesayrs in #1289
Use align_module_device util by @kylesayrs in #1298
Fix project URL in setup.py by @tiran in #1353
Update trigger on PR comment workflow by @dbarbuzzi in #1357
Add timing functionality to lm-eval tests by @ved1beta in #1346
[Callbacks][Docs] Add docstrings to saving functions by @kylesayrs in #1201
Move: recipe parsing test from e2e/ to main test suite by @rahul-tuli in #1360
Smoothquant typehinting by @kylesayrs in #1285
AWQ Modifier by @brian-dellabetta in #1177
[Tests] Update transformers tests to run kv_cache tests by @dsikka in #1364
[Transformers] Support latest transformers by @dsikka in #1352
Update test_consecutive_runs.py by @dsikka in #1366
[Docs] Mention AWQ, some clean-up by @dsikka in #1367
Fix versioning for source installs by @dbarbuzzi in #1370
[Testing] Reduce error verbosity of cleanup by @kylesayrs in #1365
Update test_oneshot_and_finetune.py to use pytest.approx by @markurtz in #1339
[Tracing] Better runtime error messages by @kylesayrs in #1307
[Tests] Fix test case; update structure by @dsikka in #1375
fix: Make Recipe.model_dump() output compatible with model_validate() by @ved1beta in #1328
Add: documentation for enhanced save_pretrained parameters by @rahul-tuli in #1377
Revert "fix: Make Recipe.model_dump() output compatible .... by @rahul-tuli in #1378
AWQ resolved mappings -- ensure shapes align by @brian-dellabetta in #1372
Update w4a16_actorder_weight.yaml lmeval config by @dbarbuzzi in #1380
[WIP] Add AWQ Asym e2e test case by @dsikka in #1374
Bump version; set ct version by @dsikka in #1381
bugfix AWQ with Llama models and python 3.9 by @brian-dellabetta in #1384
awq -- hotfix to missing kwargs by @brian-dellabetta in #1395

New Contributors

@gattshjott made their first contribution in #1325
@dhellmann made their first contribution in #1322
@tiran made their first contribution in #1353
@ved1beta made their first contribution in #1346

Full Changelog: 0.5.0...0.5.1

Contributors

dhellmann, tiran, and 9 other contributors


03 Apr 13:23

dhuangnm

0.5.0

25b1138

v0.5.0

What's Changed

re-add vllm e2e test now that bug is fixed by @brian-dellabetta in #1162
Fix Readme Imports by @kylesayrs in #1165
Remove event_called by @kylesayrs in #1155
Update: Test name by @rahul-tuli in #1172
Remove lifecycle initialized_structure attribute by @kylesayrs in #1156
[VLM] Qwen 2.5 VL by @kylesayrs in #1113
Revert bump by @dsikka in #1178
Remove CLI by @dsikka in #1144
Add group act order case to lm_eval test by @dsikka in #1080
Update e2e test timings outputs by @dsikka in #1179
[Oneshot Refactor] Main refactor by @horheynm in #1110
[StageRunner Removal] Remove Evaluate / validate pathway by @horheynm in #1145
[StageRemoval] Remove Predict pathway by @horheynm in #1146
Fix 2of4 Apply Example by @dsikka in #1181
Fix Sparse2of4 Example by @dsikka in #1182
Add qwen moe w4a16 example by @mgoin in #1186
[Callbacks] Consolidate Saving Methods by @kylesayrs in #1168
lmeval tests multimodal by @brian-dellabetta in #1150
[Dataset Performance] Add num workers on dataset processing - labels, tokenization by @horheynm in #1189
Fix a minor typo by @eldarkurtic in #1191
[Callbacks] Remove pre_initialize_structure by @kylesayrs in #1160
Make transformers-tests job conditional on files changed by @dbarbuzzi in #1197
Update finetune tests to decrease execution time by @dsikka in #1208
Update transformers tests to speed-up execution by @dsikka in #1211
Fix logging bug in oneshot.py by @aman2304 in #1213
[Training] Decouple Argument parser by @horheynm in #1207
Remove MonkeyPatch for GPUs by @dsikka in #1227
[Cosmetic] Rename data_args to dataset_args by @horheynm in #1206
[Training] Datasets - update Module by @horheynm in #1209
[BugFix] Fix logging disabling bug and add tests by @aman2304 in #1218
[Training] Unifying Preprocess + Postprocessing logic for Train/Oneshot by @horheynm in #1212
[Docs] Add info on when to use which PTQ/Sparsification by @horheynm in #1157
[Callbacks] Remove MagnitudePruningModifier.leave_enabled by @kylesayrs in #1198
Replace Xenova model stub with nm-testing model stub by @kylesayrs in #1239
Offload Cache Support torch.dtype by @kylesayrs in #1141
Remove unused/duplicated/non-applicable utils from pytorch/utils/helpers by @kylesayrs in #1174
[Bugfix] Staged 2of4 example by @kylesayrs in #1238
wandb/tensorboard loggers set default init to False by @brian-dellabetta in #1235
fixing reproducibility of lmeval tests by @brian-dellabetta in #1220
[Audio] People's Speech dataset and tracer tool by @kylesayrs in #1086
Use KV cache constant names provided by compressed tensors by @kylesayrs in #1200
[Bugfix] Raise error for processor remote code by @kylesayrs in #1184
Remove missing weights silencers in favor of HFQuantizer solution by @kylesayrs in #1017
Fix run_compressed tests by @dsikka in #1246
[Train] Training Pipeline by @horheynm in #1214
[Tests] Increase maximum quantization error by @kylesayrs in #1245
[Callbacks] Remove EventLifecycle and on_start event by @kylesayrs in #1170
[Bugfix] Disable generation of deepseek models with transformers>=4.48 by @kylesayrs in #1259
Remove clear_ml by @dsikka in #1261
[Tests] Remove clear_ml test from GHA by @kylesayrs in #1265
Remove click by @dsikka in #1262
[Bugfix] Remove constant pruning from 2of4 examples by @kylesayrs in #1267
Addback: ConstantPruningModifier for finetuning cases by @rahul-tuli in #1272
Remove docker by @kylesayrs in #1255
move failing multimodal lmeval tests to skipped folder by @brian-dellabetta in #1273
Replace tj-action/changed-files by @dbarbuzzi in #1270
[BugFix]: Sparse2of4 example sparsity-only case by @rahul-tuli in #1282
Revert "update" by @dsikka in #1296
Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in #1287
Revert "Fix Multi-Context Manager Syntax for Python 3.9 Compatibility… by @dsikka in #1300
[StageRunner] Stage Runner entrypoint and pipeline by @horheynm in #1202
Bump: Min python version to 3.9 by @rahul-tuli in #1288
Keep quantization enabled during calibration by @kylesayrs in #1299
[BugFix] TRL distillation bug fix by @horheynm in #1278
Update: Readme for fp8 support by @rahul-tuli in #1304
[GPTQ] Add inversion fallback by @kylesayrs in #1283
fix typo by @eldarkurtic in #1290
[Tests] Fix oneshot + finetune test by passing splits to oneshot by @kylesayrs in #1316
[Tests] Remove the compress entrypoint by @dsikka in #1317
Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in #1313
[BugFix] Directly Convert Modifiers to Recipe Instance by @rahul-tuli in #1271
bump version, tag ct by @dsikka in #1318

New Contributors

@aman2304 made their first contribution in #1213

Full Changelog: 0.4.1...0.5.0

Contributors

dbarbuzzi, mgoin, and 7 other contributors


20 Feb 13:21

dhuangnm

0.4.1

6a1ba3c

v0.4.1

What's Changed

Remove version by @dsikka in #1077
Require 'ready' label for transformers tests by @dbarbuzzi in #1079
GPTQModifier Nits and Code Clarity by @kylesayrs in #1068
Also run on pushes to main by @dbarbuzzi in #1083
VLM: Phi3 Vision Example by @kylesayrs in #1032
VLM: Qwen2_VL Example by @kylesayrs in #1027
Composability with sparse and quantization compressors by @rahul-tuli in #948
Remove TraceableMistralForCausalLM by @kylesayrs in #1052
[Fix Test Failure]: Propagate name change to test by @rahul-tuli in #1088
[Audio] Support Audio Datasets by @kylesayrs in #1085
[Test Fix] Add Quantization then finetune tests by @horheynm in #964
[Smoothquant] Phi3 Vision Mappings by @kylesayrs in #1089
[VLM] Multimodal Data Collator by @kylesayrs in #1087
VLM: Model Tracing Guide by @kylesayrs in #1030
Turn off 2:4 sparse compression until supported in vllm by @rahul-tuli in #1092
[Test Fix] Fix Consecutive oneshot by @horheynm in #971
[Bug Fix] Fix test that requires GPU by @horheynm in #1096
Add Idefics3/SmolVLM quant support via traceable class by @leon-seidel in #1095
Traceability Guide: Clarity and typo by @kylesayrs in #1099
[VLM] Examples README by @kylesayrs in #1057
Raise warning for 24 compressed sparse-only models by @rahul-tuli in #1107
Remove log_model_load by @kylesayrs in #1016
Return empty sparsity config if targets and ignores are empty by @rahul-tuli in #1115
Remove uses of get_observer by @kylesayrs in #939
FSDP utils cleanup by @kylesayrs in #854
Update maintainers, add notice by @kylesayrs in #1091
Replace readme paths with urls by @kylesayrs in #1097
GPTQ add arXiv link, move file location by @kylesayrs in #1100
Extend remove_hooks to remove subsets by @kylesayrs in #1021
[Audio] Whisper Example and Readme by @kylesayrs in #1106
[Audio] Add whisper fp8 dynamic example by @kylesayrs in #1111
[VLM] Update pixtral data collator to reflect latest transformers changes by @kylesayrs in #1116
Use unique test names in TestvLLM by @dbarbuzzi in #1124
Remove smoothquant from examples by @kylesayrs in #1121
Extend disable_hooks to keep subsets by @kylesayrs in #1023
Unpin pynvml to fix e2e test failures with vLLM by @dsikka in #1125
Replace LayerCompressor with HooksMixin by @kylesayrs in #1038
[Oneshot Refactor] Rename get_shared_processor_src to get_processor_name_from_model by @horheynm in #1108
Allow Shortcutting Min-max Observer by @kylesayrs in #887
[Polish] Remove unused code by @horheynm in #1128
Properly restore training mode with eval_context by @kylesayrs in #1126
SQ and QM: Remove torch.cuda.empty_cache, use calibration_forward_context by @kylesayrs in #1114
[Oneshot Refactor] dataclass Arguments by @horheynm in #1103
[Bugfix] SparseGPT, Pipelines by @kylesayrs in #1130
[Oneshot refactor] Refactor initialize_model_from_path by @horheynm in #1109
[e2e] Update vllm tests with additional datasets by @brian-dellabetta in #1131
Update: SparseGPT recipes by @rahul-tuli in #1142
Add timer support for testing by @dsikka in #1137
[Audio] Support Whisper V3 by @kylesayrs in #1147
Fix: Re-enable Sparse Compression for 2of4 Examples by @rahul-tuli in #1153
[VLM] Add caption to flickr dataset by @kylesayrs in #1138
[VLM] Update mllama traceable definition by @kylesayrs in #1140
Fix CPU Offloading by @dsikka in #1159
[TRL_SFT_Trainer] Fix and Update Examples code by @horheynm in #1161
[TRL_SFT_Trainer] Fix TRL-SFT Distillation Training by @horheynm in #1163
Bump version for patch release by @dsikka in #1166
Update DeepSeek Examples by @dsikka in #1175
Update gemma2 examples with a note about sample generation by @dsikka in #1176

New Contributors

@leon-seidel made their first contribution in #1095

Full Changelog: 0.4.0...0.4.1

Contributors

dbarbuzzi, brian-dellabetta, and 5 other contributors


16 Jan 03:12

dhuangnm

0.4.0

829af5b

v0.4.0

What's Changed

Record config file name as test suite property by @dbarbuzzi in #947
Update setup.py by @dsikka in #975
Deprecate OBCQ Helpers by @kylesayrs in #977
KV Cache, E2E Tests by @horheynm in #742
Use 1 GPU for offloading examples by @dsikka in #979
Replace tokenizer with processor by @kylesayrs in #955
Revert "KV Cache, E2E Tests (#742)" by @dsikka in #989
Fix SmoothQuant offload bug by @dsikka in #978
Add LM Eval Configs by @dsikka in #980
Fix test_model_reload test by @kylesayrs in #1005
Calibration and Compression Contexts by @kylesayrs in #998
Add info for clarity by @dsikka in #1009
[Bugfix] Pass trust_remote_code_model=True for deepseek examples by @dsikka in #1012
Vision Datasets by @kylesayrs in #943
Add example for fp8 kv cache of phi3.5 and gemma2 by @mgoin in #991
Update ReadMe and test for cpu_offloading by @dsikka in #1013
Adding amdsmi for AMD gpus by @citrix123 in #1018
CompressionLogger add time units by @kylesayrs in #1026
patch_tied_tensors_bug: support malformed model definitions by @kylesayrs in #1014
Add: 2of4 example with/without fp8 quantization by @rahul-tuli in #1033
Remove unnecessary step in 2of4 Example by @dsikka in #1034
Remove Neural Magic copyright from files by @kylesayrs in #992
VLM Support via GPTQ Hooks and Data Pipelines by @kylesayrs in #914
[E2E Testing] KV-Cache by @horheynm in #1004
[E2E Testing] Add recipe check vllm e2e by @horheynm in #929
[MoE] GPTQ compress using callback not hook by @kylesayrs in #1049
Explicit dataset tokenizer text kwarg by @kylesayrs in #1031
Fix smoothquant ignore, Fix typing, Add glm mappings by @kylesayrs in #1015
[Test Fix] Quant model reload by @horheynm in #974
Remove old examples by @dsikka in #1062
VLM: Fix typo bug in TraceableLlavaForConditionalGeneration by @kylesayrs in #1065
Add tests for "examples/sparse_2of4_[...]" by @dbarbuzzi in #1067
VLM Image Examples by @kylesayrs in #1064
Add quick warning for DeepSeek with transformers 4.48.0 by @dsikka in #1066
[KV Cache] kv-cache end to end unit tests by @horheynm in #141
[E2E Testing] Fix HF upload by @horheynm in #1061
[Test Fix] Fix/update test_run_compressed by @horheynm in #970
Revert "[Test Fix] Fix/update test_run_compressed" by @mgoin in #1071
Sparse 2:4 + FP8 Quantization e2e vLLM tests by @dsikka in #1073
[Test Patch] Remove redundant code for "Fix/update test_run_compressed" by @horheynm in #1072
bump; set ct version by @dsikka in #1076

New Contributors

@citrix123 made their first contribution in #1018

Full Changelog: 0.3.1...0.4.0

Contributors

dbarbuzzi, mgoin, and 5 other contributors


12 Dec 13:25

dhuangnm

0.3.1

c3608a0

v0.3.1

What's Changed

BLOOM Default Smoothquant Mappings by @kylesayrs in #906
[SparseAutoModelForCausalLM Deprecation] Feature change by @horheynm in #881
Correct "dyanmic" typo by @kylesayrs in #888
Explicit defaults for QuantizationModifier targets by @kylesayrs in #889
[SparseAutoModelForCausalLM Deprecation] Update examples by @horheynm in #880
Support pack_quantized format for nonuniform mixed-precision by @mgoin in #913
Actually make the run_compressed test useful by @dsikka in #920
Fix for e2e tests by @horheynm in #927
[Bugfix] Correct metrics calculations by @kylesayrs in #878
Update kv_cache example by @dsikka in #921
[1/2] Expand e2e testing to prepare for lm-eval by @dsikka in #922
Update pytest command to capture results to file by @dbarbuzzi in #932
[Bugfix] DisableKVCache Context by @kylesayrs in #834
Add helpful info to the marlin-24 example by @dsikka in #946
Remove requires_torch by @kylesayrs in #949
Remove unused sparseml.export utilities by @kylesayrs in #950
Implement HooksMixin by @kylesayrs in #917
Add LM Eval Testing by @dsikka in #945
update version by @dsikka in #969

Full Changelog: 0.3.0...0.3.1

Contributors

dbarbuzzi, mgoin, and 3 other contributors


13 Nov 05:22

dhuangnm

0.3.0

93832a6

v0.3.0

What's New in v0.3.0

Key Features and Improvements

GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers mappings based on model architecture, making SmoothQuant easier to apply across various models.
Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, which improves support for Hugging Face models that are not based on AutoModelForCausalLM.
Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
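
The auto-inferred SmoothQuant mappings (#119) can be pictured as a lookup keyed on the model's architecture name, with a fallback when the architecture is unknown and an override when the user passes mappings explicitly. The sketch below is only an illustration of that idea under stated assumptions: the registry, the function name infer_mappings, and the layer patterns are hypothetical and are not taken from llm-compressor's source.

```python
from typing import Dict, List, Optional, Tuple

# One mapping pairs the layers to balance with the layer whose output is smoothed.
Mapping = Tuple[List[str], str]

# Hypothetical tables: the regex-style patterns are illustrative only.
LLAMA_LIKE: List[Mapping] = [
    (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
    (["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"),
]
BLOOM_LIKE: List[Mapping] = [
    (["re:.*query_key_value"], "re:.*input_layernorm"),
    (["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"),
]

# Hypothetical registry keyed on the architecture class name.
MAPPING_REGISTRY: Dict[str, List[Mapping]] = {
    "LlamaForCausalLM": LLAMA_LIKE,
    "BloomForCausalLM": BLOOM_LIKE,
}


def infer_mappings(
    architecture: str,
    user_mappings: Optional[List[Mapping]] = None,
) -> List[Mapping]:
    """Return explicit mappings if given, else look them up by architecture."""
    if user_mappings is not None:
        # An explicit argument always wins over inference.
        return user_mappings
    # Unknown architectures fall back to the Llama-style table.
    return MAPPING_REGISTRY.get(architecture, LLAMA_LIKE)


if __name__ == "__main__":
    for balance_layers, smooth_layer in infer_mappings("BloomForCausalLM"):
        print(smooth_layer, "->", balance_layers)
```

Keeping the explicit argument as an override means existing recipes that already list their mappings would behave the same, while recipes that omit them pick up a per-architecture default.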

Bug Fixes

Fix Tied Tensors Bug (#659)
Observer Initialization in GPTQ Wrapper (#883)
Sparsity Reload Testing (#882)

Documentation

Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.

What's Changed

Fix compresed typo by @kylesayrs in #188
GPTQ Quantized-weight Sequential Updating by @kylesayrs in #177
Add: targets and ignore inference for sparse compression by @rahul-tuli in #191
switch tests from weekly to nightly by @dhuangnm in #658
Compression wrapper abstract methods by @kylesayrs in #170
Explicitly set sequential_update in examples by @kylesayrs in #187
Increase Sparsity Threshold for compressors by @rahul-tuli in #679
Add a generic wrap_hf_model_class utility to support VLMs by @mgoin in #185
Add tests for examples by @dbarbuzzi in #149
Rename to quantization config by @kylesayrs in #730
Implement Missing Modifier Methods by @kylesayrs in #166
Fix 2/4 GPTQ Model Tests by @dsikka in #769
SmoothQuant mappings tutorial by @rahul-tuli in #115
Fix import of ModelCompressor by @rahul-tuli in #776
update test by @dsikka in #773
[Bugfix] Fix saving offloaded state dict by @kylesayrs in #172
Auto-Infer mappings Argument for SmoothQuantModifier Based on Model Architecture by @rahul-tuli in #119
Update workflows/actions by @dbarbuzzi in #774
[Bugfix] Prepare KD Models when Saving by @kylesayrs in #174
Set Sparse compression to save_compressed by @rahul-tuli in #821
Install compressed-tensors after llm-compressor by @dbarbuzzi in #825
Fix test typo by @kylesayrs in #828
Add AutoModelForCausalLM example by @dsikka in #698
[Bugfix] Workaround tied tensors bug by @kylesayrs in #659
Only untie word embeddings by @kylesayrs in #839
Check for config hidden size by @kylesayrs in #840
Use float32 for Hessian dtype by @kylesayrs in #847
GPTQ: Deprecate non-sequential update option by @kylesayrs in #762
Typehint nits by @kylesayrs in #826
[ DOC ] Remove version restrictions in W8A8 example by @miaojinc in #849
Fix inconsistency in example config of 2:4 sparse quantization by @yzlnew in #80
Fix forward function pass call by @dsikka in #845
[Bugfix] Use weight parameter of linear layer by @kylesayrs in #836
[Bugfix] Rename files to remove colons by @kylesayrs in #846
cover all 3.9-3.12 in commit testing by @dhuangnm in #864
Add marlin-24 recipe/configs for e2e testing by @dsikka in #866
[Bugfix] onload during sparsity calculation by @kylesayrs in #862
Fix HFTrainer overloads by @kylesayrs in #869
Support Model Offloading Tied Tensors Patch by @kylesayrs in #872
Add advice about dealing with non-invertible Hessians by @kylesayrs in #875
seed commit workflow by @andy-neuma in #877
[Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier by @dsikka in #837
Bugfix observer initialization in gptq_wrapper by @rahul-tuli in #883
BugFix: Fix Sparsity Reload Testing by @dsikka in #882
Use custom unique test names for e2e tests by @dbarbuzzi in #892
Revert "Use custom unique test names for e2e tests (#892)" by @dsikka in #893
Move config["testconfig_path"] assignment by @dbarbuzzi in #895
Cap accelerate version to avoid bug by @kylesayrs in #897
Fix observing offloaded weight by @kylesayrs in #896
Update image in README.md by @mgoin in #861
update accelerate version by @kylesayrs in #899
[GPTQ] Iterative Parameter Updating by @kylesayrs in #863
Small fixes for release by @dsikka in #901
use smaller portion of dataset by @dsikka in #902
Update example to not fail hessian inversion by @dsikka in #904
Bump version to 0.3.0 by @dsikka in #907

New Contributors

@miaojinc made their first contribution in #849
@yzlnew made their first contribution in #80
@andy-neuma made their first contribution in #877

Full Changelog: 0.2.0...0.3.0

Contributors

dbarbuzzi, mgoin, and 7 other contributors


23 Sep 22:24

dhuangnm

0.2.0

2e0035f

v0.2.0

What's Changed

Correct Typo in SparseAutoModelForCausalLM docstring by @kylesayrs in #56
Disable Default Bitmask Compression by @Satrat in #60
TRL Example fix by @rahul-tuli in #59
Fix typo by @rahul-tuli in #63
Correct typo by @kylesayrs in #61
correct import in README.md by @zzc0430 in #66
Fix for issue #43 -- starcoder model by @horheynm in #71
Update README.md by @robertgshaw2-neuralmagic in #74
Layer by Layer Sequential GPTQ Updates by @Satrat in #47
[ Docs ] Update main readme by @robertgshaw2-neuralmagic in #77
[ Docs ] gemma2 examples by @robertgshaw2-neuralmagic in #78
[ Docs ] Update FP8 example to use dynamic per token by @robertgshaw2-neuralmagic in #75
[ Docs ] Overhaul accelerate user guide by @robertgshaw2-neuralmagic in #76
Support kv_cache_scheme for quantizing KV Cache by @mgoin in #88
Propagate trust_remote_code Argument by @kylesayrs in #90
Fix for issue #81 by @horheynm in #84
Fix for issue 83 by @horheynm in #85
[ DOC ] Big Model Example by @robertgshaw2-neuralmagic in #99
Enable obcq/finetune integration tests with commit cadence by @dsikka in #101
metric logging on GPTQ path by @horheynm in #65
Update test config files by @dsikka in #97
remove workflows + update runners by @dsikka in #103
metrics by @horheynm in #104
add debug by @horheynm in #108
Add FP8 KV Cache quant example by @mgoin in #113
Add vLLM e2e tests by @dsikka in #117
Fix style, fix noqa by @kylesayrs in #123
GPTQ Algorithm Cleanup by @kylesayrs in #120
GPTQ Activation Ordering by @kylesayrs in #94
demote recipe string initialization to debug and make more descriptive by @kylesayrs in #116
compressed-tensors main dependency for base-tests by @kylesayrs in #125
Set ready label for transformer tests; add message reminder on PR opened by @dsikka in #126
Fix markdown check test by @dsikka in #127
Naive Run Compressed Pt. 2 by @Satrat in #62
Fix transformer test conditions by @dsikka in #131
Run Compressed Tests by @Satrat in #132
Correct typo by @kylesayrs in #124
Activation Ordering Strategies by @kylesayrs in #121
Fix README Issue by @robertgshaw2-neuralmagic in #139
update by @dsikka in #143
Update finetune and oneshot tests by @dsikka in #114
Validate Recipe Parsing Output by @kylesayrs in #100
fix build error for nightly by @dhuangnm in #145
Fix recipe nested in configs by @kylesayrs in #140
MOE example with warning by @rahul-tuli in #87
Bug Fix: recipe stages were not being concatenated by @rahul-tuli in #150
fix package name bug for nightly by @dhuangnm in #155
Add descriptions for pytest marks by @kylesayrs in #156
Fix Sparsity Unit Test by @Satrat in #153
Fix: Error during model saving with shared tensors by @rahul-tuli in #158
Update 2:4 Examples by @dsikka in #161
DeepSeek: Fix Hessian Estimation by @Satrat in #157
bump up main to 0.2.0 by @dhuangnm in #163
Fix help dialogue by @kylesayrs in #151
Add MoE and Compressed Inference Examples by @Satrat in #160
Separate trust_remote_code args by @kylesayrs in #152
Enable a skipped finetune test by @dsikka in #169
Fix filename in example command by @dbarbuzzi in #173
Add DeepSeek V2.5 Example by @dsikka in #171
fix quality by @dsikka in #176
Patch log function name in gptq by @kylesayrs in #168
README for Modifiers by @Satrat in #165
Fix default for sequential updates by @dsikka in #186
fix default test case by @dsikka in #193
Fix Initalize typo by @Imss27 in #190
Update MoE examples by @mgoin in #192

New Contributors

@zzc0430 made their first contribution in #66
@horheynm made their first contribution in #71
@dsikka made their first contribution in #101
@dhuangnm made their first contribution in #145
@Imss27 made their first contribution in #190

Full Changelog: 0.1.0...0.2.0

Contributors

dbarbuzzi, mgoin, and 9 other contributors

