
VLM: Model Tracing Guide #1030

Merged 369 commits from kylesayrs/traceability-readme into main on Jan 23, 2025
Conversation

@kylesayrs (Collaborator) commented Jan 2, 2025

Purpose

This guide explains the concepts of tracing as they relate to LLM Compressor and how to modify your model to support recipes which require using the Sequential Pipeline.

Through reading this guide, you will learn

  1. Why tracing is required when compressing with recipes involving the Sequential Pipeline and modifiers such as GPTQModifier
  2. How to determine if your model is traceable for your dataset
  3. How to modify your model definition to be traceable (a generic sketch of such a change follows this list)
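
As a generic illustration (not taken from the guide itself, and assuming a torch.fx-style tracer such as the one LLM Compressor builds on), the kind of modification discussed in point 3 often amounts to replacing data-dependent control flow with tensor operations:

```python
import torch

class Block(torch.nn.Module):
    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Untraceable pattern: branching on a tensor's value bakes a single
        # path into the traced graph (and typically raises a TraceError):
        #   if mask.sum() > 0:
        #       x = x * mask
        # Traceable rewrite: express the branch as tensor arithmetic so the
        # graph captures both outcomes.
        return torch.where(mask.bool(), x * mask, x)
```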

Prerequisites

Changes

  • Add a model tracing guide src/llmcompressor/transformers/tracing/README.md with pictures
  • Add a readme for the sequential pipeline which points to the Tracing Guide src/llmcompressor/pipelines/sequential/README.md
  • Add a debug script to help users debug their models for traceability src/llmcompressor/transformers/tracing/debug.py
    • Add the llm-compressor.attempt_trace entrypoint for ease of use
  • Swap the order of arguments in llava_example.py and pixtral_example.py to match the order of arguments on the modifier

Testing

Use the llmcompressor.trace debug script:

```bash
llmcompressor.trace \
    --model_id llava-hf/llava-1.5-7b-hf \
    --model_class TraceableLlavaForConditionalGeneration \
    --sequential-targets LlamaDecoderLayer \
    --ignore "re:.*lm_head" "re:vision_tower.*" "re:multi_modal_projector.*" \
    --modality vision
```

Stretch

It might be nice if this tracing debugger tool also printed the model graph to an SVG.
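
For reference, one possible way to do this (a sketch only, not part of this PR; it assumes the traced torch.fx GraphModule is available and that pydot and graphviz are installed) is torch.fx's built-in graph drawer:

```python
from torch.fx.passes.graph_drawer import FxGraphDrawer

def dump_graph_svg(graph_module, path="model_graph.svg"):
    # Render the traced GraphModule as a graphviz diagram and write it to SVG.
    drawer = FxGraphDrawer(graph_module, "model")
    drawer.get_dot_graph().write_svg(path)
```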

kylesayrs added a commit that referenced this pull request Jan 15, 2025
## Purpose ##
* Allow VLM processors to be used to tokenize datasets with prompt keys

## Postrequisites ##
* #1030

## Changes ##
* Use `text` argument name for tokenizing the prompt column

## Testing ##
* w.r.t. tokenizers, using the `text` kwarg follows the precedent set by [PretrainedTokenizerBase](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L2790)
* w.r.t. processors, most processors use the `text` kwarg

Below are all the models I know to be compatible with this change; I'm assuming that most other processors follow the same standard:
1. [llama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/tokenization_llama.py#L233)
2. [pixtral](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/processing_pixtral.py#L160)
3. [phi3_vision](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/blob/main/processing_phi3_v.py#L321)
4. [mllama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/processing_mllama.py#L232)
5. [qwen2_vl](https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/processing_qwen2_vl.py#L71)

Example of using a VLM processor to tokenize a dataset with a prompt key:
```python3
from transformers import AutoProcessor
from llmcompressor.transformers import DataTrainingArguments, TextGenerationDataset

models_to_test = [
  "meta-llama/Meta-Llama-3-8B-Instruct",
  "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "Qwen/Qwen2-VL-2B-Instruct",  # fails without changes
  "mgoin/pixtral-12b",  # fails without changes
]

for model_id in models_to_test:
  processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

  data_args = DataTrainingArguments(
      dataset="ultrachat-200k",
      splits={"calibration": "test_sft[:1]"}
  )

  dataset = TextGenerationDataset.load_from_registry(
      data_args.dataset,
      data_args=data_args,
      split=data_args.splits["calibration"],
      processor=processor,
  )(add_labels=False)
```

Signed-off-by: Kyle Sayers <[email protected]>
@mgoin (Member) previously approved these changes Jan 20, 2025 and left a comment:

Great work. We should consider adding a Read the Docs build, like vLLM's, to render these out.

Signed-off-by: Kyle Sayers <[email protected]>

Co-authored-by: Michael Goin <[email protected]>
@kylesayrs mentioned this pull request Jan 20, 2025
@dsikka (Collaborator) left a comment:

Great job.

A couple of nits:

  1. I wouldn't refer to the SparseGPTModifier until we've actually started using data pipelines outside of the GPTQModifier
  2. A helpful comment on what to focus on when looking at the images would be nice

@dsikka merged commit e48d9db into main on Jan 23, 2025
6 of 7 checks passed
@dsikka deleted the kylesayrs/traceability-readme branch on January 23, 2025 at 17:01
dsikka pushed a commit that referenced this pull request Jan 27, 2025
## Purpose ##
* Create a landing page for those looking to use VLMs
* Advertise VLM support on homepage

## Prerequisites ##
* #1030

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Brian Dellabetta <[email protected]>
rahul-tuli pushed a commit that referenced this pull request Jan 28, 2025
## Purpose ##
* Create a landing page for those looking to use VLMs
* Advertise VLM support on homepage

## Prerequisites ##
* #1030

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Brian Dellabetta <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
dsikka added a commit that referenced this pull request Feb 5, 2025
## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface/ Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias (see the sketch after this list)
* Support hessian offloading for `SparseGPT`
* Add customized `_LinAlgError` for `SparseGPT`
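
A rough sketch of the updated interface (the argument names come from this description; the import paths and `oneshot` signature are assumptions and may differ from the released API):

```python
from llmcompressor.modifiers.obcq import SparseGPTModifier  # import path assumed
from llmcompressor.transformers import oneshot  # entrypoint location assumed

# `sequential_targets` mirrors GPTQModifier; `targets` remains available as an alias.
recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    sequential_targets=["LlamaDecoderLayer"],
    ignore=["lm_head"],
)

oneshot(
    model="meta-llama/Llama-3.2-1B-Instruct",
    dataset="ultrachat-200k",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```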

### Implementations ###
* Changed implementation styles of `SparseGPTModifier` and
`WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`,
`SparseGptWrapper`, and `WandaWrapper`
* Shared implementations between SparseGPT and Wanda are implemented by
the `SparsityModifierMixin`
* Removed lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
  * This change was only present for Wanda. Given that all other modifiers do not have this change, I see no reason why it should stay
* Updated sparsegpt tests to reflect new implementation

### Tests ###
* Updated obcq tests to reflect new implementations
* Removed `test_sgpt_defaults.py` since this test doesn't test anything
new or novel about this modifier

## Testing ##
* `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/`
* Modified `test_invalid_layerwise_recipes_raise_exceptions` and `test_successful_layerwise_recipe` pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

## Potential Follow ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of one calibration sample. The final evaluations differ, which is likely due to numerical imprecision (dividing by int vs torch.int) and different pipelines (different subgraph partitions => different imprecision from CPU offloading, potentially different module arguments).

### Evaluation
Models were compressed using
`examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`
<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test Wanda, the `SparseGPTModifier` was replaced with the `WandaPruningModifier`.

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
kylesayrs added a commit that referenced this pull request Mar 11, 2025
## Purpose ##
* Provide a predefined audio dataset for
  * Testing traceability of audio models
  * e2e tests with audio models
  * Simpler examples (blog)

## Prerequisites ##
* #1030
* #1085

## Changes ##
* Implement `PeoplesSpeech` dataset
  * Because of the more complex nature of audio processors, this dataset needs to hardcode some processing logic specific to models
  * Assumes that most processing is similar to whisper processing, which seems to be the standard
  * Because processing changes depending on the model, this means mapped outputs cannot be cached
* Add `load_from_cache_file` argument to preprocessing mapping (this was overlooked before; see the sketch after this list)
* Integrate dataset with tracing debugger tool
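
A minimal sketch of what the cache flag means in practice (the dataset id, config name, and `preprocess_fn` here are illustrative assumptions, not code from this PR): because the preprocessing is model-specific, mapped outputs should not be reused from the Hugging Face datasets cache.

```python
from datasets import load_dataset

def preprocess_fn(sample):
    # Model-specific feature extraction would happen here (hypothetical placeholder).
    return sample

ds = load_dataset("MLCommons/peoples_speech", "clean", split="train[:8]")
# load_from_cache_file=False forces the map to re-run, since cached outputs
# produced for one model's processor may be wrong for another.
ds = ds.map(preprocess_fn, load_from_cache_file=False)
```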

## Testing ##
```bash
llmcompressor.trace \
    --model_id openai/whisper-large-v2 \
    --model_class TraceableWhisperForConditionalGeneration \
    --modality audio
```

The traceable definition of qwen2_audio is not finished yet, but this model loads and is accepted as valid input:
```bash
llmcompressor.trace \
    --model_id Qwen/Qwen2-Audio-7B \
    --model_class Qwen2AudioForConditionalGeneration \
    --modality audio
```

---------

Signed-off-by: Kyle Sayers <[email protected]>
Labels: ready