VLM: Model Tracing Guide #1030
Merged
Signed-off-by: Kyle Sayers <[email protected]>
…tokenized datasets should not be given labels Signed-off-by: Kyle Sayers <[email protected]>
…anup-custom-dataset
kylesayrs added a commit that referenced this pull request on Jan 15, 2025

## Purpose ##
* Allow VLM processors to be used to tokenize datasets with prompt keys

## Postrequisites ##
* #1030

## Changes ##
* Use `text` argument name for tokenizing the prompt column

## Testing ##
* w.r.t. tokenizers, using the `text` kwarg follows the precedent set by [PretrainedTokenizerBase](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L2790)
* w.r.t. processors, most processors use the text kwarg

Below are all the models I know to be compatible with this change, I'm assuming that most other processors follow the same standard
1. [llama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/tokenization_llama.py#L233)
2. [pixtral](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/processing_pixtral.py#L160)
3. [phi3_vision](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/blob/main/processing_phi3_v.py#L321)
4. [mllama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/processing_mllama.py#L232)
5. [qwen2_vl](https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/processing_qwen2_vl.py#L71)

Example of using VLM processor to tokenize a dataset with prompt key
```python3
from transformers import AutoProcessor
from llmcompressor.transformers import DataTrainingArguments, TextGenerationDataset

models_to_test = [
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "Qwen/Qwen2-VL-2B-Instruct",  # fails without changes
    "mgoin/pixtral-12b",  # fails without changes
]

for model_id in models_to_test:
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    data_args = DataTrainingArguments(
        dataset="ultrachat-200k", splits={"calibration": "test_sft[:1]"}
    )
    dataset = TextGenerationDataset.load_from_registry(
        data_args.dataset,
        data_args=data_args,
        split=data_args.splits["calibration"],
        processor=processor,
    )(add_labels=False)
```

Signed-off-by: Kyle Sayers <[email protected]>
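Why the keyword form matters can be sketched without loading any model. The two classes below are hypothetical stand-ins, not the real `transformers` classes: they only mimic the argument order the linked `__call__` signatures suggest (tokenizers take `text` first, several VLM processors take `images` first). The helper `tokenize_prompt` is also illustrative and is not part of llm-compressor.

```python
from typing import Any, Dict, List, Optional


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer: `text` is the first argument."""

    def __call__(self, text: Optional[str] = None, **kwargs: Any) -> Dict[str, List[int]]:
        if text is None:
            raise ValueError("text is required")
        # Fake "token ids": the length of each whitespace-separated word.
        return {"input_ids": [len(word) for word in text.split()]}


class ToyVLMProcessor:
    """Hypothetical stand-in for a VLM processor: `images` comes before `text`."""

    def __init__(self) -> None:
        self.tokenizer = ToyTokenizer()

    def __call__(
        self, images: Optional[Any] = None, text: Optional[str] = None, **kwargs: Any
    ) -> Dict[str, List[int]]:
        if text is None:
            raise ValueError("text is required")
        return self.tokenizer(text=text, **kwargs)


def tokenize_prompt(processor: Any, prompt: str) -> Dict[str, List[int]]:
    # Passing the prompt by keyword works regardless of positional order.
    return processor(text=prompt)


prompt = "describe this image"
by_keyword = [tokenize_prompt(p, prompt) for p in (ToyTokenizer(), ToyVLMProcessor())]

# Passing the prompt positionally binds it to `images` on the toy processor,
# so `text` stays None and the call fails.
try:
    ToyVLMProcessor()(prompt)
    positional_failed = False
except ValueError:
    positional_failed = True
```

In this sketch both objects return the same result when called with `text=prompt`, while the positional call to the processor fails. That is the situation the `# fails without changes` comments in the example above point at.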
mgoin previously approved these changes on Jan 20, 2025
Great work, we should consider adding a readthedoc build like vLLM to render these out
Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Michael Goin <[email protected]>
dsikka reviewed on Jan 23, 2025
Great job.
A couple of nits:
- I wouldn't refer to the SparseGPTModifier until we've actually started using data pipelines outside of the GPTQModifier
- A helpful comment on what to focus on when looking at the images would be nice
dsikka approved these changes on Jan 23, 2025
dsikka pushed a commit that referenced this pull request on Jan 27, 2025

## Purpose ##
* Create a landing page for those looking to use VLMs
* Advertise VLM support on homepage

## Prerequisites ##
* #1030

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Brian Dellabetta <[email protected]>
rahul-tuli pushed a commit that referenced this pull request on Jan 28, 2025

## Purpose ##
* Create a landing page for those looking to use VLMs
* Advertise VLM support on homepage

## Prerequisites ##
* #1030

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Brian Dellabetta <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
dsikka added a commit that referenced this pull request on Feb 5, 2025

## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface/ Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias
* Support hessian offloading for `SparseGPT`
* Add customized `_LinAlgError` for `SparseGPT`

### Implementations ###
* Changed implementation styles of `SparseGPTModifier` and `WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`, `SparseGptWrapper`, and `WandaWrapper`
* Shared implementations between SparseGPT and Wanda are implemented by the `SparsityModifierMixin`
* Removed lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
  * This change was only present for wanda. Given that all other modifiers do not have this change, I see no reason why it should stay
* Updated sparsegpt tests to reflect new implementation

### Tests ###
* Updated obcq tests to reflect new implementations
* Removed `test_sgpt_defaults.py` since this test doesn't test anything new or novel about this modifier

## Testing ##
* `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/`
* Modified `test_invalid_layerwise_recipes_raise_exceptions` and `test_successful_layerwise_recipe` pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

## Potential Follow ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of one calibration sample. The final evaluations are different, which is likely due to numerical imprecision (dividing by int vs torch.int), different pipelines (different subgraph partitions => different imprecision from cpu offloading, potentially different module arguments).

### Evaluation
Models were compressed using `examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`

<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test wanda, the `SparseGPTModifier` was replaced with the `WandaPruningModifier`

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
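As a rough sanity check on the claim that the evaluation differences are small, the reported accuracies can be compared against their reported standard errors. This is only a sketch: it copies the winogrande numbers from the tables above and assumes the two runs can be treated as independent with the stated stderr, which the commit message does not claim. The helper name `z_score` is illustrative.

```python
import math

# Accuracy and standard error copied from the winogrande tables above.
results = {
    "sparsegpt": {"main": 0.5391, "branch": 0.547, "stderr": 0.014},
    "wanda": {"main": 0.532, "branch": 0.5414, "stderr": 0.014},
}


def z_score(main: float, branch: float, stderr: float) -> float:
    # Assumption: both runs have the same stderr and are independent,
    # so the standard error of the difference is sqrt(2) * stderr.
    combined = math.sqrt(stderr**2 + stderr**2)
    return (branch - main) / combined


z_scores = {name: z_score(**values) for name, values in results.items()}

for name, z in z_scores.items():
    print(f"{name}: difference is {z:.2f} combined standard errors")
```

Under these assumptions both differences come out well below one combined standard error, so the winogrande results alone do not distinguish main from branch.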
kylesayrs added a commit that referenced this pull request on Mar 11, 2025

## Purpose ##
* Provide a predefined audio dataset for
  * Testing traceability of audio models
  * e2e tests with audio models
  * Simpler examples (blog)

## Prerequisites ##
* #1030
* #1085

## Changes ##
* Implement `PeoplesSpeech` dataset
  * Because of the more complex nature of audio processors, this dataset needs to hardcode some processing logic specific to models
  * Assumes that most processing is similar to whisper processing, which seems to be the standard
  * Because processing changes depending on the model, this means mapped outputs cannot be cached
* Add `load_from_cache_file` argument to preprocessing mapping (this was overlooked before)
* Integrate dataset with tracing debugger tool

## Testing ##
```bash
llmcompressor.trace \
    --model_id openai/whisper-large-v2 \
    --model_class TraceableWhisperForConditionalGeneration \
    --modality audio
```

Traceable definition of qwen2_audio is not finished yet, but this loads and is accepted as valid input
```bash
llmcompressor.trace \
    --model_id Qwen/Qwen2-Audio-7B \
    --model_class Qwen2AudioForConditionalGeneration \
    --modality audio
```

---------

Signed-off-by: Kyle Sayers <[email protected]>
## Purpose ##
This guide explains the concepts of tracing as they relate to LLM Compressor and how to modify your model to support recipes which require using the Sequential Pipeline.

Through reading this guide, you will learn

## Prerequisites ##
* `text` kwarg #1031

## Changes ##
* `src/llmcompressor/transformers/tracing/README.md` with pictures
* `src/llmcompressor/pipelines/sequential/README.md`
* `src/llmcompressor/transformers/tracing/debug.py` and the `llm-compressor.attempt_trace` entrypoint for ease of use
* `llava_example.py` and `pixtral_example.py`, changed to match the order of arguments on the modifier

## Testing ##
Use the `llmcompressor.trace` debug script

## Stretch ##
It might be nice if this tracing debugger tool also printed the model graph to an svg
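
For readers new to the idea of tracing, the toy tracer below sketches why some model code has to be modified before it can be traced. It is not LLM Compressor's tracer and not `torch.fx`; the `Proxy` class, `trace` function, and both example functions are hypothetical. It only illustrates the general pattern of symbolic tracing: operations on placeholder values are recorded into a graph, and control flow that depends on actual data cannot be recorded.

```python
class Proxy:
    """Toy placeholder for a tensor whose value is unknown during tracing."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _record(self, op, other):
        # Append one node (output, op, lhs, rhs) to the shared graph.
        other_name = other.name if isinstance(other, Proxy) else repr(other)
        out = Proxy(f"v{len(self.graph)}", self.graph)
        self.graph.append((out.name, op, self.name, other_name))
        return out

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)

    def __gt__(self, other):
        return self._record("gt", other)

    def __bool__(self):
        # The branch taken depends on real data, which the tracer does not have.
        raise TypeError("data-dependent control flow cannot be traced")


def trace(fn):
    """Run `fn` on a placeholder input and return the recorded operations."""
    graph = []
    fn(Proxy("x", graph))
    return graph


def traceable(x):
    return (x + 1) * 2


def untraceable(x):
    if x > 0:  # `if` calls Proxy.__bool__ on the comparison result
        return x + 1
    return x * 2


graph = trace(traceable)

try:
    trace(untraceable)
    error = None
except TypeError as exc:
    error = str(exc)
```

Here `traceable` yields a two-node graph, while `untraceable` fails at the `if` statement. Rewriting or wrapping that kind of branch is the sort of model modification the Purpose section alludes to, though the specific techniques are described in the guide itself rather than here.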