Enabled Infer CLI for VLM #287
Conversation
LGTM
Removing the onnx_defer_loading flag, which was originally removed in _[Removed onnx_defer_loading from Immutable Convertor Args. PR: 230]_ but got added back later in _[Mllama(single + dual) + InternVL(single) + Llava (single) PR: 267]_, possibly because of rebasing. Signed-off-by: Shubham Agrawal <[email protected]> Signed-off-by: Asmita Goswami <[email protected]>
This will create a config JSON file, which contains all the details about compilation and SDK versions. Currently, this code is added in the code block of QEFFAutoModelForCausalLM.compile. The config would look like below: ``` { "huggingface_config": { "vocab_size": 50257, "n_positions": 1024, "n_embd": 768, "n_layer": 12, "n_head": 12, "n_inner": null, "activation_function": "gelu_new", "resid_pdrop": 0.1, "embd_pdrop": 0.1, "attn_pdrop": 0.1, "layer_norm_epsilon": 1e-05, "initializer_range": 0.02, "summary_type": "cls_index", "summary_use_proj": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "scale_attn_weights": true, "use_cache": true, "scale_attn_by_inverse_layer_idx": false, "reorder_and_upcast_attn": false, "bos_token_id": 50256, "eos_token_id": 50256, "return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "torch_dtype": null, "use_bfloat16": false, "tf_legacy_loss": false, "pruned_heads": {}, "tie_word_embeddings": true, "chunk_size_feed_forward": 0, "is_encoder_decoder": false, "is_decoder": false, "cross_attention_hidden_size": null, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "typical_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null, "num_return_sequences": 1, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "exponential_decay_length_penalty": null, "suppress_tokens": null, "begin_suppress_tokens": null, "architectures": [ "GPT2LMHeadModel" ], "finetuning_task": null, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "tokenizer_class": null, "prefix": null, "pad_token_id": null, "sep_token_id": null, "decoder_start_token_id": null, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "problem_type": null, "_name_or_path": "gpt2", "_commit_hash": "607a30d783dfa663caf39e06633721c8d4cfcd7e", "_attn_implementation_internal": "eager", "transformers_version": null, "model_type": "gpt2", "n_ctx": 1024 }, "qpc_config": { "QEff_config": { "pytorch_transforms": [ "AwqToMatmulNbitsTransform", "GPTQToMatmulNbitsTransform", "CustomOpsTransform", "KVCacheTransform" ], "onnx_transforms": [ "FP16ClipTransform", "SplitTensorsTransform" ], "onnx_path": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47/GPT2LMHeadModel.onnx" }, "aic_compiler_config": { "apps_sdk_version": "1.20.0", "compile_dir": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47", "specializtions_file_path": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47/specializations.json", "prefill_seq_len": 32, "ctx_len": 128, "batch_size": 1, "full_batch_size": null, "num_devices": 1, "num_cores": 16, "mxfp6_matmul": false, "mxint8_kv_cache": false, "num_speculative_tokens": null }, "qnn_config": { "enable_qnn": true, "qnn_config_path": "QEfficient/compile/qnn_config.json", "product": "QAIRT", "os": { "Ubuntu": 22.04, "Windows": 11 }, "sdk_flavor": [ "aic" ], "version": "2.31.0", "build_id": "250109072054_3882", "qnn_backend_api_version": "2.18.0", "tensorflow": "2.10.1", "tflite": "2.3.0", "torch": "1.13.1", "onnx": "1.16.1", "onnxruntime": "1.17.1", 
"onnxsimplifier": "0.4.36", "android-ndk": "r26c", "platform": "AIC.1.20.0.14" } } } ``` Note: The code structure may change. --------- Signed-off-by: Abukhoyer Shaik <[email protected]> Signed-off-by: Asmita Goswami <[email protected]>
… validation page (quic#303) Signed-off-by: Abukhoyer Shaik <[email protected]> Signed-off-by: Asmita Goswami <[email protected]>
These are just small fixes for printing the `QEFFAutoModelForCausalLM` instance, done by changing the `__repr__(self)` method. Signed-off-by: Abukhoyer Shaik <[email protected]> Signed-off-by: Asmita Goswami <[email protected]>
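A minimal sketch of what such a `__repr__` change could look like; the exact fields printed are an assumption, since the PR only states that `__repr__(self)` was updated:
```python
class QEFFAutoModelForCausalLM:
    # ... existing implementation ...

    def __repr__(self) -> str:
        # Assumption: show the wrapper class name followed by the wrapped
        # HuggingFace model's own repr, so `print(qeff_model)` stays readable.
        return self.__class__.__name__ + "\n" + self.model.__repr__()
```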
Signed-off-by: Asmita Goswami <[email protected]>
…o image_text_support
…ansformers into image_text_support
I think we should add one small multimodal model under CLI API testing.
"model_name" : ["gpt2"],
Signed-off-by: Asmita Goswami <[email protected]>
Added test_infer_vlm.py for testing.
Signed-off-by: Asmita Goswami <[email protected]>
TODO:
Signed-off-by: Asmita Goswami <[email protected]>
if architecture in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.values():
    exec_info = execute_vlm_model(
        qeff_model=qeff_model,
        model_name=model_name,
TODO: Here, use load_hf_processor and load_streamer to load the processor and streamer. Create a list of conversations in _utils that is mapped to the model architecture. At the end, use qeff_model.generate as the else condition. This way the code will be more scalable and well formatted, and there will be no need for the execute_vlm_model function.
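A rough sketch of the refactor this comment describes, reusing the surrounding infer variables (architecture, model_name, qeff_model, tokenizer, prompt, image_url); the helper names come from the comment itself, while the conversation mapping, image handling, and call signatures are assumptions rather than the final implementation:
```python
import requests
from PIL import Image

# Hypothetical table kept in _utils: model architecture -> default conversation.
ARCHITECTURE_TO_CONVERSATION = {
    "MllamaForConditionalGeneration": [
        {"role": "user",
         "content": [{"type": "image"},
                     {"type": "text", "text": "Describe the image."}]},
    ],
    # ... one entry per supported VLM architecture ...
}

if architecture in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.values():
    processor = load_hf_processor(model_name)        # assumed helper from the comment
    streamer = load_streamer(processor.tokenizer)    # assumed helper from the comment
    conversation = ARCHITECTURE_TO_CONVERSATION[architecture]
    prompt_text = processor.apply_chat_template(conversation, add_generation_prompt=True)
    image = Image.open(requests.get(image_url, stream=True).raw)
    inputs = processor(text=prompt_text, images=image, return_tensors="pt")
    exec_info = qeff_model.generate(inputs=inputs, streamer=streamer)  # assumed kwargs
else:
    # Existing text-only path kept as the else branch, per the comment.
    exec_info = qeff_model.generate(tokenizer=tokenizer, prompts=prompt)
```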
Signed-off-by: Asmita Goswami <[email protected]>
LGTM
Added support for enabling VLMs via CLI.
Sample command:
```
python -m QEfficient.cloud.infer --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --batch_size 1 --prompt_len 32 --ctx_len 512 --num_cores 16 --device_group [0] --prompt "Describe the image?" --mos 1 --allocator_dealloc_delay 1 --image_url https://i.etsystatic.com/8155076/r/il/0825c2/1594869823/il_fullxfull.1594869823_5x0w.jpg
```