LlavaNextForConditionalGeneration.forward() got an unexpected keyword argument 'token_idx' #1708

Open
DavidAbrahamyan opened this issue Jan 21, 2025 · 3 comments

@DavidAbrahamyan

I am trying to run inference using Llava Next. Here is my code:

import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore

from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

from transformers import LlavaNextProcessor, AutoProcessor, AutoConfig, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import requests
import os

from optimum.habana.transformers.models.llava_next import GaudiLlavaNextForConditionalGeneration
adapt_transformers_to_gaudi()
device = torch.device("hpu")
args_model_name_or_path = "/workspace/models/model_llava_v1_6_vicuna_7b"
model_type = AutoConfig.from_pretrained(args_model_name_or_path).model_type

print("Loading the processor")
processor = AutoProcessor.from_pretrained(args_model_name_or_path)

print("Loading the model")
model = LlavaNextForConditionalGeneration.from_pretrained(
    args_model_name_or_path,
    torch_dtype=torch.bfloat16,
)
model.to("hpu")

print("hpu graph")

# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    },
]
print("preparing prompt")
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

print("Prompt goes through processor")
model_dtype = torch.bfloat16
inputs = processor(images=image, text=prompt, return_tensors="pt").to("hpu", model_dtype)

# autoregressively complete prompt
print("generating output")
output = model.generate(**inputs, max_new_tokens=100)

print("printing the final result")
print(processor.decode(output[0], skip_special_tokens=True))

While running this, I get the following error:
TypeError: LlavaNextForConditionalGeneration.forward() got an unexpected keyword argument 'token_idx'

Any idea what is causing this problem? Thanks in advance.

@regisss
Collaborator

regisss commented Jan 30, 2025

Hi @DavidAbrahamyan!

This happens because adapt_transformers_to_gaudi is called after LlavaNextForConditionalGeneration is imported from transformers. It should be called before any import from transformers to make sure that all the changes are applied correctly.

And you can remove

from optimum.habana.transformers.models.llava_next import GaudiLlavaNextForConditionalGeneration

as adapt_transformers_to_gaudi will take care of it.
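
For reference, the top of your script would then look something like this (a minimal sketch of the corrected import order, keeping the Habana imports from your script):

import torch
import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore

# Patch transformers for Gaudi first, before anything is imported from
# transformers, so that the patched classes are the ones actually used.
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
adapt_transformers_to_gaudi()

# Only import from transformers once the patching has been applied.
from transformers import AutoConfig, AutoProcessor, LlavaNextForConditionalGeneration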

After that, your script returns a new error related to the number of image tokens. I guess you took inspiration from this example, right? https://github.com/huggingface/optimum-habana/blob/main/examples/image-to-text/run_pipeline.py

@DavidAbrahamyan
Author

Thanks a lot for your response! I tried doing as you said; however, I now encounter the following issue:

Collecting transformers
  Downloading transformers-4.48.1-py3-none-any.whl.metadata (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.4/44.4 kB 2.1 MB/s eta 0:00:00
Collecting sentencepiece
  Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting accelerate
  Downloading accelerate-1.3.0-py3-none-any.whl.metadata (19 kB)
Collecting pillow
  Downloading pillow-11.1.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (9.1 kB)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (2.31.0)
Collecting optimum[habana]
  Downloading optimum-1.24.0-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.14.0)
Collecting huggingface-hub<1.0,>=0.24.0 (from transformers)
  Downloading huggingface_hub-0.28.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.23.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (24.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2023.5.5)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Downloading safetensors-0.5.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.66.4)
Collecting psutil (from accelerate)
  Downloading psutil-6.1.1-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.2.0a0+git8964477)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests) (2024.2.2)
Collecting optimum-habana (from optimum[habana])
  Downloading optimum_habana-1.15.0-py3-none-any.whl.metadata (28 kB)
Collecting transformers
  Downloading transformers-4.45.2-py3-none-any.whl.metadata (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.4/44.4 kB 29.0 MB/s eta 0:00:00
Collecting tokenizers<0.21,>=0.20 (from transformers)
  Downloading tokenizers-0.20.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.24.0->transformers) (2024.3.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.24.0->transformers) (4.11.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->accelerate) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->accelerate) (3.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->accelerate) (3.1.4)
Collecting accelerate
  Downloading accelerate-0.33.0-py3-none-any.whl.metadata (18 kB)
Collecting diffusers<0.32.0,>=0.31.0 (from optimum-habana->optimum[habana])
  Downloading diffusers-0.31.0-py3-none-any.whl.metadata (18 kB)
Collecting sentence-transformers==3.2.1 (from optimum-habana->optimum[habana])
  Downloading sentence_transformers-3.2.1-py3-none-any.whl.metadata (10 kB)
Collecting scikit-learn (from sentence-transformers==3.2.1->optimum-habana->optimum[habana])
  Downloading scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting scipy (from sentence-transformers==3.2.1->optimum-habana->optimum[habana])
  Downloading scipy-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.0/62.0 kB 17.7 MB/s eta 0:00:00
Collecting importlib-metadata (from diffusers<0.32.0,>=0.31.0->optimum-habana->optimum[habana])
  Downloading importlib_metadata-8.6.1-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=2.0.0->accelerate) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=2.0.0->accelerate) (1.3.0)
Collecting zipp>=3.20 (from importlib-metadata->diffusers<0.32.0,>=0.31.0->optimum-habana->optimum[habana])
  Downloading zipp-3.21.0-py3-none-any.whl.metadata (3.7 kB)
Collecting joblib>=1.2.0 (from scikit-learn->sentence-transformers==3.2.1->optimum-habana->optimum[habana])
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn->sentence-transformers==3.2.1->optimum-habana->optimum[habana])
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 22.7 MB/s eta 0:00:00
Downloading pillow-11.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (4.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 61.9 MB/s eta 0:00:00
Downloading transformers-4.45.2-py3-none-any.whl (9.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.9/9.9 MB 177.4 MB/s eta 0:00:00
Downloading huggingface_hub-0.28.1-py3-none-any.whl (464 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 464.1/464.1 kB 340.4 MB/s eta 0:00:00
Downloading safetensors-0.5.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (461 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 462.0/462.0 kB 363.3 MB/s eta 0:00:00
Downloading tokenizers-0.20.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 305.5 MB/s eta 0:00:00
Downloading optimum-1.24.0-py3-none-any.whl (433 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 433.6/433.6 kB 346.1 MB/s eta 0:00:00
Downloading optimum_habana-1.15.0-py3-none-any.whl (809 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 809.9/809.9 kB 381.5 MB/s eta 0:00:00
Downloading sentence_transformers-3.2.1-py3-none-any.whl (255 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 255.8/255.8 kB 378.7 MB/s eta 0:00:00
Downloading accelerate-0.33.0-py3-none-any.whl (315 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.1/315.1 kB 355.8 MB/s eta 0:00:00
Downloading psutil-6.1.1-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (287 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.5/287.5 kB 359.6 MB/s eta 0:00:00
Downloading diffusers-0.31.0-py3-none-any.whl (2.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 357.1 MB/s eta 0:00:00
Downloading importlib_metadata-8.6.1-py3-none-any.whl (26 kB)
Downloading scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 290.8 MB/s eta 0:00:00
Downloading scipy-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.6/40.6 MB 377.6 MB/s eta 0:00:00
Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.8/301.8 kB 389.8 MB/s eta 0:00:00
Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Downloading zipp-3.21.0-py3-none-any.whl (9.6 kB)
Installing collected packages: sentencepiece, zipp, threadpoolctl, scipy, safetensors, psutil, pillow, joblib, scikit-learn, importlib-metadata, huggingface-hub, tokenizers, diffusers, accelerate, transformers, sentence-transformers, optimum, optimum-habana
Successfully installed accelerate-0.33.0 diffusers-0.31.0 huggingface-hub-0.28.1 importlib-metadata-8.6.1 joblib-1.4.2 optimum-1.24.0 optimum-habana-1.15.0 pillow-11.1.0 psutil-6.1.1 safetensors-0.5.2 scikit-learn-1.6.1 scipy-1.15.1 sentence-transformers-3.2.1 sentencepiece-0.2.0 threadpoolctl-3.5.0 tokenizers-0.20.3 transformers-4.45.2 zipp-3.21.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:482: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:339: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
[WARNING|utils.py:212] 2025-01-30 17:38:26,194 >> optimum-habana v1.15.0 has been validated for SynapseAI v1.19.0 but habana-frameworks v1.15.1.15 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2025-01-30 17:38:26,532 >> optimum-habana v1.15.0 has been validated for SynapseAI v1.19.0 but the driver version is v1.15.1, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:339: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:339: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens. 
Loading the processor
Loading the model
Loading checkpoint shards: 100%|██████████| 3/3 [00:39<00:00, 13.10s/it]
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 96
CPU RAM       : 527939952 KB
------------------------------------------------------------------------------
hpu graph
preparing prompt
Prompt goes through processor
generating output
Traceback (most recent call last):
  File "/workspace/tst4.py", line 52, in <module>
    output = model.generate(**inputs, max_new_tokens=100)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 1468, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 2440, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/llava_next/modeling_llava_next.py", line 342, in prepare_inputs_for_generation
    self._merge_input_ids_with_image_features(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/llava_next/modeling_llava_next.py", line 183, in _merge_input_ids_with_image_features
    final_attention_mask = torch.zeros(
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_DEVMEM Allocation failed for size::56836702208 (54203.7)MB

So it seems there is a memory allocation issue. However, considering that I am able to run inference using the pipeline class from transformers, it is strange to see such an issue.

Do you happen to know the reason for that? Thanks a lot in advance.

@regisss
Collaborator

regisss commented Feb 4, 2025

@DavidAbrahamyan Can you try to install the lib from the main branch with

pip install git+https://github.com/huggingface/optimum-habana.git

please?
It seems to work with the latest changes.
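
To double-check that the main-branch install took effect, pip show optimum-habana should then report a version different from the 1.15.0 release in your log above.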
