[BUG] error setting tokenizer with custom generation params for vllm #563

Closed
@rawsh

Description

Describe the bug

`TypeError: expected str, bytes or os.PathLike object, not dict`

This happens with the config from the README.

```
/root/anaconda3/envs/zero/lib/python3.10/site-packages/lighteval/models/model_loader.py:150
in load_model_with_accelerate_or_default

  147     elif isinstance(config, VLLMModelConfig):
  148         if not is_vllm_available():
  149             raise ImportError(NO_VLLM_ERROR_MSG)
❱ 150         model = VLLMModel(config=config, env_config=env_config)
  151         return model
  152     else:
  153         model = TransformersModel(config=config, env_config=env_config)

/root/anaconda3/envs/zero/lib/python3.10/site-packages/lighteval/models/vllm/vllm_model.py:116
in __init__

  113         self.data_parallel_size = int(config.data_parallel_size)
  114
  115         self._add_special_tokens = config.add_special_tokens if config.add_special_token
❱ 116         self._tokenizer = self._create_auto_tokenizer(config, env_config)
  117
  118         self._max_length = int(config.max_model_length) if config.max_model_length is no
  119

/root/anaconda3/envs/zero/lib/python3.10/site-packages/lighteval/models/vllm/vllm_model.py:202
in _create_auto_tokenizer

  199         return model
  200
  201     def _create_auto_tokenizer(self, config: VLLMModelConfig, env_config: EnvConfig):
❱ 202         tokenizer = get_tokenizer(
  203             config.pretrained,
  204             tokenizer_mode="auto",
  205             trust_remote_code=config.trust_remote_code,

/root/anaconda3/envs/zero/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer.py:120
in get_tokenizer

  117         kwargs["truncation_side"] = "left"
  118
  119     # Separate model folder from file path for GGUF models
❱ 120     is_gguf = check_gguf_file(tokenizer_name)
  121     if is_gguf:
  122         kwargs["gguf_file"] = Path(tokenizer_name).name
  123         tokenizer_name = Path(tokenizer_name).parent

/root/anaconda3/envs/zero/lib/python3.10/site-packages/vllm/transformers_utils/utils.py:8
in check_gguf_file

    5
    6 def check_gguf_file(model: Union[str, PathLike]) -> bool:
    7     """Check if the file is a GGUF model."""
❱   8     model = Path(model)
    9     if not model.is_file():
   10         return False
   11     elif model.suffix == ".gguf":

TypeError: expected str, bytes or os.PathLike object, not dict
```
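The crash is reproducible in isolation: `check_gguf_file` hands its argument straight to `pathlib.Path`, which only accepts path-like values. A minimal sketch of the failure mode, independent of lighteval and vllm:

```python
from pathlib import Path

# Passing a dict where a model path is expected raises the same
# TypeError as in the traceback above, because Path() only accepts
# str, bytes, or os.PathLike values.
try:
    Path({"pretrained": "Qwen/Qwen2.5-7B-Instruct"})
except TypeError as exc:
    print(f"TypeError: {exc}")
```

So the failure happens before any vllm logic runs: `config.pretrained` is apparently arriving at `get_tokenizer` as a dict instead of a string once the generation block is set in the YAML.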

To Reproduce

```yaml
model: # Model specific parameters
  base_params:
    model_args: "pretrained=Qwen/Qwen2.5-7B-Instruct,dtype=bfloat16,max_model_length=768,gpu_memory_utilisation=0.7" # Model args that you would pass in the command line
  generation: # Generation specific parameters
    temperature: 1.0
    stop_tokens: null
    truncate_prompt: false
```
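For reference, the `model_args` string above should decompose into scalar key/value pairs, so `pretrained` reaches the tokenizer as a plain string. A hypothetical sketch of that parsing (`parse_model_args` is an illustrative name, not lighteval's actual function):

```python
def parse_model_args(model_args: str) -> dict:
    """Split a comma-separated "key=value" string into a flat dict (illustrative only)."""
    return dict(item.split("=", 1) for item in model_args.split(","))

args = parse_model_args(
    "pretrained=Qwen/Qwen2.5-7B-Instruct,dtype=bfloat16,"
    "max_model_length=768,gpu_memory_utilisation=0.7"
)
print(args["pretrained"])  # a plain str, which is what get_tokenizer expects
```

If the `generation` block gets merged into this structure incorrectly somewhere, `pretrained` could end up wrapped in a dict, which would match the TypeError in the traceback.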

Expected behavior

Custom generation params can be set without breaking tokenizer creation.

Version info

0.70

Metadata

    Labels

    bug (Something isn't working)
