
[BUG] error setting tokenizer with custom generation params for vllm #563

@rawsh

Describe the bug

TypeError: expected str, bytes or os.PathLike object, not dict

With the config from the README:

/root/anaconda3/envs/zero/lib/python3.10/site-packages/lighteval/models/model_loader.py:150 in load_model_with_accelerate_or_default

  147     elif isinstance(config, VLLMModelConfig):
  148         if not is_vllm_available():
  149             raise ImportError(NO_VLLM_ERROR_MSG)
❱ 150         model = VLLMModel(config=config, env_config=env_config)
  151         return model
  152     else:
  153         model = TransformersModel(config=config, env_config=env_config)

/root/anaconda3/envs/zero/lib/python3.10/site-packages/lighteval/models/vllm/vllm_model.py:116 in __init__

  113         self.data_parallel_size = int(config.data_parallel_size)
  114
  115         self._add_special_tokens = config.add_special_tokens if config.add_special_token
❱ 116         self._tokenizer = self._create_auto_tokenizer(config, env_config)
  117
  118         self._max_length = int(config.max_model_length) if config.max_model_length is no
  119

/root/anaconda3/envs/zero/lib/python3.10/site-packages/lighteval/models/vllm/vllm_model.py:202 in _create_auto_tokenizer

  199         return model
  200
  201     def _create_auto_tokenizer(self, config: VLLMModelConfig, env_config: EnvConfig):
❱ 202         tokenizer = get_tokenizer(
  203             config.pretrained,
  204             tokenizer_mode="auto",
  205             trust_remote_code=config.trust_remote_code,

/root/anaconda3/envs/zero/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer.py:120 in get_tokenizer

  117         kwargs["truncation_side"] = "left"
  118
  119     # Separate model folder from file path for GGUF models
❱ 120     is_gguf = check_gguf_file(tokenizer_name)
  121     if is_gguf:
  122         kwargs["gguf_file"] = Path(tokenizer_name).name
  123         tokenizer_name = Path(tokenizer_name).parent

/root/anaconda3/envs/zero/lib/python3.10/site-packages/vllm/transformers_utils/utils.py:8 in check_gguf_file

    5
    6 def check_gguf_file(model: Union[str, PathLike]) -> bool:
    7     """Check if the file is a GGUF model."""
❱   8     model = Path(model)
    9     if not model.is_file():
   10         return False
   11     elif model.suffix == ".gguf":
TypeError: expected str, bytes or os.PathLike object, not dict
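
For reference, the last frame fails because vllm's check_gguf_file() just calls Path(model) on whatever lighteval hands it as config.pretrained. A minimal standalone sketch (plain Python, not lighteval code; the dict contents are only illustrative) reproduces the same TypeError:

from pathlib import Path

# Works when pretrained is the model-name string taken from model_args:
Path("Qwen/Qwen2.5-7B-Instruct")

# Fails the same way when a dict leaks in where the model name should be:
try:
    Path({"temperature": 1.0, "stop_tokens": None, "truncate_prompt": False})
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not dict

So with the generation section present, config.pretrained is arriving as a dict rather than the pretrained=... string from model_args.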

To Reproduce

model: # Model specific parameters
  base_params:
    model_args: "pretrained=Qwen/Qwen2.5-7B-Instruct,dtype=bfloat16,max_model_length=768,gpu_memory_utilisation=0.7" # Model args that you would pass in the command line
  generation: # Generation specific parameters
    temperature: 1.0
    stop_tokens: null
    truncate_prompt: false
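
If it helps triage: loading the YAML above with plain yaml.safe_load (sketch below; the file name is made up) shows that model_args is a single string under base_params while generation is a nested mapping, so any code path that hands the wrong piece of this structure to VLLMModelConfig ends up with a dict where the pretrained string is expected:

import yaml

# Hypothetical file name; the contents are the config shown above.
with open("vllm_model_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["base_params"]["model_args"])
# -> "pretrained=Qwen/Qwen2.5-7B-Instruct,dtype=bfloat16,max_model_length=768,gpu_memory_utilisation=0.7"

print(cfg["model"]["generation"])
# -> {'temperature': 1.0, 'stop_tokens': None, 'truncate_prompt': False}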

Expected behavior

Custom generation params can be set properly for the vllm backend.

Version info

0.7.0
