Issue
Model Config and tokenizer config mismatch
In the HF model repo, the llm_config section of config.json contains:
https://huggingface.co/inclusionAI/Ming-flash-omni-2.0/blob/main/config.json#L96-L99
"image_patch_token": 157157,
"video_patch_token": 157175,
"image_start_token": 157158,
"video_start_token": 157159,
So video_start_token is set to 157159.
However, in tokenizer_config.json and tokenizer.json, id 157159 points to:
Lines 2149 to 2156 in 2a0c02a
"157159": {
  "content": "</image>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
which appears to be the image end token, not the video start token.
For comparison, the video start token in the tokenizer config is:
Lines 2157 to 2164 in 2a0c02a
"157160": {
  "content": "<video>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
Should video_start_token in the HF repo's config.json be updated to 157160?
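A mismatch like this can be caught mechanically by checking each special-token id in config.json against the text the tokenizer actually maps it to. The sketch below is a hypothetical helper, not part of the repo: EXPECTED_CONTENT, find_mismatches and load_and_check are illustrative names. It assumes the token ids sit directly under "llm_config" in config.json and that the entries quoted above come from the "added_tokens_decoder" section of tokenizer_config.json, which their shape suggests. Only the "video_start_token" -> "<video>" pairing is taken from this issue. Any other key would need its expected text confirmed before being added.

```python
import json

# Hypothetical table: config.json key -> token text the id should decode to.
# Only "video_start_token" -> "<video>" is taken from the tokenizer snippet
# quoted in this issue; other entries would need to be confirmed first.
EXPECTED_CONTENT = {
    "video_start_token": "<video>",
}


def find_mismatches(llm_config, added_tokens_decoder, expected=EXPECTED_CONTENT):
    """Return (key, config_id, actual_content, suggested_id) for each mismatch.

    added_tokens_decoder maps string ids to entries with a "content" field,
    in the shape of the tokenizer_config.json snippets above.
    """
    # Reverse index so a wrong id can be paired with the id that holds the
    # expected token text (None if the text is not an added token at all).
    content_to_id = {
        entry["content"]: int(tok_id)
        for tok_id, entry in added_tokens_decoder.items()
    }
    mismatches = []
    for key, want in expected.items():
        cfg_id = llm_config.get(key)
        if cfg_id is None:
            continue  # key absent from this config; nothing to compare
        actual = added_tokens_decoder.get(str(cfg_id), {}).get("content")
        if actual != want:
            mismatches.append((key, cfg_id, actual, content_to_id.get(want)))
    return mismatches


def load_and_check(config_path, tokenizer_config_path):
    # Assumes the token ids sit directly under "llm_config" in config.json
    # and that tokenizer_config.json has an "added_tokens_decoder" section.
    with open(config_path, encoding="utf-8") as f:
        llm_config = json.load(f)["llm_config"]
    with open(tokenizer_config_path, encoding="utf-8") as f:
        decoder = json.load(f)["added_tokens_decoder"]
    return find_mismatches(llm_config, decoder)


if __name__ == "__main__":
    # In-memory stand-ins built from the two entries quoted in this issue.
    llm_config = {"video_start_token": 157159}
    decoder = {
        "157159": {"content": "</image>", "special": True},
        "157160": {"content": "<video>", "special": True},
    }
    print(find_mismatches(llm_config, decoder))
    # [('video_start_token', 157159, '</image>', 157160)]
```

With the values quoted in this issue, the check flags video_start_token and suggests 157160, the id the tokenizer assigns to "<video>". Building a reverse index from token text to id makes it possible to report the likely correct id instead of only flagging the wrong one.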