@@ -117,6 +117,8 @@ The table below introduces all models supported by SWIFT:
| chinese-alpaca-2-7b-64k| [AI-ModelScope/chinese-alpaca-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-64k/summary) | q_proj, k_proj, v_proj| llama| ✔ | ✔ || -| [hfl/chinese-alpaca-2-7b-64k](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k) |
| chinese-alpaca-2-13b| [AI-ModelScope/chinese-alpaca-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b/summary) | q_proj, k_proj, v_proj| llama| ✔ | ✔ || -| [hfl/chinese-alpaca-2-13b](https://huggingface.co/hfl/chinese-alpaca-2-13b) |
| chinese-alpaca-2-13b-16k| [AI-ModelScope/chinese-alpaca-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b-16k/summary) | q_proj, k_proj, v_proj| llama| ✔ | ✔ || -| [hfl/chinese-alpaca-2-13b-16k](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k) |
+ | llama-3-chinese-8b| [ChineseAlpacaGroup/llama-3-chinese-8b](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b/summary) | q_proj, k_proj, v_proj| default-generation| ✔ | ✔ || -| [hfl/llama-3-chinese-8b](https://huggingface.co/hfl/llama-3-chinese-8b) |
+ | llama-3-chinese-8b-instruct| [ChineseAlpacaGroup/llama-3-chinese-8b-instruct](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct/summary) | q_proj, k_proj, v_proj| llama3| ✔ | ✔ || -| [hfl/llama-3-chinese-8b-instruct](https://huggingface.co/hfl/llama-3-chinese-8b-instruct) |
| atom-7b| [FlagAlpha/Atom-7B](https://modelscope.cn/models/FlagAlpha/Atom-7B/summary) | q_proj, k_proj, v_proj| default-generation| ✔ | ✔ || -| [FlagAlpha/Atom-7B](https://huggingface.co/FlagAlpha/Atom-7B) |
| atom-7b-chat| [FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary) | q_proj, k_proj, v_proj| atom| ✔ | ✔ || -| [FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat) |
| llava1d6-mistral-7b-instruct| [AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | q_proj, k_proj, v_proj| llava-mistral-instruct| ✔ | ✘ | transformers>=4.34| multi-modal, vision| [liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b) |
@@ -156,6 +158,7 @@ The table below introduces all models supported by SWIFT:
| internlm2-math-20b| [Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary) | wqkv| default-generation| ✔ | ✔ | transformers>=4.35| math| [internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b) |
| internlm2-math-20b-chat| [Shanghai_AI_Laboratory/internlm2-math-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-20b/summary) | wqkv| internlm2| ✔ | ✔ | transformers>=4.35| math| [internlm/internlm2-math-20b](https://huggingface.co/internlm/internlm2-math-20b) |
| internlm-xcomposer2-7b-chat| [Shanghai_AI_Laboratory/internlm-xcomposer2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary) | wqkv| internlm-xcomposer2| ✔ | ✘ || multi-modal, vision| [internlm/internlm-xcomposer2-7b](https://huggingface.co/internlm/internlm-xcomposer2-7b) |
+ | internvl-chat-v1_5| [AI-ModelScope/InternVL-Chat-V1-5](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5/summary) | wqkv| internvl| ✔ | ✘ | transformers>=4.35| -| [OpenGVLab/InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) |
| deepseek-7b| [deepseek-ai/deepseek-llm-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-base/summary) | q_proj, k_proj, v_proj| default-generation| ✔ | ✔ || -| [deepseek-ai/deepseek-llm-7b-base](https://huggingface.co/deepseek-ai/deepseek-llm-7b-base) |
| deepseek-7b-chat| [deepseek-ai/deepseek-llm-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat/summary) | q_proj, k_proj, v_proj| deepseek| ✔ | ✔ || -| [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) |
| deepseek-moe-16b| [deepseek-ai/deepseek-moe-16b-base](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-base/summary) | q_proj, k_proj, v_proj| default-generation| ✔ | ✔ || -| [deepseek-ai/deepseek-moe-16b-base](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base) |
@@ -250,8 +253,8 @@ The table below introduces all models supported by SWIFT:
| phi3-4b-4k-instruct| [LLM-Research/Phi-3-mini-4k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-4k-instruct/summary) | qkv_proj| phi3| ✔ | ✘ | transformers>=4.36| general| [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
| phi3-4b-128k-instruct| [LLM-Research/Phi-3-mini-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-128k-instruct/summary) | qkv_proj| phi3| ✔ | ✘ | transformers>=4.36| general| [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
| cogvlm-17b-instruct| [ZhipuAI/cogvlm-chat](https://modelscope.cn/models/ZhipuAI/cogvlm-chat/summary) | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense| cogvlm-instruct| ✘ | ✘ || multi-modal, vision| [THUDM/cogvlm-chat-hf](https://huggingface.co/THUDM/cogvlm-chat-hf) |
- | cogagent-18b-chat| [ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary) | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense| cogagent-chat| ✘ | ✘ || multi-modal, vision| [THUDM/cogagent-chat-hf](https://huggingface.co/THUDM/cogagent-chat-hf) |
- | cogagent-18b-instruct| [ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary) | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense| cogagent-instruct| ✘ | ✘ || multi-modal, vision| [THUDM/cogagent-vqa-hf](https://huggingface.co/THUDM/cogagent-vqa-hf) |
+ | cogagent-18b-chat| [ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary) | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense| cogagent-chat| ✘ | ✘ | timm | multi-modal, vision| [THUDM/cogagent-chat-hf](https://huggingface.co/THUDM/cogagent-chat-hf) |
+ | cogagent-18b-instruct| [ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary) | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense| cogagent-instruct| ✘ | ✘ | timm | multi-modal, vision| [THUDM/cogagent-vqa-hf](https://huggingface.co/THUDM/cogagent-vqa-hf) |
| mamba-130m| [AI-ModelScope/mamba-130m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-130m-hf/summary) | in_proj, x_proj, embeddings, out_proj| default-generation| ✘ | ✘ | transformers>=4.39.0| -| [state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf) |
| mamba-370m| [AI-ModelScope/mamba-370m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-370m-hf/summary) | in_proj, x_proj, embeddings, out_proj| default-generation| ✘ | ✘ | transformers>=4.39.0| -| [state-spaces/mamba-370m-hf](https://huggingface.co/state-spaces/mamba-370m-hf) |
| mamba-390m| [AI-ModelScope/mamba-390m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-390m-hf/summary) | in_proj, x_proj, embeddings, out_proj| default-generation| ✘ | ✘ | transformers>=4.39.0| -| [state-spaces/mamba-390m-hf](https://huggingface.co/state-spaces/mamba-390m-hf) |