Model initialization failed #2

@Adam-Peng

Description

Screenshots

[Two images]

Logs

Reset done
build: 5847 (de297624) with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.5.0
llama_model_load_from_file_impl: using device Metal (Apple iOS simulator GPU) - 0 MiB free
llama_model_loader: loaded meta data with 32 key-value pairs and 291 tensors from /Users/bp/Library/Developer/CoreSimulator/Devices/7C4DE0D5-A190-41EE-9733-68B5A0C60FE9/data/Containers/Data/Application/9A394A6D-50CE-4725-9EAF-5E15726F307C/Documents/ggml-model-Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Model
llama_model_loader: - kv   3:                         general.size_label str              = 3.6B
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                       llama.context_length u32              = 32768
llama_model_loader: - kv   6:                     llama.embedding_length u32              = 2560
llama_model_loader: - kv   7:                  llama.feed_forward_length u32              = 10240
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 2
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  12:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  13:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  14:                           llama.vocab_size u32              = 73448
llama_model_loader: - kv  15:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,73448]   = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
llama_model_loader: - kv  19:                      tokenizer.ggml.scores arr[f32,73448]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  20:                  tokenizer.ggml.token_type arr[i32,73448]   = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  21:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  22:                tokenizer.ggml.eos_token_id u32              = 73440
llama_model_loader: - kv  23:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  24:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  25:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  26:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  27:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  29:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  30:               general.quantization_version u32              = 2
llama_model_loader: - kv  31:                          general.file_type u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 1.93 GiB (4.61 BPW) 
load: special tokens cache size = 92
load: token to piece cache size = 0.4342 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 2560
print_info: n_layer          = 32
print_info: n_head           = 32
print_info: n_head_kv        = 2
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 16
print_info: n_embd_k_gqa     = 256
print_info: n_embd_v_gqa     = 256
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 10240
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: model type       = 8B
print_info: model params     = 3.61 B
print_info: general.name     = Model
print_info: vocab type       = SPM
print_info: n_vocab          = 73448
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 73440 '<|im_end|>'
print_info: EOT token        = 73440 '<|im_end|>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 2 '</s>'
print_info: LF token         = 1099 '<0x0A>'
print_info: FIM PRE token    = 73445 '<|fim_prefix|>'
print_info: FIM SUF token    = 73447 '<|fim_suffix|>'
print_info: FIM MID token    = 73446 '<|fim_middle|>'
print_info: EOG token        = 73440 '<|im_end|>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: vector
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/Users/bp/Library/Developer/CoreSimulator/Devices/7C4DE0D5-A190-41EE-9733-68B5A0C60FE9/data/Containers/Data/Application/9A394A6D-50CE-4725-9EAF-5E15726F307C/Documents/ggml-model-Q4_0.gguf'
Initialization failed: initializationFailed("Unable to create MTMD context")
38:29.429 -->> SettingsVC: V26 model set to unselected state
38:29.430 -->> SettingsVC: V4 model set to selected state
38:29.432 -->> SettingsVC: V26 model set to unselected state
38:29.432 -->> SettingsVC: V4 model set to selected state
Model updated to: V4MultiModel
38:29.454 -->> Cell: set selected state, statusString: In use
38:29.456 -->> Cell: showing custom status text: In use

Logs (iPhone 14 Pro)

52:16.889 -->> SettingsVC: V26 model set to unselected state
52:16.890 -->> SettingsVC: V4 model set to selected state
Model updated to: V4MultiModel
52:16.917 -->> Cell: set selected state, statusString: In use
52:16.919 -->> Cell: showing custom status text: In use
MTMDWrapper: generation stopped
MTMDWrapper: context reset
Reset done
build: 5847 (de297624) with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.5.0
llama_model_load_from_file_impl: using device Metal (Apple A16 GPU) - 4095 MiB free
llama_model_loader: loaded meta data with 32 key-value pairs and 291 tensors from /var/mobile/Containers/Data/Application/0F5AB5D8-FAC8-488C-9088-5D7ED709D76E/Documents/ggml-model-Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Model
llama_model_loader: - kv   3:                         general.size_label str              = 3.6B
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                       llama.context_length u32              = 32768
llama_model_loader: - kv   6:                     llama.embedding_length u32              = 2560
llama_model_loader: - kv   7:                  llama.feed_forward_length u32              = 10240
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 2
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  12:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  13:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  14:                           llama.vocab_size u32              = 73448
llama_model_loader: - kv  15:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,73448]   = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
llama_model_loader: - kv  19:                      tokenizer.ggml.scores arr[f32,73448]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  20:                  tokenizer.ggml.token_type arr[i32,73448]   = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  21:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  22:                tokenizer.ggml.eos_token_id u32              = 73440
llama_model_loader: - kv  23:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  24:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  25:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  26:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  27:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  29:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  30:               general.quantization_version u32              = 2
llama_model_loader: - kv  31:                          general.file_type u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 1.93 GiB (4.61 BPW) 
load: special tokens cache size = 92
load: token to piece cache size = 0.4342 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 2560
print_info: n_layer          = 32
print_info: n_head           = 32
print_info: n_head_kv        = 2
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 16
print_info: n_embd_k_gqa     = 256
print_info: n_embd_v_gqa     = 256
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 10240
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: model type       = 8B
print_info: model params     = 3.61 B
print_info: general.name     = Model
print_info: vocab type       = SPM
print_info: n_vocab          = 73448
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 73440 '<|im_end|>'
print_info: EOT token        = 73440 '<|im_end|>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 2 '</s>'
print_info: LF token         = 1099 '<0x0A>'
print_info: FIM PRE token    = 73445 '<|fim_prefix|>'
print_info: FIM SUF token    = 73447 '<|fim_suffix|>'
print_info: FIM MID token    = 73446 '<|fim_middle|>'
print_info: EOG token        = 73440 '<|im_end|>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   100.87 MiB
load_tensors: Metal_Mapped model buffer size =  1981.10 MiB
..........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A16 GPU
ggml_metal_load_library: using embedded metal library
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 3 try
ggml_metal_load_library: error: Error Domain=MTLLibraryErrorDomain Code=3 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
ggml_metal_init: error: metal library is nil
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize the context: failed to initialize Metal backend
common_init_from_params: failed to create context with model '/var/mobile/Containers/Data/Application/0F5AB5D8-FAC8-488C-9088-5D7ED709D76E/Documents/ggml-model-Q4_0.gguf'
Initialization failed: initializationFailed("Unable to create MTMD context")
