validation with SP is not supported yet #2
With long-context models we usually validate with vLLM inference plus an LLM judge on validation sets. That said, we do plan to support validation with SP later, as tracked in #2.
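For reference, a minimal sketch of that validation flow with vLLM (assumptions: the checkpoint path, the validation JSONL layout, and the `judge_score` helper below are hypothetical placeholders, not part of this repo):

```python
import json
from vllm import LLM, SamplingParams

# Load the fine-tuned checkpoint with vLLM (hypothetical path).
llm = LLM(model="path/to/your-long-context-model")
sampling_params = SamplingParams(temperature=0.0, max_tokens=1024)

# Hypothetical validation file: one JSON object per line with a "prompt" field.
with open("validation.jsonl") as f:
    examples = [json.loads(line) for line in f]

# Generate predictions for the whole validation set in one batch.
outputs = llm.generate([ex["prompt"] for ex in examples], sampling_params)
predictions = [out.outputs[0].text for out in outputs]

# Score each prediction with an LLM judge; judge_score() is a placeholder
# for whichever judge model and rubric you use.
# scores = [judge_score(ex, pred) for ex, pred in zip(examples, predictions)]
```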
Exception message:
{'loss': 0.7334, 'grad_norm': 1.2644673585891724, 'learning_rate': 9.999978915433865e-06, 'epoch': 0.02}
{'loss': 0.7381, 'grad_norm': 1.2470377683639526, 'learning_rate': 9.999869007504867e-06, 'epoch': 0.02}
{'loss': 0.7243, 'grad_norm': 1.0461479425430298, 'learning_rate': 9.999665163306944e-06, 'epoch': 0.03}
1%|█▊ | 1000/73212 [2:46:04<199:21:07, 9.94s/it]
[INFO|trainer.py:4021] 2025-02-24 10:27:40,778 >>
***** Running Evaluation *****
[INFO|trainer.py:4023] 2025-02-24 10:27:40,779 >> Num examples = 4000
[INFO|trainer.py:4026] 2025-02-24 10:27:40,779 >> Batch size = 1
[rank1]: Traceback (most recent call last):
[rank1]: File "/data5/360-LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank1]: launch()
[rank1]: File "/data5/360-LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank1]: run_exp()
[rank1]: File "/data5/360-LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
[rank1]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank1]: File "/data5/360-LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 102, in run_sft
[rank1]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 2052, in train
[rank1]: return inner_training_loop(
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 2467, in _inner_training_loop
[rank1]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 2915, in _maybe_log_save_evaluate
[rank1]: metrics = self._evaluate(trial, ignore_keys_for_eval)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 2872, in _evaluate
[rank1]: metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer_seq2seq.py", line 180, in evaluate
[rank1]: return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 3868, in evaluate
[rank1]: output = eval_loop(
[rank1]: ^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 4061, in evaluation_loop
[rank1]: losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data5/360-LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 174, in prediction_step
[rank1]: loss, generated_tokens, _ = super().prediction_step( # ignore the returned labels (may be truncated)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer_seq2seq.py", line 278, in prediction_step
[rank1]: return super().prediction_step(
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 4279, in prediction_step
[rank1]: loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
[rank1]: ^^^^^^^^^^^^^
[rank1]: File "/data0/miniconda3/envs/360-llama-factory/lib/python3.11/site-packages/torch/_tensor.py", line 1109, in __iter__
[rank1]: raise TypeError("iteration over a 0-d tensor")
[rank1]: TypeError: iteration over a 0-d tensor
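For context, the TypeError comes from tuple-unpacking a scalar: `prediction_step` expects `compute_loss(model, inputs, return_outputs=True)` to return a `(loss, outputs)` pair, but here it appears to receive only a 0-d loss tensor (presumably because the SP code path does not return model outputs during evaluation; that reading is an assumption, not confirmed by the traceback alone). A minimal reproduction of the same error:

```python
import torch

loss = torch.tensor(0.73)   # 0-d (scalar) tensor, like a bare loss value
try:
    loss_value, outputs = loss   # tuple unpacking iterates the tensor
except TypeError as e:
    print(e)                     # "iteration over a 0-d tensor"
```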
[rank3]: Traceback (most recent call last):
[rank3]: File "/data5/360-LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank3]: launch()
[rank3]: File "/data5/360-LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank3]: run_exp()
[rank3]: File "/data5/360-LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
[rank3]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank3]: File "/data5/360-LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 102, in run_sft
[rank3]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^