Model fine-tuning anomaly #362

Open
darkprices opened this issue Oct 10, 2024 · 2 comments

Comments

darkprices commented Oct 10, 2024

2024-10-10,14:19:42 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs.
2024-10-10,14:19:42 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
2024-10-10,14:19:42 | INFO | Rank 0 | Params:
2024-10-10,14:19:42 | INFO | Rank 0 | accum_freq: 1
2024-10-10,14:19:42 | INFO | Rank 0 | aggregate: True
2024-10-10,14:19:42 | INFO | Rank 0 | batch_size: 128
2024-10-10,14:19:42 | INFO | Rank 0 | bert_weight_path: None
2024-10-10,14:19:42 | INFO | Rank 0 | beta1: 0.9
2024-10-10,14:19:42 | INFO | Rank 0 | beta2: 0.98
2024-10-10,14:19:42 | INFO | Rank 0 | checkpoint_path: /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/checkpoints
2024-10-10,14:19:42 | INFO | Rank 0 | clip_weight_path: None
2024-10-10,14:19:42 | INFO | Rank 0 | context_length: 52
2024-10-10,14:19:42 | INFO | Rank 0 | debug: False
2024-10-10,14:19:42 | INFO | Rank 0 | device: cuda:0
2024-10-10,14:19:42 | INFO | Rank 0 | distillation: False
2024-10-10,14:19:42 | INFO | Rank 0 | eps: 1e-06
2024-10-10,14:19:42 | INFO | Rank 0 | freeze_vision: False
2024-10-10,14:19:42 | INFO | Rank 0 | gather_with_grad: False
2024-10-10,14:19:42 | INFO | Rank 0 | grad_checkpointing: False
2024-10-10,14:19:42 | INFO | Rank 0 | kd_loss_weight: 0.5
2024-10-10,14:19:42 | INFO | Rank 0 | local_device_rank: 0
2024-10-10,14:19:42 | INFO | Rank 0 | log_interval: 1
2024-10-10,14:19:42 | INFO | Rank 0 | log_level: 20
2024-10-10,14:19:42 | INFO | Rank 0 | log_path: /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/out_2024-10-10-06-19-39.log
2024-10-10,14:19:42 | INFO | Rank 0 | logs: /workspace/code/experiments/
2024-10-10,14:19:42 | INFO | Rank 0 | lr: 0.0005
2024-10-10,14:19:42 | INFO | Rank 0 | mask_ratio: 0
2024-10-10,14:19:42 | INFO | Rank 0 | max_epochs: 100
2024-10-10,14:19:42 | INFO | Rank 0 | max_steps: 3200
2024-10-10,14:19:42 | INFO | Rank 0 | name: demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu
2024-10-10,14:19:42 | INFO | Rank 0 | num_workers: 4
2024-10-10,14:19:42 | INFO | Rank 0 | precision: amp
2024-10-10,14:19:42 | INFO | Rank 0 | rank: 0
2024-10-10,14:19:42 | INFO | Rank 0 | report_training_batch_acc: True
2024-10-10,14:19:42 | INFO | Rank 0 | reset_data_offset: True
2024-10-10,14:19:42 | INFO | Rank 0 | reset_optimizer: True
2024-10-10,14:19:42 | INFO | Rank 0 | resume: /code/xx/Chinese-CLIP/pretrained_weights/clip_cn_vit-b-16.pt
2024-10-10,14:19:42 | INFO | Rank 0 | save_epoch_frequency: 1
2024-10-10,14:19:42 | INFO | Rank 0 | save_step_frequency: 999999
2024-10-10,14:19:42 | INFO | Rank 0 | seed: 123
2024-10-10,14:19:42 | INFO | Rank 0 | skip_aggregate: False
2024-10-10,14:19:42 | INFO | Rank 0 | skip_scheduler: False
2024-10-10,14:19:42 | INFO | Rank 0 | teacher_model_name: None
2024-10-10,14:19:42 | INFO | Rank 0 | text_model: RoBERTa-wwm-ext-base-chinese
2024-10-10,14:19:42 | INFO | Rank 0 | train_data: /workspace/code/demo_data/lmdb/train
2024-10-10,14:19:42 | INFO | Rank 0 | use_augment: True
2024-10-10,14:19:42 | INFO | Rank 0 | use_bn_sync: False
2024-10-10,14:19:42 | INFO | Rank 0 | use_flash_attention: False
2024-10-10,14:19:42 | INFO | Rank 0 | val_data: /workspace/code/demo_data/lmdb/valid
2024-10-10,14:19:42 | INFO | Rank 0 | valid_batch_size: 128
2024-10-10,14:19:42 | INFO | Rank 0 | valid_epoch_interval: 1
2024-10-10,14:19:42 | INFO | Rank 0 | valid_num_workers: 1
2024-10-10,14:19:42 | INFO | Rank 0 | valid_step_interval: 150
2024-10-10,14:19:42 | INFO | Rank 0 | vision_model: ViT-B-16
2024-10-10,14:19:42 | INFO | Rank 0 | warmup: 100
2024-10-10,14:19:42 | INFO | Rank 0 | wd: 0.001
2024-10-10,14:19:42 | INFO | Rank 0 | world_size: 1
2024-10-10,14:19:42 | INFO | Rank 0 | Use GPU: 0 for training
2024-10-10,14:19:42 | INFO | Rank 0 | => begin to load checkpoint '/code/xx/Chinese-CLIP/pretrained_weights/clip_cn_vit-b-16.pt'
2024-10-10,14:20:15 | INFO | Rank 0 | => loaded checkpoint '/code/xx/Chinese-CLIP/pretrained_weights/clip_cn_vit-b-16.pt' (epoch 15 @ 0 steps)
2024-10-10,14:20:23 | INFO | Rank 0 | Global Steps: 1/3200 | Train Epoch: 1 [128/4096 (3%)] | Loss: 5.140202 | Image2Text Acc: 5.47 | Text2Image Acc: 3.91 | Data Time: 6.993s | Batch Time: 8.429s | LR: 0.000005 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:24 | INFO | Rank 0 | Global Steps: 2/3200 | Train Epoch: 1 [256/4096 (6%)] | Loss: 5.020943 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 0.503s | Batch Time: 0.813s | LR: 0.000010 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:25 | INFO | Rank 0 | Global Steps: 3/3200 | Train Epoch: 1 [384/4096 (9%)] | Loss: 4.862580 | Image2Text Acc: 6.25 | Text2Image Acc: 3.91 | Data Time: 0.025s | Batch Time: 0.385s | LR: 0.000015 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:25 | INFO | Rank 0 | Global Steps: 4/3200 | Train Epoch: 1 [512/4096 (12%)] | Loss: 5.083204 | Image2Text Acc: 4.69 | Text2Image Acc: 3.12 | Data Time: 0.036s | Batch Time: 0.330s | LR: 0.000020 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:29 | INFO | Rank 0 | Global Steps: 5/3200 | Train Epoch: 1 [640/4096 (16%)] | Loss: 4.380543 | Image2Text Acc: 4.69 | Text2Image Acc: 9.38 | Data Time: 4.093s | Batch Time: 4.395s | LR: 0.000025 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:31 | INFO | Rank 0 | Global Steps: 6/3200 | Train Epoch: 1 [768/4096 (19%)] | Loss: 4.561520 | Image2Text Acc: 3.91 | Text2Image Acc: 6.25 | Data Time: 1.478s | Batch Time: 1.772s | LR: 0.000030 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:31 | INFO | Rank 0 | Global Steps: 7/3200 | Train Epoch: 1 [896/4096 (22%)] | Loss: 4.347610 | Image2Text Acc: 3.91 | Text2Image Acc: 7.81 | Data Time: 0.036s | Batch Time: 0.330s | LR: 0.000035 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:32 | INFO | Rank 0 | Global Steps: 8/3200 | Train Epoch: 1 [1024/4096 (25%)] | Loss: 4.256195 | Image2Text Acc: 7.03 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.334s | LR: 0.000040 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:37 | INFO | Rank 0 | Global Steps: 9/3200 | Train Epoch: 1 [1152/4096 (28%)] | Loss: 4.305431 | Image2Text Acc: 2.34 | Text2Image Acc: 2.34 | Data Time: 4.561s | Batch Time: 4.863s | LR: 0.000045 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:39 | INFO | Rank 0 | Global Steps: 10/3200 | Train Epoch: 1 [1280/4096 (31%)] | Loss: 4.286503 | Image2Text Acc: 3.91 | Text2Image Acc: 7.81 | Data Time: 1.666s | Batch Time: 1.964s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:39 | INFO | Rank 0 | Global Steps: 11/3200 | Train Epoch: 1 [1408/4096 (34%)] | Loss: 4.256180 | Image2Text Acc: 5.47 | Text2Image Acc: 3.12 | Data Time: 0.041s | Batch Time: 0.338s | LR: 0.000055 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:39 | INFO | Rank 0 | Global Steps: 12/3200 | Train Epoch: 1 [1536/4096 (38%)] | Loss: 4.268936 | Image2Text Acc: 5.47 | Text2Image Acc: 6.25 | Data Time: 0.036s | Batch Time: 0.330s | LR: 0.000060 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:44 | INFO | Rank 0 | Global Steps: 13/3200 | Train Epoch: 1 [1664/4096 (41%)] | Loss: 4.263233 | Image2Text Acc: 6.25 | Text2Image Acc: 6.25 | Data Time: 3.959s | Batch Time: 4.263s | LR: 0.000065 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:46 | INFO | Rank 0 | Global Steps: 14/3200 | Train Epoch: 1 [1792/4096 (44%)] | Loss: 4.249680 | Image2Text Acc: 7.03 | Text2Image Acc: 4.69 | Data Time: 2.528s | Batch Time: 2.829s | LR: 0.000070 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:47 | INFO | Rank 0 | Global Steps: 15/3200 | Train Epoch: 1 [1920/4096 (47%)] | Loss: 4.208305 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 0.040s | Batch Time: 0.339s | LR: 0.000075 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:47 | INFO | Rank 0 | Global Steps: 16/3200 | Train Epoch: 1 [2048/4096 (50%)] | Loss: 4.351048 | Image2Text Acc: 5.47 | Text2Image Acc: 6.25 | Data Time: 0.038s | Batch Time: 0.333s | LR: 0.000080 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:50 | INFO | Rank 0 | Global Steps: 17/3200 | Train Epoch: 1 [2176/4096 (53%)] | Loss: 4.289299 | Image2Text Acc: 3.91 | Text2Image Acc: 4.69 | Data Time: 2.945s | Batch Time: 3.242s | LR: 0.000085 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:53 | INFO | Rank 0 | Global Steps: 18/3200 | Train Epoch: 1 [2304/4096 (56%)] | Loss: 4.244534 | Image2Text Acc: 3.91 | Text2Image Acc: 5.47 | Data Time: 2.589s | Batch Time: 2.889s | LR: 0.000090 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:54 | INFO | Rank 0 | Global Steps: 19/3200 | Train Epoch: 1 [2432/4096 (59%)] | Loss: 4.298996 | Image2Text Acc: 5.47 | Text2Image Acc: 4.69 | Data Time: 0.036s | Batch Time: 0.334s | LR: 0.000095 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:54 | INFO | Rank 0 | Global Steps: 20/3200 | Train Epoch: 1 [2560/4096 (62%)] | Loss: 4.175068 | Image2Text Acc: 7.03 | Text2Image Acc: 2.34 | Data Time: 0.038s | Batch Time: 0.332s | LR: 0.000100 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:57 | INFO | Rank 0 | Global Steps: 21/3200 | Train Epoch: 1 [2688/4096 (66%)] | Loss: 4.202049 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 2.378s | Batch Time: 2.680s | LR: 0.000105 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:00 | INFO | Rank 0 | Global Steps: 22/3200 | Train Epoch: 1 [2816/4096 (69%)] | Loss: 4.255169 | Image2Text Acc: 5.47 | Text2Image Acc: 6.25 | Data Time: 3.118s | Batch Time: 3.419s | LR: 0.000110 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:00 | INFO | Rank 0 | Global Steps: 23/3200 | Train Epoch: 1 [2944/4096 (72%)] | Loss: 4.340736 | Image2Text Acc: 6.25 | Text2Image Acc: 6.25 | Data Time: 0.044s | Batch Time: 0.343s | LR: 0.000115 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:01 | INFO | Rank 0 | Global Steps: 24/3200 | Train Epoch: 1 [3072/4096 (75%)] | Loss: 4.433716 | Image2Text Acc: 1.56 | Text2Image Acc: 6.25 | Data Time: 0.041s | Batch Time: 0.340s | LR: 0.000120 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:03 | INFO | Rank 0 | Global Steps: 25/3200 | Train Epoch: 1 [3200/4096 (78%)] | Loss: 4.339813 | Image2Text Acc: 3.91 | Text2Image Acc: 6.25 | Data Time: 1.788s | Batch Time: 2.085s | LR: 0.000125 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:06 | INFO | Rank 0 | Global Steps: 26/3200 | Train Epoch: 1 [3328/4096 (81%)] | Loss: 4.351143 | Image2Text Acc: 2.34 | Text2Image Acc: 3.12 | Data Time: 2.790s | Batch Time: 3.092s | LR: 0.000130 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:06 | INFO | Rank 0 | Global Steps: 27/3200 | Train Epoch: 1 [3456/4096 (84%)] | Loss: 4.369926 | Image2Text Acc: 2.34 | Text2Image Acc: 4.69 | Data Time: 0.043s | Batch Time: 0.338s | LR: 0.000135 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:07 | INFO | Rank 0 | Global Steps: 28/3200 | Train Epoch: 1 [3584/4096 (88%)] | Loss: 4.199516 | Image2Text Acc: 3.12 | Text2Image Acc: 3.12 | Data Time: 0.037s | Batch Time: 0.335s | LR: 0.000140 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:10 | INFO | Rank 0 | Global Steps: 29/3200 | Train Epoch: 1 [3712/4096 (91%)] | Loss: 4.327763 | Image2Text Acc: 3.91 | Text2Image Acc: 5.47 | Data Time: 3.056s | Batch Time: 3.354s | LR: 0.000145 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:12 | INFO | Rank 0 | Global Steps: 30/3200 | Train Epoch: 1 [3840/4096 (94%)] | Loss: 4.432281 | Image2Text Acc: 2.34 | Text2Image Acc: 4.69 | Data Time: 1.928s | Batch Time: 2.226s | LR: 0.000150 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:12 | INFO | Rank 0 | Global Steps: 31/3200 | Train Epoch: 1 [3968/4096 (97%)] | Loss: 4.358601 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 0.037s | Batch Time: 0.332s | LR: 0.000155 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:21:13 | INFO | Rank 0 | Global Steps: 32/3200 | Train Epoch: 1 [4096/4096 (100%)] | Loss: 4.322407 | Image2Text Acc: 4.69 | Text2Image Acc: 5.47 | Data Time: 0.037s | Batch Time: 0.332s | LR: 0.000160 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:21:13 | INFO | Rank 0 | Begin to eval on validation set (epoch 1 @ 32 steps)...
2024-10-10,14:21:34 | INFO | Rank 0 | Validation Result (epoch 1 @ 32 steps) | Valid Loss: 4.217743 | Image2Text Acc: 3.91 | Text2Image Acc: 5.86 | logit_scale: 4.604 | Valid Batch Size: 128
2024-10-10,14:21:34 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs.
2024-10-10,14:21:34 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
2024-10-10,14:21:51 | INFO | Rank 0 | Saved checkpoint /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/checkpoints/epoch1.pt (epoch 1 @ 32 steps) (writing took 17.00214672088623 seconds)
2024-10-10,14:22:08 | INFO | Rank 0 | Saved checkpoint /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/checkpoints/epoch_latest.pt (epoch 1 @ 32 steps) (writing took 16.851421356201172 seconds)
2024-10-10,14:22:16 | INFO | Rank 0 | Global Steps: 33/3200 | Train Epoch: 2 [128/4096 (3%)] | Loss: 4.297310 | Image2Text Acc: 3.91 | Text2Image Acc: 5.47 | Data Time: 6.647s | Batch Time: 6.951s | LR: 0.000165 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:16 | INFO | Rank 0 | Global Steps: 34/3200 | Train Epoch: 2 [256/4096 (6%)] | Loss: 4.284641 | Image2Text Acc: 4.69 | Text2Image Acc: 1.56 | Data Time: 0.042s | Batch Time: 0.339s | LR: 0.000170 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:17 | INFO | Rank 0 | Global Steps: 35/3200 | Train Epoch: 2 [384/4096 (9%)] | Loss: 4.414612 | Image2Text Acc: 3.12 | Text2Image Acc: 4.69 | Data Time: 0.604s | Batch Time: 0.902s | LR: 0.000175 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:17 | INFO | Rank 0 | Global Steps: 36/3200 | Train Epoch: 2 [512/4096 (12%)] | Loss: 4.766368 | Image2Text Acc: 3.12 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.331s | LR: 0.000180 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:22 | INFO | Rank 0 | Global Steps: 37/3200 | Train Epoch: 2 [640/4096 (16%)] | Loss: 4.304634 | Image2Text Acc: 3.12 | Text2Image Acc: 2.34 | Data Time: 4.513s | Batch Time: 4.818s | LR: 0.000185 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:22 | INFO | Rank 0 | Global Steps: 38/3200 | Train Epoch: 2 [768/4096 (19%)] | Loss: 4.475136 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.038s | Batch Time: 0.334s | LR: 0.000190 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:24 | INFO | Rank 0 | Global Steps: 39/3200 | Train Epoch: 2 [896/4096 (22%)] | Loss: 4.387520 | Image2Text Acc: 3.91 | Text2Image Acc: 3.91 | Data Time: 1.873s | Batch Time: 2.171s | LR: 0.000195 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:25 | INFO | Rank 0 | Global Steps: 40/3200 | Train Epoch: 2 [1024/4096 (25%)] | Loss: 4.462852 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.335s | LR: 0.000200 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:29 | INFO | Rank 0 | Global Steps: 41/3200 | Train Epoch: 2 [1152/4096 (28%)] | Loss: 4.601013 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 3.525s | Batch Time: 3.828s | LR: 0.000205 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:29 | INFO | Rank 0 | Global Steps: 42/3200 | Train Epoch: 2 [1280/4096 (31%)] | Loss: 4.392643 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.338s | LR: 0.000210 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:30 | INFO | Rank 0 | Global Steps: 43/3200 | Train Epoch: 2 [1408/4096 (34%)] | Loss: 4.625900 | Image2Text Acc: 0.78 | Text2Image Acc: 3.91 | Data Time: 1.126s | Batch Time: 1.420s | LR: 0.000215 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:31 | INFO | Rank 0 | Global Steps: 44/3200 | Train Epoch: 2 [1536/4096 (38%)] | Loss: 4.499672 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.331s | LR: 0.000220 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:35 | INFO | Rank 0 | Global Steps: 45/3200 | Train Epoch: 2 [1664/4096 (41%)] | Loss: 4.556229 | Image2Text Acc: 3.12 | Text2Image Acc: 4.69 | Data Time: 3.978s | Batch Time: 4.282s | LR: 0.000225 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:35 | INFO | Rank 0 | Global Steps: 46/3200 | Train Epoch: 2 [1792/4096 (44%)] | Loss: 4.524231 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 0.035s | Batch Time: 0.335s | LR: 0.000230 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:37 | INFO | Rank 0 | Global Steps: 47/3200 | Train Epoch: 2 [1920/4096 (47%)] | Loss: 4.618797 | Image2Text Acc: 0.78 | Text2Image Acc: 3.12 | Data Time: 1.077s | Batch Time: 1.372s | LR: 0.000235 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:37 | INFO | Rank 0 | Global Steps: 48/3200 | Train Epoch: 2 [2048/4096 (50%)] | Loss: 4.657127 | Image2Text Acc: 1.56 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.335s | LR: 0.000240 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:41 | INFO | Rank 0 | Global Steps: 49/3200 | Train Epoch: 2 [2176/4096 (53%)] | Loss: 4.595833 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 3.846s | Batch Time: 4.150s | LR: 0.000245 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:42 | INFO | Rank 0 | Global Steps: 50/3200 | Train Epoch: 2 [2304/4096 (56%)] | Loss: 4.590302 | Image2Text Acc: 2.34 | Text2Image Acc: 2.34 | Data Time: 0.038s | Batch Time: 0.337s | LR: 0.000250 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:43 | INFO | Rank 0 | Global Steps: 51/3200 | Train Epoch: 2 [2432/4096 (59%)] | Loss: 4.545975 | Image2Text Acc: 3.91 | Text2Image Acc: 1.56 | Data Time: 1.587s | Batch Time: 1.887s | LR: 0.000255 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:44 | INFO | Rank 0 | Global Steps: 52/3200 | Train Epoch: 2 [2560/4096 (62%)] | Loss: 4.431618 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.340s | LR: 0.000260 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:47 | INFO | Rank 0 | Global Steps: 53/3200 | Train Epoch: 2 [2688/4096 (66%)] | Loss: 4.410482 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 3.403s | Batch Time: 3.703s | LR: 0.000265 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:48 | INFO | Rank 0 | Global Steps: 54/3200 | Train Epoch: 2 [2816/4096 (69%)] | Loss: 4.380985 | Image2Text Acc: 2.34 | Text2Image Acc: 5.47 | Data Time: 0.042s | Batch Time: 0.340s | LR: 0.000270 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:50 | INFO | Rank 0 | Global Steps: 55/3200 | Train Epoch: 2 [2944/4096 (72%)] | Loss: 4.415821 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 1.674s | Batch Time: 1.969s | LR: 0.000275 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:50 | INFO | Rank 0 | Global Steps: 56/3200 | Train Epoch: 2 [3072/4096 (75%)] | Loss: 4.612484 | Image2Text Acc: 2.34 | Text2Image Acc: 3.12 | Data Time: 0.038s | Batch Time: 0.334s | LR: 0.000280 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:54 | INFO | Rank 0 | Global Steps: 57/3200 | Train Epoch: 2 [3200/4096 (78%)] | Loss: 5.101124 | Image2Text Acc: 0.78 | Text2Image Acc: 3.91 | Data Time: 3.150s | Batch Time: 3.456s | LR: 0.000285 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:54 | INFO | Rank 0 | Global Steps: 58/3200 | Train Epoch: 2 [3328/4096 (81%)] | Loss: 5.060188 | Image2Text Acc: 1.56 | Text2Image Acc: 0.78 | Data Time: 0.039s | Batch Time: 0.335s | LR: 0.000290 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:56 | INFO | Rank 0 | Global Steps: 59/3200 | Train Epoch: 2 [3456/4096 (84%)] | Loss: 4.785370 | Image2Text Acc: 1.56 | Text2Image Acc: 3.91 | Data Time: 1.727s | Batch Time: 2.025s | LR: 0.000295 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:56 | INFO | Rank 0 | Global Steps: 60/3200 | Train Epoch: 2 [3584/4096 (88%)] | Loss: 4.811279 | Image2Text Acc: 1.56 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.337s | LR: 0.000300 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:23:00 | INFO | Rank 0 | Global Steps: 61/3200 | Train Epoch: 2 [3712/4096 (91%)] | Loss: 4.825073 | Image2Text Acc: 1.56 | Text2Image Acc: 0.78 | Data Time: 3.007s | Batch Time: 3.313s | LR: 0.000305 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:00 | INFO | Rank 0 | Global Steps: 62/3200 | Train Epoch: 2 [3840/4096 (94%)] | Loss: 4.803360 | Image2Text Acc: 0.00 | Text2Image Acc: 0.78 | Data Time: 0.039s | Batch Time: 0.334s | LR: 0.000310 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:01 | INFO | Rank 0 | Global Steps: 63/3200 | Train Epoch: 2 [3968/4096 (97%)] | Loss: 4.794815 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 1.087s | Batch Time: 1.382s | LR: 0.000315 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:02 | INFO | Rank 0 | Global Steps: 64/3200 | Train Epoch: 2 [4096/4096 (100%)] | Loss: 4.777557 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.332s | LR: 0.000320 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:02 | INFO | Rank 0 | Begin to eval on validation set (epoch 2 @ 64 steps)...
2024-10-10,14:23:18 | INFO | Rank 0 | Validation Result (epoch 2 @ 64 steps) | Valid Loss: 4.740685 | Image2Text Acc: 1.37 | Text2Image Acc: 1.37 | logit_scale: 4.602 | Valid Batch Size: 128
2024-10-10,14:23:19 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs.
2024-10-10,14:23:19 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
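
For anyone reading the log above, here is a minimal Python sketch for summarizing it. It extracts the step, epoch, loss and LR from each training line, averages the loss per epoch, and gives two reference numbers to compare against: the loss of a uniform guess over a batch of 128 (ln 128 ≈ 4.852, with chance accuracy 100/128 ≈ 0.78%) and the LR expected under plain linear warmup. The function names are hypothetical and not part of the training code. The linear warmup formula is an assumption; it is consistent with the logged values (lr 0.0005, warmup 100, step 1 gives 0.000005) but was not checked against the scheduler source.

```python
import math
import re
from collections import defaultdict

# Matches the per-step training lines in the log format shown above.
STEP_RE = re.compile(
    r"Global Steps: (?P<step>\d+)/\d+ \| Train Epoch: (?P<epoch>\d+) .*?"
    r"\| Loss: (?P<loss>[\d.]+) .*?\| LR: (?P<lr>[\d.]+)"
)


def parse_steps(log_text):
    """Return (step, epoch, loss, lr) tuples for every training line found."""
    return [
        (int(m["step"]), int(m["epoch"]), float(m["loss"]), float(m["lr"]))
        for m in STEP_RE.finditer(log_text)
    ]


def mean_loss_per_epoch(rows):
    """Average the training loss over the steps of each epoch."""
    sums = defaultdict(lambda: [0.0, 0])
    for _step, epoch, loss, _lr in rows:
        sums[epoch][0] += loss
        sums[epoch][1] += 1
    return {epoch: total / count for epoch, (total, count) in sums.items()}


def linear_warmup_lr(base_lr, warmup_steps, step):
    """LR during warmup, ASSUMING a plain linear ramp (only valid for step <= warmup_steps)."""
    return base_lr * step / warmup_steps


def chance_level_loss(batch_size):
    """Cross-entropy of a uniform guess over batch_size in-batch candidates."""
    return math.log(batch_size)


# Three lines copied from the log above (steps 1, 32 and 64).
SAMPLE = """\
2024-10-10,14:20:23 | INFO | Rank 0 | Global Steps: 1/3200 | Train Epoch: 1 [128/4096 (3%)] | Loss: 5.140202 | Image2Text Acc: 5.47 | Text2Image Acc: 3.91 | Data Time: 6.993s | Batch Time: 8.429s | LR: 0.000005 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:13 | INFO | Rank 0 | Global Steps: 32/3200 | Train Epoch: 1 [4096/4096 (100%)] | Loss: 4.322407 | Image2Text Acc: 4.69 | Text2Image Acc: 5.47 | Data Time: 0.037s | Batch Time: 0.332s | LR: 0.000160 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:23:02 | INFO | Rank 0 | Global Steps: 64/3200 | Train Epoch: 2 [4096/4096 (100%)] | Loss: 4.777557 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.332s | LR: 0.000320 | logit_scale: 4.602 | Global Batch Size: 128
"""

if __name__ == "__main__":
    rows = parse_steps(SAMPLE)
    print("mean loss per epoch:", mean_loss_per_epoch(rows))
    print("chance-level loss for batch 128:", chance_level_loss(128))
    for step, _epoch, _loss, lr in rows:
        print(step, "logged LR", lr, "linear warmup LR", linear_warmup_lr(0.0005, 100, step))
```

Feeding the full log file to `parse_steps` instead of `SAMPLE` gives the per-epoch averages for the whole run. In the log above, the validation accuracies fall from 3.91 / 5.86 after epoch 1 to 1.37 / 1.37 after epoch 2 while the LR is still ramping up.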

@Tiandishihua
2024-10-10,14:22:17 | INFO | Rank 0 | Global Steps: 36/3200 | Train Epoch: 2 [512/4096 (12%)] | Loss: 4.766368 | Image2Text Acc: 3.12 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.331s | LR: 0.000180 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:22 | INFO | Rank 0 | Global Steps: 37/3200 | Train Epoch: 2 [640/4096 (16%)] | Loss: 4.304634 | Image2Text Acc: 3.12 | Text2Image Acc: 2.34 | Data Time: 4.513s | Batch Time: 4.818s | LR: 0.000185 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:22 | INFO | Rank 0 | Global Steps: 38/3200 | Train Epoch: 2 [768/4096 (19%)] | Loss: 4.475136 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.038s | Batch Time: 0.334s | LR: 0.000190 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:24 | INFO | Rank 0 | Global Steps: 39/3200 | Train Epoch: 2 [896/4096 (22%)] | Loss: 4.387520 | Image2Text Acc: 3.91 | Text2Image Acc: 3.91 | Data Time: 1.873s | Batch Time: 2.171s | LR: 0.000195 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:25 | INFO | Rank 0 | Global Steps: 40/3200 | Train Epoch: 2 [1024/4096 (25%)] | Loss: 4.462852 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.335s | LR: 0.000200 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:29 | INFO | Rank 0 | Global Steps: 41/3200 | Train Epoch: 2 [1152/4096 (28%)] | Loss: 4.601013 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 3.525s | Batch Time: 3.828s | LR: 0.000205 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:29 | INFO | Rank 0 | Global Steps: 42/3200 | Train Epoch: 2 [1280/4096 (31%)] | Loss: 4.392643 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.338s | LR: 0.000210 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:30 | INFO | Rank 0 | Global Steps: 43/3200 | Train Epoch: 2 [1408/4096 (34%)] | Loss: 4.625900 | Image2Text Acc: 0.78 | Text2Image Acc: 3.91 | Data Time: 1.126s | 
Batch Time: 1.420s | LR: 0.000215 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:31 | INFO | Rank 0 | Global Steps: 44/3200 | Train Epoch: 2 [1536/4096 (38%)] | Loss: 4.499672 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.331s | LR: 0.000220 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:35 | INFO | Rank 0 | Global Steps: 45/3200 | Train Epoch: 2 [1664/4096 (41%)] | Loss: 4.556229 | Image2Text Acc: 3.12 | Text2Image Acc: 4.69 | Data Time: 3.978s | Batch Time: 4.282s | LR: 0.000225 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:35 | INFO | Rank 0 | Global Steps: 46/3200 | Train Epoch: 2 [1792/4096 (44%)] | Loss: 4.524231 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 0.035s | Batch Time: 0.335s | LR: 0.000230 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:37 | INFO | Rank 0 | Global Steps: 47/3200 | Train Epoch: 2 [1920/4096 (47%)] | Loss: 4.618797 | Image2Text Acc: 0.78 | Text2Image Acc: 3.12 | Data Time: 1.077s | Batch Time: 1.372s | LR: 0.000235 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:37 | INFO | Rank 0 | Global Steps: 48/3200 | Train Epoch: 2 [2048/4096 (50%)] | Loss: 4.657127 | Image2Text Acc: 1.56 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.335s | LR: 0.000240 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:41 | INFO | Rank 0 | Global Steps: 49/3200 | Train Epoch: 2 [2176/4096 (53%)] | Loss: 4.595833 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 3.846s | Batch Time: 4.150s | LR: 0.000245 | logit_scale: 4.604 | Global Batch Size: 128 2024-10-10,14:22:42 | INFO | Rank 0 | Global Steps: 50/3200 | Train Epoch: 2 [2304/4096 (56%)] | Loss: 4.590302 | Image2Text Acc: 2.34 | Text2Image Acc: 2.34 | Data Time: 0.038s | Batch Time: 0.337s | LR: 0.000250 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:43 | INFO | Rank 0 | Global Steps: 51/3200 | Train Epoch: 2 [2432/4096 (59%)] | 
Loss: 4.545975 | Image2Text Acc: 3.91 | Text2Image Acc: 1.56 | Data Time: 1.587s | Batch Time: 1.887s | LR: 0.000255 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:44 | INFO | Rank 0 | Global Steps: 52/3200 | Train Epoch: 2 [2560/4096 (62%)] | Loss: 4.431618 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.340s | LR: 0.000260 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:47 | INFO | Rank 0 | Global Steps: 53/3200 | Train Epoch: 2 [2688/4096 (66%)] | Loss: 4.410482 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 3.403s | Batch Time: 3.703s | LR: 0.000265 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:48 | INFO | Rank 0 | Global Steps: 54/3200 | Train Epoch: 2 [2816/4096 (69%)] | Loss: 4.380985 | Image2Text Acc: 2.34 | Text2Image Acc: 5.47 | Data Time: 0.042s | Batch Time: 0.340s | LR: 0.000270 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:50 | INFO | Rank 0 | Global Steps: 55/3200 | Train Epoch: 2 [2944/4096 (72%)] | Loss: 4.415821 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 1.674s | Batch Time: 1.969s | LR: 0.000275 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:50 | INFO | Rank 0 | Global Steps: 56/3200 | Train Epoch: 2 [3072/4096 (75%)] | Loss: 4.612484 | Image2Text Acc: 2.34 | Text2Image Acc: 3.12 | Data Time: 0.038s | Batch Time: 0.334s | LR: 0.000280 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:54 | INFO | Rank 0 | Global Steps: 57/3200 | Train Epoch: 2 [3200/4096 (78%)] | Loss: 5.101124 | Image2Text Acc: 0.78 | Text2Image Acc: 3.91 | Data Time: 3.150s | Batch Time: 3.456s | LR: 0.000285 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:54 | INFO | Rank 0 | Global Steps: 58/3200 | Train Epoch: 2 [3328/4096 (81%)] | Loss: 5.060188 | Image2Text Acc: 1.56 | Text2Image Acc: 0.78 | Data Time: 0.039s | Batch Time: 0.335s | LR: 0.000290 | logit_scale: 4.603 | Global Batch Size: 128 
2024-10-10,14:22:56 | INFO | Rank 0 | Global Steps: 59/3200 | Train Epoch: 2 [3456/4096 (84%)] | Loss: 4.785370 | Image2Text Acc: 1.56 | Text2Image Acc: 3.91 | Data Time: 1.727s | Batch Time: 2.025s | LR: 0.000295 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:22:56 | INFO | Rank 0 | Global Steps: 60/3200 | Train Epoch: 2 [3584/4096 (88%)] | Loss: 4.811279 | Image2Text Acc: 1.56 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.337s | LR: 0.000300 | logit_scale: 4.603 | Global Batch Size: 128 2024-10-10,14:23:00 | INFO | Rank 0 | Global Steps: 61/3200 | Train Epoch: 2 [3712/4096 (91%)] | Loss: 4.825073 | Image2Text Acc: 1.56 | Text2Image Acc: 0.78 | Data Time: 3.007s | Batch Time: 3.313s | LR: 0.000305 | logit_scale: 4.602 | Global Batch Size: 128 2024-10-10,14:23:00 | INFO | Rank 0 | Global Steps: 62/3200 | Train Epoch: 2 [3840/4096 (94%)] | Loss: 4.803360 | Image2Text Acc: 0.00 | Text2Image Acc: 0.78 | Data Time: 0.039s | Batch Time: 0.334s | LR: 0.000310 | logit_scale: 4.602 | Global Batch Size: 128 2024-10-10,14:23:01 | INFO | Rank 0 | Global Steps: 63/3200 | Train Epoch: 2 [3968/4096 (97%)] | Loss: 4.794815 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 1.087s | Batch Time: 1.382s | LR: 0.000315 | logit_scale: 4.602 | Global Batch Size: 128 2024-10-10,14:23:02 | INFO | Rank 0 | Global Steps: 64/3200 | Train Epoch: 2 [4096/4096 (100%)] | Loss: 4.777557 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.332s | LR: 0.000320 | logit_scale: 4.602 | Global Batch Size: 128 2024-10-10,14:23:02 | INFO | Rank 0 | Begin to eval on validation set (epoch 2 @ 64 steps)... 2024-10-10,14:23:18 | INFO | Rank 0 | Validation Result (epoch 2 @ 64 steps) | Valid Loss: 4.740685 | Image2Text Acc: 1.37 | Text2Image Acc: 1.37 | logit_scale: 4.602 | Valid Batch Size: 128 2024-10-10,14:23:19 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs. 
2024-10-10,14:23:19 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
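
For anyone comparing their own run against the log above, here is a minimal Python sketch that pulls the step, loss, accuracy, and LR out of each training entry. `parse_entries` and `chance_accuracy` are hypothetical helper names, not part of the project, and the regex assumes the exact entry layout shown in this log. The chance baseline assumes the reported accuracy is top-1 retrieval within the global batch, which the log itself does not confirm.

```python
import math
import re

# Matches one training entry in the layout shown in the log above (assumed format).
ENTRY = re.compile(
    r"Global Steps: (?P<step>\d+)/\d+ \| Train Epoch: \d+ \[\d+/\d+ \(\d+%\)\] \| "
    r"Loss: (?P<loss>[\d.]+) \| Image2Text Acc: (?P<i2t>[\d.]+) \| "
    r"Text2Image Acc: (?P<t2i>[\d.]+) \| .*?LR: (?P<lr>[\d.]+) \| .*?"
    r"Global Batch Size: (?P<bs>\d+)"
)


def parse_entries(text):
    """Return one dict per training entry found in the log text."""
    # Collapse line breaks so entries wrapped across lines still match.
    flat = " ".join(text.split())
    return [
        {
            "step": int(m["step"]),
            "loss": float(m["loss"]),
            "i2t_acc": float(m["i2t"]),
            "t2i_acc": float(m["t2i"]),
            "lr": float(m["lr"]),
            "batch_size": int(m["bs"]),
        }
        for m in ENTRY.finditer(flat)
    ]


def chance_accuracy(batch_size):
    """Top-1 accuracy (in percent) of a uniformly random pick among batch_size candidates."""
    return 100.0 / batch_size


# Two entries copied from the log above (steps 9 and 64).
SAMPLE = """
2024-10-10,14:20:37 | INFO | Rank 0 | Global Steps: 9/3200 | Train Epoch: 1 [1152/4096 (28%)] | Loss: 4.305431 | Image2Text Acc: 2.34 | Text2Image Acc: 2.34 | Data Time: 4.561s | Batch Time: 4.863s | LR: 0.000045 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:23:02 | INFO | Rank 0 | Global Steps: 64/3200 | Train Epoch: 2 [4096/4096 (100%)] | Loss: 4.777557 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.332s | LR: 0.000320 | logit_scale: 4.602 | Global Batch Size: 128
"""

if __name__ == "__main__":
    for e in parse_entries(SAMPLE):
        print(
            f"step {e['step']:>3}  loss {e['loss']:.3f}  lr {e['lr']:.6f}  "
            f"i2t {e['i2t_acc']:.2f}%  chance {chance_accuracy(e['batch_size']):.2f}%  "
            f"ln(batch) {math.log(e['batch_size']):.3f}"
        )
```

In the entries shown here the LR is still rising by 0.000005 per step, and by the end of epoch 2 both accuracies sit at or near 100/128 ≈ 0.78%, which is what a random pick among 128 candidates would give. That pattern may be worth comparing against a run with a lower learning rate, though the log alone does not establish the cause.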

Hello! I'm running into this problem too: the training accuracy is very low. Did you manage to solve it?

cuppersd commented Jan 2, 2025

I'm running into the same problem. Has anyone solved it? The accuracy stays very low.
