diff --git a/.gitignore b/.gitignore
index 337c774..bc26852 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,7 +3,6 @@
 /m2_annotations
 evaluation/spice/*
 *.pyc
-*.jar
 /saved_transformer_models
 /tensorboard_logs
 /visualization
diff --git a/README.md b/README.md
index 68382f8..9318eb5 100644
--- a/README.md
+++ b/README.md
@@ -36,11 +36,10 @@ python -m spacy download en_core_web_md
 
 ## Data Preparation
 
-* **Annotation**. Download the annotation file [annotation.zip](https://drive.google.com/file/d/1Zc2P3-MIBg3JcHT1qKeYuQt9CnQcY5XJ/view?usp=sharing) [1]. Extract and put it in the project root directory.
-* **Feature**. Download processed image features [ResNeXt-101](https://stduestceducn-my.sharepoint.com/:f:/g/personal/zhn_std_uestc_edu_cn/EssZY4Xdb0JErCk0A1Yx3vUBaRbXau88scRvYw4r1ZuwPg?e=f2QFGp) and [ResNeXt-152](https://stduestceducn-my.sharepoint.com/:f:/g/personal/zhn_std_uestc_edu_cn/EssZY4Xdb0JErCk0A1Yx3vUBaRbXau88scRvYw4r1ZuwPg?e=f2QFGp) features [2], put it in the project root directory.
+* **Annotation**. Download the annotation file [m2_annotations](https://drive.google.com/file/d/12EdMHuwLjHZPAMRJNrt3xSE2AMf7Tz8y/view?usp=sharing) [1]. Extract and put it in the project root directory.
+* **Feature**. Download processed image features [ResNeXt-101](https://pan.baidu.com/s/1avz9zaQ7c36XfVFK3ZZZ5w) and [ResNeXt-152](https://pan.baidu.com/s/1avz9zaQ7c36XfVFK3ZZZ5w) features [2] (code `9vtB`), put it in the project root directory.
 
 
-
 ## Training
 
 Run `python train_transformer.py` using the following arguments:
@@ -83,7 +82,6 @@ We provide pretrained model [here](https://drive.google.com/file/d/1Y133r4Wd9edi
 | Reproduced Model (ResNext101) | 81.2 | 39.9 | 29.6 | 59.1 | 133.7 | 23.3|
 
 
-
 ### Online Evaluation
 
 We also report the performance of our model on the online COCO test server with an ensemble of four S2 models. The detailed online test code can be obtained in this [repo](https://github.com/zhangxuying1004/RSTNet).
@@ -95,8 +93,8 @@ Huang, and Rongrong Ji. Rstnet: Captioning with adaptive attention on visual and
 ### Citation
 ```
 @inproceedings{S2,
-  author = {Pengpeng Zeng and
-            Haonan Zhang and
+  author = {Pengpeng Zeng* and
+            Haonan Zhang* and
             Jingkuan Song and
             Lianli Gao},
   title = {S2 Transformer for Image Captioning},
@@ -107,4 +105,4 @@ Huang, and Rongrong Ji. Rstnet: Captioning with adaptive attention on visual and
 ```
 ## Acknowledgements
 Thanks Zhang _et.al_ for releasing the visual features (ResNeXt-101 and ResNeXt-152). Our code implementation is also based on their [repo](https://github.com/zhangxuying1004/RSTNet).
-Thanks for the original annotations prepared by [M2 Transformer](https://github.com/aimagelab/meshed-memory-transformer), and effective visual representation from [grid-feats-vqa](https://github.com/facebookresearch/grid-feats-vqa).
\ No newline at end of file
+Thanks for the original annotations prepared by [M2 Transformer](https://github.com/aimagelab/meshed-memory-transformer), and effective visual representation from [grid-feats-vqa](https://github.com/facebookresearch/grid-feats-vqa).
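The Data Preparation steps above expect the extracted `m2_annotations` folder and the downloaded ResNeXt grid features to sit in the project root. A minimal sketch, not part of this diff, for confirming that layout before training; the directory and file names are taken from the default paths used in `test_transformer.py` later in this patch:

```python
# Sketch only: check that the annotation folder and the X101 grid-feature
# file referenced by this patch exist under the project root.
import os

expected = [
    './m2_annotations',                                    # extracted annotation folder
    './X101-features/X101_grid_feats_coco_trainval.hdf5',  # processed grid features
]
for path in expected:
    status = 'ok' if os.path.exists(path) else 'MISSING'
    print(f'{status}: {path}')
```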
diff --git a/evaluation/meteor/meteor-1.5.jar b/evaluation/meteor/meteor-1.5.jar
new file mode 100644
index 0000000..a833bc0
Binary files /dev/null and b/evaluation/meteor/meteor-1.5.jar differ
diff --git a/evaluation/stanford-corenlp-3.4.1.jar b/evaluation/stanford-corenlp-3.4.1.jar
new file mode 100644
index 0000000..3cfa0a0
Binary files /dev/null and b/evaluation/stanford-corenlp-3.4.1.jar differ
diff --git a/test_transformer.py b/test_transformer.py
index 90b2aeb..5511004 100644
--- a/test_transformer.py
+++ b/test_transformer.py
@@ -49,12 +49,12 @@ def predict_captions(model, dataloader, text_field):
 
     device = torch.device('cuda')
 
     parser = argparse.ArgumentParser(description='Transformer')
-    parser.add_argument('--batch_size', type=int, default=10)
-    parser.add_argument('--workers', type=int, default=4)
+    parser.add_argument('--batch_size', type=int, default=50)
+    parser.add_argument('--workers', type=int, default=12)
     parser.add_argument('--m', type=int, default=40)
-    parser.add_argument('--features_path', type=str, default='/home/zhanghaonan/RSTNet-master/X101-features/X101_grid_feats_coco_trainval.hdf5')
-    parser.add_argument('--annotation_folder', type=str, default='/home/zhanghaonan/RSTNet-master/m2_annotations')
+    parser.add_argument('--features_path', type=str, default='./X101-features/X101_grid_feats_coco_trainval.hdf5')
+    parser.add_argument('--annotation_folder', type=str, default='./m2_annotations')
 
     # the path of tested model and vocabulary
     parser.add_argument('--model_path', type=str, default='saved_transformer_models/demo_rl_v5_best_test.pth')
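Switching `--features_path` and `--annotation_folder` to relative defaults lets `test_transformer.py` be launched from the project root without path arguments, while other locations can still be passed explicitly. A hedged sketch, assuming `h5py` (not shown in this patch) is used to read the grid features, that opens the new default feature file and reports how many datasets it holds before a full evaluation run:

```python
# Sketch only: open the feature file that test_transformer.py now points to
# by default and report how many HDF5 datasets it contains.
import h5py

features_path = './X101-features/X101_grid_feats_coco_trainval.hdf5'
with h5py.File(features_path, 'r') as f:
    print(f'{features_path}: {len(f.keys())} datasets')
```

The larger `--batch_size 50` and `--workers 12` defaults assume a machine with enough GPU memory and CPU cores; the previous values (10 and 4) remain available as explicit command-line arguments.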