Please go to the original repo [vladsandulescu/hatefulmemes](https://github.com/vladsandulescu/hatefulmemes) in case there are new changes.

# The Hateful Memes challenge

## Post-competition findings
While I was cleaning up the code and documenting my best leaderboard solution, I discovered
I had made a mistake when training the model. Specifically, after the first epoch the model
was being trained in eval mode only, i.e. with dropout disabled and batchnorm frozen.
This is obviously a mistake and you shouldn't do it. It turns out that a single UNITER
model might be on par with my best leaderboard solution involving 12 models with paired attention,
which seems to have little to no effect. As a consequence, I now expect an ensemble of plain UNITER models to outperform
my best leaderboard solution. Imagine my unpleasant surprise finding this after the competition ended.
Oh well, so if you want to do it the right way, make sure to include
`model.train()` [here](https://github.com/vladsandulescu/hatefulmemes/blob/c966336ddbff0a938a8e1632baa7032c6e84f050/UNITER/train_hm.py#L350),
before returning the test results, similar to the validation part.

*NOTE*: The rest of the documentation below reflects the state of the solution until the competition ended,
and since the organizers need to reproduce my solution, I will not push the post-competition fix just yet.
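For illustration, this is the shape of the fix in a generic PyTorch train/eval loop. It is a minimal sketch with made-up names, not the actual `train_hm.py` code:

```python
import torch

def run_test(model: torch.nn.Module, test_loader):
    """Hypothetical test helper; names do not match the real train_hm.py API."""
    model.eval()  # disables dropout and switches batchnorm to running statistics
    results = []
    with torch.no_grad():
        for batch in test_loader:
            results.append(model(batch))
    model.train()  # the missing line: restore train mode before training resumes
    return results
```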
## Introduction
My best scoring solution to the [Hateful Memes: Phase 2](https://www.drivendata.org/competitions/70/hateful-memes-phase-2/)
challenge consists of an ensemble built on a single UNITER model architecture (averaging over probabilities)
[[paper]](https://arxiv.org/abs/1909.11740) [[code]](https://github.com/ChenRocks/UNITER),
which I have adapted for this competition. I have customized the paired-attention
approach from the UNITER paper to include image captions
inferred with the [Im2txt implementation](https://github.com/HughKu/Im2txt) of the
[Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge](http://arxiv.org/abs/1609.06647)
paper. Although it didn't improve my results by much (around 1% AUC), I really liked its simplicity.
I extracted ROI boxes and image features by adapting the
[Bottom-up Attention with Detectron2](https://github.com/airsplay/py-bottom-up-attention)
work, which is itself a PyTorch adaptation of the original Caffe-based
[bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention) approach.
In a nutshell, the paired-attention approach was used in the UNITER paper to pair images for the NLVR2 task.
There the model input was a pair [[img1, txt], [img2, txt]], thus repeating the text for each pair.
I basically just turned that on its head and did [[img, txt1], [img, txt2]], where the first text was
the supplied meme text and the second text was the caption obtained by running Im2txt inference on the image.
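As a conceptual sketch, with plain tuples standing in for the tokenized tensors the actual UNITER dataloader builds, the flip looks like this:

```python
# UNITER's NLVR2-style pairing: one text repeated across two images.
def nlvr2_pair(img1_feats, img2_feats, txt):
    return [(img1_feats, txt), (img2_feats, txt)]

# This solution turns it around: one image repeated across two texts,
# the meme text and the Im2txt-generated caption.
def meme_caption_pair(img_feats, meme_text, caption):
    return [(img_feats, meme_text), (img_feats, caption)]
```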
The overall goal was to get familiar with the multimodal SOTA models out there
and not focus too much on fancier ensembling or stacking. I find it much more fun to try to improve
a single architecture. I also haven't spent much time tuning the hyperparameters;
you will notice they are quite similar to the ones the UNITER authors used.

A lot of info up to here, so let's take it step by step. If you want to replicate my solution,
here's what you need to do.
### Prepare

Please read the instructions in their entirety before starting the process, to ensure you have
at least some overview before you start. It is also **very** useful to read the installation instructions
from the original repos I have used, to get an even better overview.
In my experience, running somebody else's scripts on your own instance setup rarely works out of the box the first time.

### Environment
* Ubuntu 16.04, CUDA 9.0, GCC 5.4.0
* Anaconda 3
* Python 3.6.x (pandas and jupyterlab needed)
### Separate multiple conda environments
The project consists of three sub-projects which have been adapted for this task:
1. [Bottom-up Attention with Detectron2](https://github.com/airsplay/py-bottom-up-attention)
2. [Im2txt: image captioning inference](https://github.com/HughKu/Im2txt)
3. [UNITER: UNiversal Image-TExt Representation Learning](https://github.com/ChenRocks/UNITER)

I **strongly** recommend creating a separate conda environment for each.
For this there is a script in the *conda* folder of each sub-project.
They are all independent, so they should each work as a standalone project, requiring only a path to the data folder.
There is of course a flow, with py-bottom-up-attention and Im2txt working as feature extractors for UNITER down the line,
but I bundled them into one package mostly for convenience, in order to document the solution.
Once you have cloned the repo, you should see the structure below and, among other things, three scripts:
```bash
 hatefulmemes
 ├── Im2txt
 │   ├── conda
 │   │   ├── init_im2txt_ubuntu.sh
 │   ├── ...
 ├── py-bottom-up-attention
 │   ├── conda
 │   │   ├── init_bua_ubuntu.sh
 │   ├── ...
 └── UNITER
     ├── conda
     │   ├── init_uniter_ubuntu.sh
     └── ...
```
Please pay attention to the `/path/to/` and `anaconda3` paths in the scripts.
You need to change these to the locations of your cloned repo and your conda installation respectively.
Afterwards run the scripts to create each of the conda environments.

### Alternative installation instructions
If you prefer not to use the scripts above, or cannot make them work,
you can also follow the installation instructions from the original repos.
That's where my scripts come from anyway.

### Download required datasets and pretrained models
#### 0. The HM dataset
Grab the [HM dataset for phase 2](https://www.drivendata.org/competitions/70/hateful-memes-phase-2/data/)
and place the unzipped files inside a `data` folder, under the project root for example
(on the same level as `Im2txt`, `py-bottom-up-attention` and `UNITER`).

Grab the [dev_seen_unseen.jsonl](https://drive.google.com/file/d/1ASW0JTYxl9Wazu3GVeqAllaWxndUq1iY/view?usp=sharing) file and place it in the same folder as the other `jsonl` files
provided by the organizers.

*NOTE*:
Since no good data should go to waste, I have merged the dev_seen and dev_unseen sets.
This increased the size of the validation set to 640 samples, instead of 500.
Also, the final predictions on the test_unseen set have been produced by training
the UNITER model on **train+dev_seen_unseen**.

If you prefer not to simply download the **dev_seen_unseen.jsonl** file,
you can also build it yourself by running the `notebooks/ph2_merge_dev_seen_unseen.ipynb` notebook.
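If you are curious what the merge amounts to, here is a minimal sketch of the idea, assuming the standard HM jsonl schema with an `id` column; the notebook is the authoritative version:

```python
import pandas as pd

dev_seen = pd.read_json("data/dev_seen.jsonl", lines=True)
dev_unseen = pd.read_json("data/dev_unseen.jsonl", lines=True)

# The two dev splits overlap, so deduplicate by meme id after concatenating.
merged = pd.concat([dev_seen, dev_unseen]).drop_duplicates(subset="id")
merged.to_json("data/dev_seen_unseen.jsonl", orient="records", lines=True)

print(len(merged))  # should come out to the 640 samples mentioned above
```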
#### 1. py-bottom-up-attention
The required pretrained models should be downloaded automatically when running it.

Otherwise please follow the [original instructions](https://github.com/airsplay/py-bottom-up-attention#note) and
download the [10-100 boxes original model weights](http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr_original.pkl).
Or take the shortcut below, but please always check the original link since the author might have made changes in the meantime.
```bash
wget --no-check-certificate http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr_original.pkl -P ~/.torch/fvcore_cache/models/
```

#### 2. Im2txt
For Im2txt follow the [original instructions](https://github.com/HughKu/Im2txt#get-pre-trained-model),
or take the shortcut below, but please always check the original link since the author might have made changes in the meantime.

> Download [inceptionv3 finetuned parameters over 1M](https://drive.google.com/open?id=1r4-9FEIbOUyBSvA-fFVFgvhFpgee6sF5)
> and you will get 4 files; make sure to put them all into this path
> `Im2txt/im2txt/model/Hugh/train/`:
> * newmodel.ckpt-2000000.data-00000-of-00001
> * newmodel.ckpt-2000000.index
> * newmodel.ckpt-2000000.meta
> * checkpoint

#### 3. UNITER
For UNITER follow the [original instructions](https://github.com/ChenRocks/UNITER#quick-start),
more specifically the [pretrained UNITER-large model](https://github.com/ChenRocks/UNITER/blob/master/scripts/download_pretrained.sh),
or take the shortcut below to download the pretrained model, but please always check the original link since the author might have made changes in the meantime.

*NOTE*: the path to the pretrained model has to match the path in your training config file.

```bash
wget https://convaisharables.blob.core.windows.net/uniter/pretrained/uniter-large.pt -P /path/to/UNITER/storage/pretrained/
```

### Running the models
#### 1. Extracting bounding boxes and image features

You can download the already extracted bboxes + features files
[here](https://drive.google.com/file/d/1XwLCpawhF4AMzJRG7sqG4uPywnL9YrTF/view?usp=sharing).
Otherwise you can extract them yourself using the following code from `py-bottom-up-attention`.
The code extracts the features and places the `tsv` file for all the images in the `imgfeat` folder.
This will take about one hour to execute (NVIDIA V100 GPU). However, note that you will implicitly
shuffle the training set by doing so, and UNITER's sampler will then produce different training batches
than mine. This means that even if you replicate the environment precisely, there might
still be small differences in results. But since the final step is an average over probabilities, the
overall differences on the test_unseen set should be negligible.
```bash
conda activate bua

python demo/detectron2_mscoco_proposal_maxnms_hm.py \
    --split img \
    --data_path /path/to/data/ \
    --output_path /path/to/data/imgfeat/ \
    --output_type tsv \
    --min_boxes 10 --max_boxes 100
```
Split the `tsv` file into each of the sets by joining it with the `jsonl` files. This will
create another set of `tsv` files, which will be ingested by UNITER.
```bash
python demo/hm.py --split img --split_json_file train.jsonl --d2_file_suffix d2_10-100_vg --data_path /path/to/data/ --output_path /path/to/data/imgfeat/
python demo/hm.py --split img --split_json_file dev_seen_unseen.jsonl --d2_file_suffix d2_10-100_vg --data_path /path/to/data/ --output_path /path/to/data/imgfeat/
python demo/hm.py --split img --split_json_file test_unseen.jsonl --d2_file_suffix d2_10-100_vg --data_path /path/to/data/ --output_path /path/to/data/imgfeat/
```

This is how your `data` folder structure should look now:
```bash
 hatefulmemes
 ├── data
 │   ├── img (all the memes, .png files)
 │   ├── train.jsonl
 │   ├── dev_seen.jsonl
 │   ├── dev_unseen.jsonl
 │   ├── dev_seen_unseen.jsonl
 │   ├── test_unseen.jsonl
 │   ├── imgfeat/d2_10-100_vg/tsv/img.tsv
 │   ├── data_train_d2_10-100_vg.tsv
 │   ├── data_dev_seen_unseen_d2_10-100_vg.tsv
 │   ├── data_test_unseen_d2_10-100_vg.tsv
 │   ├── ...
 ├── Im2txt
 ├── py-bottom-up-attention
 └── UNITER
```

#### 2. Inferring image captions
Download the already inferred captions file
[here](https://drive.google.com/file/d/1VhXKeMS1CNfhUOrVe93QxYMa6BGzZdTX/view?usp=sharing) and place it under the `data/im2txt` folder.
Otherwise, run the following code yourself.

This will take about one hour on a multicore, CPU-only machine.

```bash
conda activate im2txt

python im2txt/run_inference.py \
    --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" \
    --vocab_file="im2txt/data/Hugh/word_counts.txt" \
    --input_files="/path/to/data/img/*.png"
```
Take the output csv file `df_ph2.csv` and remember to place it under the `data/im2txt` folder.

This is how your `data` folder structure should look now:
```bash
 hatefulmemes
 ├── data
 │   ├── im2txt/df_ph2.csv
 │   ├── ...
 │   ├── img (all the memes, .png files)
 │   ├── train.jsonl
 │   ├── dev_seen.jsonl
 │   ├── dev_unseen.jsonl
 │   ├── dev_seen_unseen.jsonl
 │   ├── test_unseen.jsonl
 │   ├── imgfeat/d2_10-100_vg/tsv/img.tsv
 │   ├── data_train_d2_10-100_vg.tsv
 │   ├── data_dev_seen_unseen_d2_10-100_vg.tsv
 │   ├── data_test_unseen_d2_10-100_vg.tsv
 │   ├── ...
 ├── Im2txt
 ├── py-bottom-up-attention
 └── UNITER
```

#### 3. Training UNITER with paired attention
Training just one UNITER model on `train+dev_seen_unseen` takes about one hour on
an NVIDIA V100 GPU with 32GB RAM. If you only have 16GB of GPU RAM,
you have to modify `train_batch_size` and `gradient_accumulation_steps` to keep the
effective batch size the same:
e.g. if you halve `train_batch_size` to 3328, `gradient_accumulation_steps` needs to be doubled to 2.
Also, the current implementation does not support distributed training.
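The invariant to preserve is simply `effective_batch_size = train_batch_size * gradient_accumulation_steps`. A quick sanity check, with 6656 inferred from the halving example above:

```python
# Values from the example above: halve the batch size, double the accumulation steps.
train_batch_size = 3328
gradient_accumulation_steps = 2

effective_batch_size = train_batch_size * gradient_accumulation_steps
assert effective_batch_size == 6656  # same effective batch size as 6656 x 1 on a 32GB GPU
```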
*NOTE*: If you modify the config file, however, you will of course get slightly different results,
but the overall difference should be negligible when averaging over the probabilities output by multiple UNITER models.

To replicate my leaderboard solution, you need to train 12 UNITER models with different seeds. Why 12?
Well, because the number has a certain something to it: there were 12 monkeys, there are 12 full lunations
of the moon in a year, it is the number of years in a full cycle of Jupiter, and the reasons can go on and on I guess.
The final probabilities should be the average over the probabilities from the 12-model ensemble.
The only difference between these UNITER models is simply the seed. Training is done, as mentioned
before, on `train+dev_seen_unseen`, and every `valid_steps` steps inference is performed on `test_unseen`,
writing the results to a csv file.
```bash
conda activate uniter

python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_0.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_24.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_42.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_77.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_2018.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_12345.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_32768.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_54321.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_10101010.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_20200905.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_55555555.json
python train_hm.py --config config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_2147483647.json
```

*NOTE*: You should only care about the `test_results_1140_rank0_final.csv` files; the rest are just intermediate results.
Use `notebooks/ph2_leaderboard.ipynb` to generate the final leaderboard results, but make sure to change the UNITER
output paths first.
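The averaging itself is the straightforward part. A minimal sketch of what the leaderboard notebook does, assuming each final csv holds aligned `id` and `proba` columns (check the notebook for the actual schema and paths):

```python
import glob

import pandas as pd

# One final test_results csv per seed; rows are assumed to be aligned across runs.
files = sorted(glob.glob("/path/to/output/seed_*/test_results_1140_rank0_final.csv"))
runs = [pd.read_csv(f) for f in files]

ensemble = runs[0][["id"]].copy()
ensemble["proba"] = sum(run["proba"] for run in runs) / len(runs)
ensemble["label"] = (ensemble["proba"] >= 0.5).astype(int)
ensemble.to_csv("submission.csv", index=False)
```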
#### 4. Running inference to get predictions
You can download the [checkpoints](https://drive.google.com/file/d/18xuRFdlDNrUyXKA1AzENai7Z0h_rM3aF/view?usp=sharing) for all 12 models and then just run inference to get the predictions without training the models.
In order to run inference on the test set, you need to have two json files in place next to a checkpoint:
`/path/to/train_dir/log/model.json` and `/path/to/train_dir/log/hps.json`. These should match
the default pretrained UNITER-large `config/uniter-large.json` file and
your UNITER training config file respectively,
e.g. `config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_0.json` for the model using seed 0.
You can simply copy and rename the two files to `model.json` and `hps.json`, but do keep in mind the different seed for each checkpoint.

*NOTE*: If you don't download the checkpoints and prefer to train the models yourself, you don't
need to copy these files, since the training part will already copy them into the correct folder.

The structure of the checkpoint and log folders should look like this:
```bash
 ├── seed_0
 │   ├── ckpt
 │   │   ├── model_step_1140.pt
 │   ├── log
 │   │   ├── model.json (*NOTE* content matches config/uniter-large.json)
 │   │   ├── hps.json (*NOTE* content matches config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_0.json)
 │   ├── ...
 ├── seed_24
 │   ├── ckpt
 │   │   ├── model_step_1140.pt
 │   ├── log
 │   │   ├── model.json (*NOTE* content matches config/uniter-large.json)
 │   │   ├── hps.json (*NOTE* content matches config/ph2_uniter_seeds/train-hm-large-pa-1gpu-hpc_24.json)
 ├── ...
```
Run inference on the `test_unseen` set:
```bash
python inf_hm.py --root_path ./ --dataset_path /path/to/data \
    --test_image_set test --train_dir /path/to/train_dir \
    --ckpt 1140 --output_dir /path/to/output \
    --fp16
```

## Citation
If you find this code useful for your research, please consider citing:

```
@article{sandulescu2020detecting,
  title={Detecting Hateful Memes Using a Multimodal Deep Ensemble},
  author={Sandulescu, Vlad},
  journal={arXiv preprint arXiv:2012.13235},
  year={2020}
}
```