sudo apt-get install git-lfs
pip install -r requirements.txt
cd dataset
python download.py
python3 train.py --seed 59 --is_text_augment False --is_img_augment False --use_dynamic_thresh False
python3 train.py --seed 60 --dataset_name openvivqa --is_text_augment True --n_text_paras 1 --n_text_para_pool 30 --text_para_thresh 0.6 --is_img_augment False
python3 train.py --seed 59 --is_text_augment True --n_text_paras 1 --n_text_para_pool 30 --text_para_thresh 0.6 --is_img_augment False --use_dynamic_thresh False
python3 train.py --seed 59 --is_img_augment True --n_img_augments 1 --img_augment_thresh 0.2 --is_text_augment False --use_dynamic_thresh False
python3 train.py --seed 59 --use_dynamic_thresh True --is_text_augment True --n_text_paras 1 --n_text_para_pool 30 --text_para_thresh 0.6 --is_img_augment False
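The seed-59 runs above differ only in their augmentation flags, so they can be kept as data and launched in sequence. The following is a minimal sketch, not part of the repository: the configuration labels (`baseline`, `text_aug`, `img_aug`, `text_aug_dynamic`) and the helpers `build_command` and `run_all` are hypothetical names, while every flag and value is copied from the commands above. It assumes `train.py` is in the current working directory and uses the current interpreter instead of a hard-coded `python3`.

```python
import shlex
import subprocess
import sys

# Flags shared by the text-augmentation runs, copied from the commands above.
TEXT_AUG = [
    "--is_text_augment", "True",
    "--n_text_paras", "1",
    "--n_text_para_pool", "30",
    "--text_para_thresh", "0.6",
]

# Hypothetical labels mapped to the flags of each seed-59 command.
CONFIGS = {
    "baseline": [
        "--is_text_augment", "False",
        "--is_img_augment", "False",
        "--use_dynamic_thresh", "False",
    ],
    "text_aug": TEXT_AUG + [
        "--is_img_augment", "False",
        "--use_dynamic_thresh", "False",
    ],
    "img_aug": [
        "--is_img_augment", "True",
        "--n_img_augments", "1",
        "--img_augment_thresh", "0.2",
        "--is_text_augment", "False",
        "--use_dynamic_thresh", "False",
    ],
    "text_aug_dynamic": ["--use_dynamic_thresh", "True"] + TEXT_AUG + [
        "--is_img_augment", "False",
    ],
}


def build_command(name, seed=59):
    """Return the argument list for one named configuration."""
    return [sys.executable, "train.py", "--seed", str(seed)] + CONFIGS[name]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name in CONFIGS:
        cmd = build_command(name)
        print(f"[{name}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them against the list above before starting any training. The seed-60 run with `--dataset_name openvivqa` is left out of this sketch because it uses a different seed and dataset.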
This repository contains a survey, source code, and re-implementations of several methods for the Visual Question Answering (VQA) task.
- Skeleton source
- [ ] Benchmarks
- CNN+LSTM Classifier
- CNN+LSTM+Attention Classifier
- Transformers Encoder-Decoder
- CNN+BERT Classifier
- ViT+LSTM Classifier
- BERT+ViT Classifier
- BLIP Finetuning
- PhoBERT+ViT Classifier
- BARTpho+ViT Classifier
- BARTpho+BEiT Classifier
Dataset | Description | Size |
---|---|---|
ViTextVQA | Vietnamese Text Comprehension in Images | 16,762 images, 50,342 QA pairs |
EVJVQA | Multilingual Visual Question Answering | 4,879 images, 33,790 QA pairs |
ViOCRVQA | Vietnamese Optical Character Recognition VQA | 28,282 images, 123,781 QA pairs |
OpenViVQA | Open-domain Vietnamese Visual Question Answering | 11,199 images, 37,914 QA pairs |
ViCLEVR | Vietnamese Visual Reasoning | 26,216 images, 30,000 QA pairs |
ViVQA | Vietnamese Visual Question Answering | 10,328 images, 15,000 QA pairs |
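
To compare how densely each dataset is annotated, the sizes in the table can be turned into an average number of QA pairs per image. This is a small sketch that reads the Size column as whole-number counts (for example 16762 images for ViTextVQA); the `DATASETS` dictionary and the `qa_per_image` helper are illustrative names, not part of the repository.

```python
# (images, QA pairs) per dataset, transcribed from the table above.
DATASETS = {
    "ViTextVQA": (16762, 50342),
    "EVJVQA": (4879, 33790),
    "ViOCRVQA": (28282, 123781),
    "OpenViVQA": (11199, 37914),
    "ViCLEVR": (26216, 30000),
    "ViVQA": (10328, 15000),
}


def qa_per_image(images, qa_pairs):
    """Average number of QA pairs annotated per image."""
    return qa_pairs / images


if __name__ == "__main__":
    # List datasets from most to least densely annotated.
    ranked = sorted(DATASETS.items(),
                    key=lambda item: qa_per_image(*item[1]),
                    reverse=True)
    for name, (images, qa_pairs) in ranked:
        ratio = qa_per_image(images, qa_pairs)
        print(f"{name:<10} {images:>7,} images {qa_pairs:>8,} QA pairs "
              f"{ratio:.2f} QA/image")
```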