# PKOL

This is the code implementation of "Video Question Answering with Prior Knowledge and Object-sensitive Learning".

## Video Question Answering with Prior Knowledge and Object-sensitive Learning (PKOL)

[](https://www.python.org/)[](https://github.com/zchoi/S2-Transformer/blob/main/LICENSE)[](https://pytorch.org/)

This is the official code implementation for the paper:

**[Video Question Answering with Prior Knowledge and Object-sensitive Learning][1]**

<p align="center">
<img src="C:\Users\pc\Desktop\1655541642596.jpg" alt="PKOL framework" width="850"/>
</p>
## Table of Contents

- [Setups](#setups)
- [Data Preparation](#data-preparation)
- [Experiments](#experiments)
- [Results](#results)
- [Reference and Citation](#reference-and-citation)
- [Acknowledgements](#acknowledgements)
## Setups

- **Ubuntu** 20.04
- **CUDA** 11.5
- **Python** 3.7
- **PyTorch** 1.7.0 + cu110

1. Clone this repository:

```
git clone <repository-url>
cd PKOL
```

2. Install dependencies:

```
conda create -n vqa python=3.7
conda activate vqa
conda install -c conda-forge ffmpeg
conda install -c conda-forge scikit-video
pip install -r requirements.txt
```

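As a quick sanity check of the environment (an illustrative snippet, not part of this repository), you can confirm that the installed PyTorch build matches the versions listed above and that the GPU is visible:

```python
# Hypothetical sanity check: confirm the environment matches the versions above.
import torch

print(torch.__version__)          # expected: 1.7.0+cu110
print(torch.version.cuda)         # expected: 11.0 (the CUDA build bundled with +cu110)
print(torch.cuda.is_available())  # expected: True if the CUDA 11.5 driver sees a GPU
```
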
## Data Preparation

- #### Text Features

Download the pre-extracted text features from [here](), and place them into `data/{dataset}-qa/` for MSVD-QA and MSRVTT-QA, or `data/tgif-qa/{question_type}/` for TGIF-QA.

- #### Visual Features

Download the pre-extracted visual features (i.e., appearance, motion, and object features) from [here](), and place them into `data/{dataset}-qa/` for MSVD-QA and MSRVTT-QA, or `data/tgif-qa/{question_type}/` for TGIF-QA.

> **Note:** The object features are very large (roughly 700 GB for TGIF-QA alone), so please make sure you have enough disk space before downloading. The sketch below can help verify the layout and the available space.

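As an optional illustration (not a script shipped with the repository), a short check of the expected directory layout and remaining disk space might look like this; the lowercase directory names are assumptions derived from the paths above, and the exact files inside each directory depend on the downloaded archives:

```python
# Hypothetical check of the data layout described above; directory names follow
# the paths given in this section (assumed lowercase), everything else is an assumption.
import os
import shutil

expected_dirs = [
    "data/msvd-qa",
    "data/msrvtt-qa",
    "data/tgif-qa/action",
    "data/tgif-qa/transition",
    "data/tgif-qa/count",
    "data/tgif-qa/frameqa",
]

for d in expected_dirs:
    status = "ok" if os.path.isdir(d) else "missing"
    print(f"{d}: {status}")

# The object features for TGIF-QA are ~700 GB, so check free space first.
free_gb = shutil.disk_usage(".").free / 1024**3
print(f"free space: {free_gb:.0f} GB")
```
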
## Experiments

- ##### For MSVD-QA and MSRVTT-QA:

<u>Training</u>:

```
python train_iterative.py --cfg configs/msvd_qa.yml
```

<u>Evaluation</u>:

```
python validate_iterative.py --cfg configs/msvd_qa.yml
```

For MSRVTT-QA, use the corresponding MSRVTT config file in `configs/`.

- ##### For TGIF-QA:

Choose the corresponding config file `configs/tgif_qa_{task}.yml` for one of the four tasks (`action`, `transition`, `count`, `frameqa`) to train or evaluate the model. For example, to train on the action task, run the following commands (a sketch for looping over all four tasks is given at the end of this section):

<u>Training</u>:

```
python train_iterative.py --cfg configs/tgif_qa_action.yml
```

<u>Evaluation</u>:

```
python validate_iterative.py --cfg configs/tgif_qa_action.yml
```

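To run all four TGIF-QA tasks in sequence, a small driver script along the following lines could be used. This is only an illustrative sketch: it assumes the config files for the other tasks follow the same `configs/tgif_qa_{task}.yml` naming as the action example above.

```python
# Illustrative sketch (not part of the repository): train and evaluate on all
# four TGIF-QA tasks, assuming configs named configs/tgif_qa_{task}.yml.
import subprocess

TASKS = ["action", "transition", "count", "frameqa"]

for task in TASKS:
    cfg = f"configs/tgif_qa_{task}.yml"
    subprocess.run(["python", "train_iterative.py", "--cfg", cfg], check=True)
    subprocess.run(["python", "validate_iterative.py", "--cfg", cfg], check=True)
```
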
## Results

Performance (accuracy, %) on the MSVD-QA and MSRVTT-QA datasets:

| Model | MSVD-QA | MSRVTT-QA |
| :---- | :-----: | :-------: |
| PKOL  |  41.1   |   36.9    |

Performance on the TGIF-QA dataset (↓: lower is better, ↑: higher is better):

| Model | Count ↓ | FrameQA ↑ | Trans. ↑ | Action ↑ |
| :---- | :-----: | :-------: | :------: | :------: |
| PKOL  |  3.67   |   61.8    |   82.8   |   74.6   |

## Reference and Citation

### Reference

[1] Le, Thao Minh, et al. "Hierarchical conditional relation networks for video question answering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

### Citation

```
@article{PKOL,
  author    = {Pengpeng Zeng and
               Haonan Zhang and
               Lianli Gao and
               Jingkuan Song and
               Heng Tao Shen},
  title     = {Video Question Answering with Prior Knowledge and Object-sensitive Learning},
  journal   = {{IEEE} Transactions on Image Processing},
  % pages   = {????--????}
  year      = {2022}
}
```
## Acknowledgements

Our code implementation is based in part on the HCRN [repo](https://github.com/thaolmk54/hcrn-videoqa).