Add models for OmniSource (#208)
kennymckormick authored Aug 24, 2020
1 parent 4aee54f commit c7e3b7c
Showing 2 changed files with 35 additions and 23 deletions.
42 changes: 26 additions & 16 deletions MODEL_ZOO.md

## Action Recognition

For action recognition, unless otherwise specified, models are trained on Kinetics-400. The version of Kinetics-400 we used contains 240436 training videos and 19796 testing videos. For TSN, we also train it on UCF-101, initialized with ImageNet pretrained weights. We also provide transfer learning results on UCF101 and HMDB51 for some algorithms. Models marked with * are converted from other repos (including [VMZ](https://github.com/facebookresearch/VMZ) and [kinetics_i3d](https://github.com/deepmind/kinetics-i3d)); the others are trained by ourselves.

For data preprocessing, we find that resizing the short edge of videos to 256px is generally a better choice than resizing to a fixed 340x256 resolution, since the aspect ratio is preserved. Most of our Kinetics-400 models are trained on videos whose short edge is resized to 256px. However, some legacy Kinetics-400 models were trained on videos with a fixed resolution (340x256). We use the mark $^{340\times256}$ to indicate a legacy model.
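
The short-edge preprocessing described above can be sketched as follows. `short_edge_target` is an illustrative helper, not an API from this repo; the actual resize would then be done with e.g. `mmcv.imresize` or `cv2.resize`:

```python
def short_edge_target(height, width, short_edge=256):
    """Target (height, width) after resizing so the shorter side
    equals `short_edge` while keeping the aspect ratio."""
    scale = short_edge / min(height, width)
    return round(height * scale), round(width * scale)

# A 480x640 frame becomes 256x341: the aspect ratio is kept,
# unlike a fixed 340x256 resize.
print(short_edge_target(480, 640))  # → (256, 341)
```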

If you cannot reproduce our testing results due to dataset misalignment, please submit a request at [get validation data](https://forms.gle/jmBiCDJButrLwpgc9).

### TSN

#### Kinetics

| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| :------: | :--------: | :---------: | :--------: | :------------------------------------: | :------------------------------------: | :------------------------------------: |
| RGB | ImageNet | ResNet50 | 3seg | 70.6 | 89.4 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/tsn2d_kinetics400_rgb_r50_seg3_f1s1-b702e12f.pth)$^{340\times256}$ |
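
Top-1 / Top-5 throughout this zoo are the usual top-k accuracies on the Kinetics-400 validation set. As a reminder, a minimal (illustrative, not this repo's evaluation code) computation:

```python
def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose ground-truth label is among the
    k highest-scoring classes."""
    correct = 0
    for sample_scores, label in zip(scores, labels):
        topk = sorted(range(len(sample_scores)),
                      key=lambda c: sample_scores[c], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
# Second sample's top class is 0, but its label is 1.
print(topk_accuracy(scores, [1, 1], k=1))  # → 0.5
```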


#### UCF101
| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| :--------: | :--------: | :----------: | :---: | :---: | :---: | :----------------------------------------------------------: |
| RGB | ImageNet | Inception-V1 | 64x1 | 71.1 | 89.3 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/i3d_kinetics400_se_rgb_inception_v1_seg1_f64s1_imagenet_deepmind-9b8e02b3.pth)* |
| RGB | ImageNet | ResNet50 | 32x2 | 72.9 | 90.8 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/i3d_kinetics_rgb_r50_c3d_inflated3x1x1_seg1_f32s2_f32s2-b93cc877.pth)$^{340\times256}$ |
| Flow | ImageNet | Inception-V1 | 64x1 | 63.4 | 84.9 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/i3d_kinetics_flow_inception_v1_seg1_f64s1_imagenet_deepmind-92059771.pth)* |
| Two-Stream | ImageNet | Inception-V1 | 64x1 | 74.2 | 91.3 | / |
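
In the tables, an Input of `NxM` (e.g. `64x1`, `32x2`) denotes a clip of N frames sampled every M frames, while `3seg` denotes TSN-style 3-segment sampling. A minimal sketch of the `NxM` scheme (the wrap-around for short videos is our assumption, not necessarily this repo's exact behavior):

```python
def clip_indices(num_frames, clip_len, frame_interval, start=0):
    """Frame indices for one clip: `clip_len` frames taken every
    `frame_interval` frames, wrapping around short videos."""
    return [(start + i * frame_interval) % num_frames
            for i in range(clip_len)]

# The first frames of a 32x2 clip starting at frame 10 of a 300-frame video.
print(clip_indices(300, 32, 2, start=10)[:4])  # → [10, 12, 14, 16]
```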

| RGB | ImageNet | ResNet50 | 4x16 | 75.9 | 92.3 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/slowfast_kinetics400_se_rgb_r50_4x16_finetune-4623cf03.pth) |

### R(2+1)D
| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
|:--------:|:----------:|:--------:|:-----:|:-----:|:-----:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|
| RGB | None | ResNet34 | 8x8 | 63.7 | 85.9 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/r2plus1d_kinetics400_se_rgb_r34_f8s8_scratch-1f576444.pth) |
| RGB | IG-65M | ResNet34 | 8x8 | 74.4 | 91.7 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/r2plus1d_kinetics400_se_rgb_r34_f8s8_finetune-c3abbbfc.pth) |
| RGB | None | ResNet34 | 32x2 | 71.8 | 90.4 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/r2plus1d_kinetics400_se_rgb_r34_f32s2_scratch-97f56158.pth) |
| RGB | IG-65M | ResNet34 | 32x2 | 80.3 | 94.7 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/r2plus1d_kinetics400_se_rgb_r34_f32s2_finetune-9baa39ea.pth) |

### CSN
| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| :------: | :--------: | :-------: | :---: | :---: | :---: | :----------------------------------------------------------: |
| RGB | IG-65M | irCSN-152 | 32x2 | 82.6 | 95.7 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/ircsn_kinetics400_se_rgb_r152_f32s2_ig65m_fbai-9d6ed879.pth)* |
| RGB | IG-65M | ipCSN-152 | 32x2 | 82.7 | 95.6 | [model](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/ipcsn_kinetics400_se_rgb_r152_f32s2_ig65m_fbai-ef39b9e3.pth)* |

\* Converted from [VMZ in Caffe2](https://github.com/facebookresearch/VMZ).

### OmniSource

| Modality | Pretrained | Backbone | Input | Top-1 (Baseline / OmniSource ($\Delta$)) | Top-5 (Baseline / OmniSource ($\Delta$)) | Download |
| :------: | :--------: | :-------: | :---: | :--------------------------------------: | :--------------------------------------: | :----------------------------------------------------------: |
| RGB | ImageNet | ResNet50 | 3seg | 70.6 / 73.6 (+ 3.0) | 89.4 / 91.0 (+ 1.6) | [Baseline](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/tsn2d_kinetics400_rgb_r50_seg3_f1s1-b702e12f.pth)$^{340\times256}$ / [OmniSource](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/tsn_OmniSource_kinetics400_se_rgb_r50_seg3_f1s1_imagenet-4066cb7e.pth)$^{340\times256}$ |
| RGB | IG-1B | ResNet50 | 3seg | 73.1 / 75.7 (+ 2.6) | 90.4 / 91.9 (+ 1.5) | [Baseline](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/tsn_kinetics400_se_rgb_r50_seg3_f1s1_IG1B-d4bc58ba.pth) / [OmniSource](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/tsn_OmniSource_kinetics400_se_rgb_r50_seg3_f1s1_IG1B-25fc136b.pth) |
| RGB | Scratch | ResNet50 | 4x16 | 72.9 / 76.8 (+ 3.9) | 90.9 / 92.5 (+ 1.6) | [Baseline](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/slowonly_kinetics400_se_rgb_r50_seg1_4x16_scratch_epoch256-594abd88.pth) / [OmniSource](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/slowonly_OmniSource_kinetics400_se_rgb_r50_seg1_4x16_scratch-71f7b8ee.pth) |
| RGB | Scratch | ResNet101 | 8x8 | 76.5 / 80.4 (+ 3.9) | 92.7 / 94.4 (+ 1.7) | [Baseline](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/slowonly_kinetics400_se_rgb_r101_8x8_scratch-8de47237.pth) / [OmniSource](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/omnisource/slowonly_OmniSource_kinetics400_se_rgb_r101_seg1_8x8_scratch-2f838cb0.pth) |

### Transfer Learning

| Model | Modality | Pretrained | Backbone | Input | UCF101 | HMDB51 | Download (split1) |
|-------|:---------:|:----------:|:--------:|:-----:|:------:|:------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| I3D | RGB | Kinetics | I3D | 64x1 | 94.8 | 72.6 | [UCF101](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/ucf101/i3d_ucf101_split1_rgb_f64s1_kinetics400ft-36201298.pth) / [HMDB51](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/ucf101/i3d_hmdb51_split1_rgb_f64s1_kinetics400ft-1ffcf11f.pth) |
| I3D | Flow | Kinetics | I3D | 64x1 | 96.6 | 79.2 | [UCF101](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/ucf101/i3d_ucf101_split1_flow_f64s1_kinetics400ft-93ed9ecd.pth) / [HMDB51](https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/ucf101/i3d_hmdb51_split1_flow_f64s1_kinetics400ft-2981c797.pth) |
| I3D | TwoStream | Kinetics | I3D | 64x1 | 97.8 | 80.8 | / |

## Action Detection

16 changes: 9 additions & 7 deletions README.md

- action recognition from trimmed videos
- temporal action detection (also known as action localization) in untrimmed videos
- spatial-temporal action detection in untrimmed videos.


- Support for various datasets
MMAction implements popular frameworks for action understanding:

- For action recognition, various algorithms are implemented, including TSN, I3D, SlowFast, R(2+1)D, CSN.
- For temporal action detection, we implement SSN.
- For spatial-temporal atomic action detection, a Fast-RCNN baseline is provided.

- Modular design

The tasks in human action understanding share common aspects such as backbones and long-term/short-term sampling schemes.
Also, tasks can benefit from each other. For example, a better backbone for action recognition will bring performance gains for action detection.
Modular design enables us to view action understanding from a more integrated perspective.

## License
The project is released under the [Apache 2.0 license](https://github.com/open-mmlab/mmaction/blob/master/LICENSE).

## Updates

[OmniSource](https://arxiv.org/abs/2003.13042) Model Release (22/08/2020)

- We release several models of our work [OmniSource](https://arxiv.org/abs/2003.13042). These models are jointly trained on Kinetics-400 and web data collected by OmniSource. They achieve strong performance (Top-1 accuracy: **75.7%** for 3-segment TSN and **80.4%** for SlowOnly on Kinetics-400 val), and the learned representations transfer well to other tasks.

v0.2.0 (15/03/2020)

- We build a diversified model zoo for action recognition, which includes popular algorithms (TSN, I3D, SlowFast, R(2+1)D, CSN). The performance is aligned with or better than the original papers.

v0.1.0 (19/06/2019)

- MMAction is online!

## Model zoo
Results and reference models are available in the [model zoo](https://github.com/open-mmlab/mmaction/blob/master/MODEL_ZOO.md).

