This code is a re-implementation of the video classification experiments in the paper Non-local Neural Networks. The code is developed on top of the MindSpore framework.
A non-local operation is a flexible building block and can easily be used together with convolutional/recurrent layers. Unlike fully-connected (fc) layers, which are usually placed at the end of a network, it can be added into the earlier parts of deep neural networks. This allows us to build a richer hierarchy that combines both non-local and local information.

Table 1 of the paper shows the C2D baseline under a ResNet-50 backbone. In this repository, we use an Inflated 3D ConvNet (I3D) under a ResNet-50 backbone. The C2D model in Table 1 can be turned into a 3D convolutional counterpart by "inflating" the kernels: for example, a 2D k×k kernel can be inflated into a 3D t×k×k kernel that spans t frames (see the sketch below). We add 5 non-local blocks (3 to res4 and 2 to res3, to every other residual block). For more information, please read the paper.
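As a rough illustration, here is a minimal NumPy sketch of the inflation trick; the repository's actual implementation lives in src/models/layers/inflate_conv3d.py and may differ in details:

```python
import numpy as np

def inflate_2d_kernel(weight_2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate a 2D conv kernel (C_out, C_in, k, k) into a 3D kernel
    (C_out, C_in, t, k, k): repeat it t times along a new temporal axis
    and divide by t, so the inflated filter gives the same response on a
    "video" of t identical frames as the 2D filter gave on one frame."""
    weight_3d = np.repeat(weight_2d[:, :, np.newaxis, :, :], t, axis=2)
    return weight_3d / t

# e.g. the 7x7 stem kernel of a 2D ResNet-50 becomes a 5x7x7 kernel
w2d = np.random.randn(64, 3, 7, 7).astype(np.float32)
w3d = inflate_2d_kernel(w2d, t=5)
print(w3d.shape)  # (64, 3, 5, 7, 7)
```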
Dataset used: Kinetics400

- Description: Kinetics-400 is a commonly used benchmark dataset in the video field. For details, please refer to its official website, Kinetics. To download it, please refer to the official ActivityNet repository and use the download script provided there.
- Dataset size:

| Category | Number of videos |
|---|---|
| Training set | 238797 |
| Validation set | 19877 |
Because some YouTube links have expired, the sizes of different copies of the Kinetics dataset may differ.
Dataset used in the paper Non-local Neural Networks:
Kinetics contains ∼246k training videos and 20k validation videos. It is a classification task involving 400 human action categories. They train all models on the training set and test on the validation set.
The directory structure of the Kinetics-400 dataset looks like this:
.
|-kinetic-400
|-- train
| |-- ___qijXy2f0_000011_000021.mp4 // video file
| |-- ___dTOdxzXY_000022_000032.mp4 // video file
| ...
|-- test
| |-- __Zh0xijkrw_000042_000052.mp4 // video file
| |-- __zVSUyXzd8_000070_000080.mp4 // video file
|-- val
| |-- __wsytoYy3Q_000055_000065.mp4 // video file
| |-- __vzEs2wzdQ_000026_000036.mp4 // video file
| ...
|-- kinetics-400_train.csv // training dataset label file.
|-- kinetics-400_test.csv // testing dataset label file.
|-- kinetics-400_val.csv // validation dataset label file.
...
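As a quick way to confirm that a local copy matches this layout, one can count the video files and peek at the CSV headers. The root path below is only an example (it mirrors the path used in nonlocal.yaml); adjust it to wherever your copy of the dataset lives.

```python
# Sanity-check the dataset layout described above (root path is hypothetical).
from pathlib import Path

root = Path("/data/kinetics-dataset/kinetic-400")

for split in ("train", "test", "val"):
    num_videos = sum(1 for _ in (root / split).glob("*.mp4"))
    print(f"{split}: {num_videos} video files")

for csv_name in ("kinetics-400_train.csv", "kinetics-400_test.csv", "kinetics-400_val.csv"):
    with open(root / csv_name) as f:
        print(csv_name, "->", f.readline().strip())  # print the CSV header row
```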
To run the Python scripts in this repository, you need to prepare the environment as follows:
- Python and dependencies
- python 3.7.5
- decord 0.6.0
- mindspore-gpu 1.6.1
- ml-collections 0.1.1
- numpy 1.21.5
- Pillow 9.0.1
- PyYAML 6.0
- Hardware
- Prepare a hardware environment with an NVIDIA GPU.
- Framework
- MindSpore
- For more information about MindSpore, please check the official MindSpore documentation.
Some packages in requirements.txt need the Cython package to be installed first, so install the dependencies with the following commands:
pip install Cython
pip install -r requirements.txt
In this repository, the Nonlocal model is trained and validated on the Kinetics400 dataset.
Our non-local model, which was migrated from the PyTorch pretrained model i3d_nl_dot_product_r50, is fine-tuned on the Kinetics400 dataset for 1 epoch. It can be downloaded here: [nonlocal_kinetics400_mindspore.ckpt]
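The checkpoint can be restored with MindSpore's standard checkpoint utilities. The sketch below assumes that src/models/nonlocal3d.py exposes a nonlocal3d constructor whose arguments mirror the model section of nonlocal.yaml; the real entry points are train.py and eval.py.

```python
# Minimal sketch: restore the released checkpoint into the nonlocal3d network.
# The constructor name/arguments are assumptions based on nonlocal.yaml;
# train.py / eval.py show how the repository actually builds the model.
import mindspore as ms
from src.models.nonlocal3d import nonlocal3d  # assumed import path

net = nonlocal3d(in_d=32, in_h=224, in_w=224, num_classes=400, keep_prob=0.5)
param_dict = ms.load_checkpoint("nonlocal_kinetics400_mindspore.ckpt")
ms.load_param_into_net(net, param_dict)
net.set_train(False)  # switch to inference/evaluation mode
```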
To train or fine-tune the model, you can run the following scripts:
cd scripts/
# run training example
bash train_standalone.sh [PROJECT_PATH] [DATA_PATH]
# run distributed training example
bash train_distribute.sh [PROJECT_PATH] [DATA_PATH]
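For example, if the repository is cloned to /home/user/nonlocal and the dataset is stored under /data/kinetics-dataset (both paths are only examples), a single-GPU run would be:
bash train_standalone.sh /home/user/nonlocal /data/kinetics-dataset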
To validate the model, you can run the following script:
cd scripts/
# run evaluation example
bash eval_standalone.sh [PROJECT_PATH] [DATA_PATH]
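For example, with the same example paths as above:
bash eval_standalone.sh /home/user/nonlocal /data/kinetics-dataset
The overall structure of the repository is shown below.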
.
│ eval.py // eval script
│ README.md // descriptions about Nonlocal
│ train.py // training script
└─scripts
| eval_standalone.sh //eval standalone script
| train_distribute.sh //train distribute script
| train_standalone.sh //train standalone script
└─src
├─config
│ nonlocal.yaml // Nonlocal parameter configuration
├─data
│ │ builder.py // build data
│ │ download.py // download dataset
│ │ generator.py // generate video dataset
│ │ images.py // process image
│ │ kinetics400.py // kinetics400 dataset
│ │ meta.py // public API for dataset
│ │ path.py // IO path
│ │ video_dataset.py // video dataset
│ │
│ └─transforms
│ builder.py // build transforms
│ video_center_crop.py // center crop
│ video_normalize.py // normalize
│ video_random_crop.py // random crop
│ video_random_horizontal_flip.py // random horizontal flip
│ video_reorder.py // reorder
│ video_rescale.py // rescale
│ video_short_edge_resize.py // short edge resize
│
├─example
│ nonlocal_kinetics400_eval.py // eval nonlocal model
│ nonlocal_kinetics400_train.py // train nonlocal model
│
├─loss
│ builder.py // build loss
│
├─models
│ │ builder.py // build model
│ │ nonlocal3d.py // nonlocal model
│ │
│ └─layers
│ adaptiveavgpool3d.py // adaptive average pooling 3D.
│ dropout_dense.py // dense head
│ inflate_conv3d.py // inflate conv3d block
| maxpool3d.py // 3D max pooling
| maxpool3dwithpad.py // 3D max pooling with padding operation
│ resnet3d.py // resnet backbone
│ unit3d.py // unit3d module
│
├─optim
│ builder.py // build optimizer
│
├─schedule
│ builder.py // build learning rate schedule
│ lr_schedule.py // learning rate schedule
│
└─utils
callbacks.py // eval loss monitor
check_param.py // check parameters
class_factory.py // class register
config.py // parameter configuration
six_padding.py // convert padding list into tuple
Parameters for both training and evaluation can be set in nonlocal.yaml
- config for Nonlocal, Kinetics400 dataset
```yaml
# ==============================================================================
# model architecture
model_name: "nonlocal"

# The dataset sink mode.
dataset_sink_mode: False

# Context settings.
context:
    mode: 0            # 0--Graph Mode; 1--Pynative Mode
    device_target: "GPU"

# model settings of each part
model:
    type: nonlocal3d
    in_d: 32
    in_h: 224
    in_w: 224
    num_classes: 400
    keep_prob: 0.5

# learning rate for training process
learning_rate:
    lr_scheduler: "cosine_annealing"
    lr: 0.0003
    lr_epochs: [2, 4]
    lr_gamma: 0.1
    eta_min: 0.0
    t_max: 100
    max_epoch: 5
    warmup_epochs: 1

# optimizer for training process
optimizer:
    type: 'SGD'
    momentum: 0.9
    weight_decay: 0.0001

loss:
    type: SoftmaxCrossEntropyWithLogits
    sparse: True
    reduction: "mean"

train:
    pre_trained: True
    pretrained_model: "./ms_nonlocal_dot_kinetics400_finetune.ckpt"
    ckpt_path: "./output/"
    epochs: 5
    save_checkpoint_epochs: 5
    save_checkpoint_steps: 4975
    keep_checkpoint_max: 10

eval:
    pretrained_model: "./nonlocal-1_4975.ckpt"

infer:
    pretrained_model: "./nonlocal-1_4975.ckpt"
    batch_size: 1
    image_path: ""
    normalize: True
    output_dir: "./infer_output"

# Kinetics400 dataset config
data_loader:
    train:
        dataset:
            type: Kinetic400
            path: "/data/kinetics-dataset"
            split: 'train'
            seq: 32
            seq_mode: 'interval'
            num_parallel_workers: 1
            shuffle: True
            batch_size: 6
            frame_interval: 6
        map:
            operations:
                - type: VideoShortEdgeResize
                  size: 256
                  interpolation: 'bicubic'
                - type: VideoRandomCrop
                  size: [224, 224]
                - type: VideoRandomHorizontalFlip
                  prob: 0.5
                - type: VideoRescale
                - type: VideoReOrder
                  order: [3, 0, 1, 2]
                - type: VideoNormalize
                  mean: [0.485, 0.456, 0.406]
                  std: [0.229, 0.224, 0.255]
            input_columns: ["video"]

    eval:
        dataset:
            type: Kinetic400
            path: "/data/kinetics-dataset"
            split: 'val'
            seq: 32
            seq_mode: 'interval'
            num_parallel_workers: 1
            shuffle: False
            batch_size: 1
            frame_interval: 6
        map:
            operations:
                - type: VideoShortEdgeResize
                  size: 256
                  interpolation: 'bicubic'
                - type: VideoCenterCrop
                  size: [256, 256]
                - type: VideoRescale
                - type: VideoReOrder
                  order: [3, 0, 1, 2]
                - type: VideoNormalize
                  mean: [0.485, 0.456, 0.406]
                  std: [0.229, 0.224, 0.255]
            input_columns: ["video"]

    group_size: 1
# ==============================================================================
```
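The repository reads this file through src/utils/config.py. Purely as an illustration of the structure above (not the project's actual loader), the file can be parsed with PyYAML and ml-collections, both of which are in the dependency list:

```python
# Minimal sketch: read nonlocal.yaml with PyYAML and wrap it in an
# ml_collections.ConfigDict for attribute-style access. This is NOT the
# repository's loader (see src/utils/config.py); it only shows how the
# fields above are nested.
import yaml
from ml_collections import ConfigDict

with open("src/config/nonlocal.yaml") as f:
    cfg = ConfigDict(yaml.safe_load(f))

print(cfg.model.type, cfg.model.num_classes)                  # nonlocal3d 400
print(cfg.data_loader.train.dataset.batch_size)               # 6
print(cfg.learning_rate.lr_scheduler, cfg.learning_rate.lr)   # cosine_annealing 0.0003
```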
- train_distributed.log for Kinetics400
epoch: 1 step: 4975, loss is 0.44932037591934204
epoch: 1 step: 4975, loss is 0.3773573338985443
epoch: 1 step: 4975, loss is 0.19342052936553955
epoch: 1 step: 4975, loss is 0.5734817385673523
epoch: 1 step: 4975, loss is 0.09291025996208191
epoch: 1 step: 4975, loss is 0.5412027835845947
epoch: 1 step: 4975, loss is 0.08211661130189896
epoch: 1 step: 4975, loss is 0.9573349356651306
epoch time: 18000 s, per step time: 2064 ms
epoch time: 18000 s, per step time: 2063 ms
epoch time: 18000 s, per step time: 2064 ms
epoch time: 18000 s, per step time: 2064 ms
epoch time: 18001 s, per step time: 2065 ms
epoch time: 18001 s, per step time: 2065 ms
epoch time: 18001 s, per step time: 2065 ms
epoch time: 18002 s, per step time: 2066 ms
...
- eval.log for Kinetics400
[Start eval `nonlocal`]
eval: 1/19877
eval: 2/19877
eval: 3/19877
eval: 4/19877
eval: 5/19877
eval: 6/19877
eval: 7/19877
eval: 8/19877
eval: 9/19877
eval: 10/19877
...
eval: 19874/19877
eval: 19875/19877
eval: 19876/19877
eval: 19877/19877
{'Top_1_Accuracy': 0.7248, 'Top_5_Accuracy': 0.9072}
Kinetics400 contains ∼246k training videos and 20k validation videos. It is a classification task involving 400 human action categories. We train the model on the training set and test on the validation set. Under the same settings, we compared the accuracy of the model under three frameworks (Caffe2, PyTorch, and MindSpore).
| type | input frames | non-local? | top-1 (%) | top-5 (%) | model |
|---|---|---|---|---|---|
| i3d_nlnet_origin_caffe | 32 | Yes | 74.90 | 91.60 | link |
| i3d_nlnet_pytorch | 32 | Yes | 73.92 | 91.59 | link |
| i3d_nlnet_mindspore | 32 | Yes | 72.48 | 90.72 | link |
The first row (i3d_nlnet_origin_caffe) corresponds to the accuracy of the model reported in the source paper.
We have visualized some of the model's classification results. The following is a sample visualization.
@inproceedings{NonLocal2018,
  author    = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He},
  title     = {Non-local Neural Networks},
  booktitle = {CVPR},
  year      = {2018},
  doi       = {10.1109/CVPR.2018.00813},
}
@misc{2020mmaction2,
  title        = {OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
  author       = {MMAction2 Contributors},
  howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
  year         = {2020}
}