Latest commit

1f12c16 · Apr 13, 2020

LPCVC-2020 Sample Solution

Overview

This is the sample solution for the Low Power Computer Vision Challenge (LPCVC) 2020 Video Track. It is intended only as a baseline, and there is considerable room to optimize its performance.

The solution is made up of three blocks. The first block (sampling block) takes in a video file and determines which frames are worth running detection and recognition on. This sample solution does so by analyzing the motion vectors from the H.264 encoding of the video to pick out stationary I-frames. The second block (detection block) performs word detection on the frames selected by the sampling block, using the EAST detector. Lastly, the third block (recognition block) performs optical character recognition (OCR) on the cropped words. The sample solution provides two choices for this block: Connectionist Temporal Classification (CTC) or Attention OCR.
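The three-block flow described above can be sketched roughly as follows. This is only an illustration of the structure, not the repository's actual code: `Frame`, `sample_frames`, `run_pipeline`, the `motion` field, and the `max_motion` threshold are all hypothetical names, and the detector and recognizer are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Frame:
    index: int       # position of the frame in the video
    is_iframe: bool  # True if the frame is an H.264 I-frame
    motion: float    # hypothetical summary of nearby motion-vector magnitude


def sample_frames(frames: Sequence[Frame], max_motion: float = 1.0) -> List[Frame]:
    """Sampling block (assumed rule): keep I-frames with little motion."""
    return [f for f in frames if f.is_iframe and f.motion <= max_motion]


def run_pipeline(
    frames: Sequence[Frame],
    detect: Callable[[Frame], list],
    recognize: Callable[[object], str],
    max_motion: float = 1.0,
) -> Dict[int, List[str]]:
    """Chain sampling -> detection -> recognition, collecting text per frame."""
    results: Dict[int, List[str]] = {}
    for frame in sample_frames(frames, max_motion):
        crops = detect(frame)  # detection block: cropped word regions
        results[frame.index] = [recognize(c) for c in crops]  # recognition block
    return results
```

With stub callables, `run_pipeline(frames, detect=lambda f: ["a", "b"], recognize=str.upper)` returns recognized text only for the frames that pass the sampling step, which is why the sampling rule has such a large effect on total runtime.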

Contents

  1. Setup
  2. Usage
  3. Notes

Setup

  1. Clone the code from the master branch.
    git clone https://github.com/tanliyon/lpcvc-2020.git

  2. Download the model files for the EAST detector, CTC, and Attention OCR.
    EAST-Detector
    CTC
    Attention-Encoder
    Attention-Decoder

  3. Install dependencies.
    pip install -r requirements.txt
    Note that lanms might not work on Windows.

  4. Check the directory structure. It should be:
    lpcvc-2020
    |_wrapper.py
    |_detector.pth
    |_ctc.pth
    |_encoder.pth
    |_decoder.pth
    |_(all other folders pulled from master)
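As a convenience, the layout above could be checked with a short script like the one below. It is a minimal sketch, not part of the repository: `REQUIRED_FILES` and `missing_files` are hypothetical names, and the file list is copied from the directory structure shown above.

```python
from pathlib import Path

# File names taken from the expected directory structure in this README.
REQUIRED_FILES = ["wrapper.py", "detector.pth", "ctc.pth", "encoder.pth", "decoder.pth"]


def missing_files(repo_root="."):
    """Return the expected files that are not present directly under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Run it from the `lpcvc-2020` directory; an empty result means every expected file was found.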

Usage

The call syntax is:

python main.py video_file_path.mp4 question_file_path.txt

To switch between the two recognition options, set the USE_ATTN_OCR flag in main.py. The SHOW_BOXES flag controls whether the detection output is saved to a folder, and the SHOW_TEXT flag controls whether the recognition predictions are printed to stdout.
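For orientation, a script with this call syntax and these flags might be organized roughly like the sketch below. Only the flag names come from this README; the flag values, `parse_args`, and `recognizer_name` are assumptions made for illustration and may not match what main.py actually does.

```python
import argparse

# Flag names are from this README; the values chosen here are arbitrary.
USE_ATTN_OCR = False  # False -> CTC, True -> Attention OCR
SHOW_BOXES = False    # save detection output to a folder
SHOW_TEXT = True      # print recognition predictions to stdout


def parse_args(argv=None):
    """Parse the two positional arguments from the call syntax (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="LPCVC-2020 sample solution (sketch)")
    parser.add_argument("video_path", help="input video, e.g. video_file_path.mp4")
    parser.add_argument("question_path", help="question file, e.g. question_file_path.txt")
    return parser.parse_args(argv)


def recognizer_name(use_attn_ocr=USE_ATTN_OCR):
    """Map the boolean flag to the recognition option it selects."""
    return "attention" if use_attn_ocr else "ctc"
```

Keeping the options as module-level constants matches the README's description of editing flags in the file rather than passing extra command-line switches.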

Notes

  1. Currently, the solution takes a long time to run because of the number of frames it runs inference on. To test on only a portion of the video, run the code for a set amount of time, then comment out the line frames_list = iFRAMES(video_path) in wrapper.py and run the code again.
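A different way to shorten a test run, which is not what the repository does, would be to cap the number of frames before inference. The helper below is a hypothetical sketch: `limit_frames` and `max_frames` are made-up names, and it assumes `frames_list` behaves like an ordinary Python sequence.

```python
def limit_frames(frames_list, max_frames):
    """Return at most max_frames items, spread evenly across frames_list."""
    if max_frames <= 0:
        return []
    n = len(frames_list)
    if n <= max_frames:
        return list(frames_list)
    step = n / max_frames  # fractional stride so picks span the whole list
    return [frames_list[int(i * step)] for i in range(max_frames)]
```

For example, `limit_frames(list(range(10)), 5)` keeps every second item. Sampling evenly rather than taking the first few frames keeps text that appears late in the video represented in the test run.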

References

  1. Low Power Computer Vision Challenge (LPCVC) 2020 Video Track
  2. EAST Detector
  3. Connectionist Temporal Classification (CTC)
  4. Attention OCR