
Complete README
felixchenfy committed Jan 31, 2019
1 parent 15ef2a2 commit a561836
Showing 11 changed files with 161 additions and 70 deletions.
5 changes: 2 additions & 3 deletions CMakeLists.txt
@@ -1,7 +1,6 @@
cmake_minimum_required( VERSION 2.8 )
project ( my_slam )


# compile settings
set( CMAKE_CXX_COMPILER "g++" )
set( CMAKE_BUILD_TYPE "Debug" )
@@ -20,7 +19,7 @@ if(CCACHE_FOUND)
endif(CCACHE_FOUND)
set (CMAKE_CXX_FLAGS "-DPCL_ONLY_CORE_POINT_TYPES=ON -DNO_EXPLICIT_INSTANTIATIONS")

############### Dependencies ######################

# Eigen
include_directories( "/usr/include/eigen3" )
@@ -50,7 +49,7 @@ set( THIRD_PARTY_LIBS
${CSPARSE_LIBRARY}
)

############### My Files ###############
include_directories( ${PROJECT_SOURCE_DIR}/include )
add_subdirectory( src )
add_subdirectory( src_main )
173 changes: 133 additions & 40 deletions README.md
@@ -4,8 +4,9 @@ Monocular Visual Odometry

**Content:** A simple **Monocular Visual Odometry** (VO) with initialization, tracking, local map, and optimization on single frame.

**Video demo**: http://feiyuchen.com/wp-content/uploads/vo_with_opti.mp4
On the left, **white line** is the estimated camera trajectory, **green line** is ground truth.
On the right, **green** are keypoints, **red** are inlier matches with map points.

![](https://github.com/felixchenfy/Monocular-Visual-Odometry-Data/raw/master/result/vo_with_opti.gif)

@@ -23,101 +24,193 @@ On the left, **white line** is the estimated camera trajectory, **green line** i
- [1.3. Optimization](#13-optimization)
- [1.4. Local Map](#14-local-map)
- [1.5. Other details](#15-other-details)
- [2. File Structure](#2-file-structure)
- [2.1. Folders](#21-folders)
- [2.2. Functions](#22-functions)
- [3. Dependencies](#3-dependencies)
- [4. How to Run](#4-how-to-run)
- [5. Results](#5-results)
- [6. Reference](#6-reference)
- [7. To Do](#7-to-do)

<!-- /TOC -->




# 1. Algorithm
This VO is achieved by the following procedures/algorithms:

## 1.1. Initialization

**When to initialize**:
Given a video, set the 1st frame (image) as the reference, and do feature matching with the 2nd frame. If the average pixel displacement between inlier matched keypoints exceeds a threshold, initialization starts. Otherwise, skip to the 3rd, 4th, etc. frames until the criterion is satisfied at the K-th frame. Then estimate the relative camera pose between the 1st and K-th frames.
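As a rough Python sketch of this trigger check (the threshold value `min_mean_disp` and the data layout are my own assumptions, not the project's actual code):

```python
import math

# Hypothetical mean-displacement check for starting initialization.
# kpts_ref / kpts_cur: matched inlier keypoints as (x, y) pixel tuples.
def ready_to_initialize(kpts_ref, kpts_cur, min_mean_disp=20.0):
    disps = [math.hypot(xc - xr, yc - yr)
             for (xr, yr), (xc, yc) in zip(kpts_ref, kpts_cur)]
    return sum(disps) / len(disps) > min_mean_disp
```

If this returns False for frame k, the same check is repeated against frame k+1, and so on.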

**Estimate relative camera pose**:
Compute the **Essential Matrix** (E) and **Homography Matrix** (H) between the two frames. Compute their **Symmetric Transfer Errors** by the method in the [ORB-SLAM paper](https://arxiv.org/abs/1502.00956) and choose the better model (i.e., choose H if H/(E+H)>0.45). **Decompose E or H** into the relative pose of rotation (R) and translation (t). Using OpenCV, E gives 1 result and H gives 2 results that satisfy the criterion that points are in front of the camera. For E, there is only one result to choose; for H, choose the one that makes the image plane and the world-points plane more parallel.
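The model-selection heuristic above reduces to a one-line ratio test; as a sketch (scores are the symmetric-transfer-error scores, where higher is better):

```python
# ORB-SLAM style model selection (sketch): score_E and score_H are the
# symmetric-transfer-error scores of E and H; pick H when it explains
# a large enough fraction of the combined score.
def choose_model(score_E, score_H, h_ratio=0.45):
    r = score_H / (score_E + score_H)
    return "H" if r > h_ratio else "E"
```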

**Recover scale**:
Scale the translation t so that either: (1) the feature points have an average depth of 1 m, or (2) it has the same scale as the corresponding ground-truth data, so that I can draw them together and compare.
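Strategy (1) can be sketched as follows (note that scaling t by a factor s also scales all triangulated depths by s, since depth is proportional to the baseline):

```python
# Rescale t so that the triangulated points' mean depth equals the
# target (1 m here). depths: z-coordinates of the triangulated points.
def rescale_translation(t, depths, target_mean_depth=1.0):
    s = target_mean_depth * len(depths) / sum(depths)
    return [s * ti for ti in t]
```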

**Keyframe and local map**:
Insert both the 1st and the K-th frame as **keyframes**. **Triangulate** their inlier matched keypoints to obtain the points' world positions. These points are called **map points** and are pushed to the **local map**.

## 1.2. Tracking

Keep estimating the next camera pose. First, find the map points that are in the current camera view. Then do feature matching to find the 2d-3d correspondences between the 3d map points and the 2d image keypoints. Estimate the camera pose by RANSAC and PnP.

## 1.3. Optimization

Apply optimization to this single frame: using the inlier 3d-2d correspondences from PnP, we can compute the sum of the reprojection errors of all point pairs to form the cost function. By computing the derivative with respect to (1) the points' 3d positions and (2) the camera pose, we can solve the optimization problem using the Gauss-Newton method and its variants. This is done by **g2o** and its built-in datatypes `VertexSBAPointXYZ`, `VertexSE3Expmap`, and `EdgeProjectXYZ2UV`. See Slambook Chapter 4 and Chapter 7.8.2 for more details.
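The cost function being minimized is just the sum of squared pixel residuals; a minimal Python sketch (plain lists instead of Eigen/g2o types, which are what the project actually uses):

```python
# Project a 3D point with the pinhole model u = K (R X + t), then sum
# squared pixel residuals over all inlier 3d-2d pairs.
def project(K, R, t, X):
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (K[0][0] * xc[0] / xc[2] + K[0][2],
            K[1][1] * xc[1] / xc[2] + K[1][2])

def reprojection_cost(K, R, t, pts3d, pts2d):
    cost = 0.0
    for X, (u, v) in zip(pts3d, pts2d):
        up, vp = project(K, R, t, X)
        cost += (u - up) ** 2 + (v - vp) ** 2
    return cost
```

g2o minimizes this cost over the camera pose and point positions; the sketch only shows what is being evaluated at each iteration.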

Then the camera pose and the inlier points' 3d positions are updated, at a level of about 0.0001 meters. (I found that this optimization doesn't make much difference compared to running without it. I need to improve it by optimizing multiple frames at the same time.)

(TODO: Apply optimization to multiple frames, so that I can call this process bundle adjustment.)

## 1.4. Local Map

**Insert keyframe:** If the relative pose between the current frame and the previous keyframe is large enough, with a translation or rotation larger than a threshold, insert the current frame as a keyframe. Triangulate 3d points and push them to the local map.
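The keyframe test can be sketched as below (the threshold values are hypothetical, not the ones in the project's config; the rotation is taken as an axis-angle vector in radians):

```python
import math

# Insert a keyframe when relative motion to the last keyframe is
# large enough. t_rel: relative translation; rotvec_rel: axis-angle.
def is_new_keyframe(t_rel, rotvec_rel, min_trans=0.05, min_rot_deg=5.0):
    trans = math.sqrt(sum(x * x for x in t_rel))
    rot_deg = math.degrees(math.sqrt(sum(r * r for r in rotvec_rel)))
    return trans > min_trans or rot_deg > min_rot_deg
```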

**Clean up local map:** Remove map points that (1) are not in the current view, (2) have a viewing angle larger than a threshold, or (3) are rarely matched as inlier points. (See Slambook Chapter 9.4.)
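A sketch of the three removal criteria (the thresholds and the match-ratio bookkeeping `n_matched / n_visible` are my own assumptions; `optical_axis` is assumed to be a unit vector):

```python
import math

# Decide whether a map point should be removed from the local map.
def should_cull(pt, cam_center, optical_axis, in_view,
                n_matched, n_visible,
                max_view_angle_deg=30.0, min_match_ratio=0.1):
    if not in_view:                                          # criterion (1)
        return True
    ray = [p - c for p, c in zip(pt, cam_center)]
    norm = math.sqrt(sum(x * x for x in ray))
    cos_a = sum(r * a for r, a in zip(ray, optical_axis)) / norm
    if math.degrees(math.acos(cos_a)) > max_view_angle_deg:  # criterion (2)
        return True
    return n_matched / n_visible < min_match_ratio           # criterion (3)
```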

## 1.5. Other details

**Image features**:
Extract ORB keypoints and descriptors. Then apply a simple grid sampling on the keypoints' pixel positions to retain uniformly distributed keypoints.
(Note: the ORB-SLAM paper does grid sampling in all pyramid levels, and extracts more keypoints in regions that have few.)


**Feature matching**:
Two methods are implemented; a good match is one where:
(1) the feature distance is smaller than a threshold, as described in the Slambook; or
(2) the ratio of the smallest to the second-smallest distance is below a threshold, as proposed in Prof. Lowe's 2004 SIFT paper.
The first one is adopted; its parameters are easier to tune, and it generates fewer false matches.
(Note: the ORB-SLAM paper performs a guided search for matches.)
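Both criteria can be sketched in a few lines (threshold values are hypothetical; a match is represented as an `(index, descriptor_distance)` pair):

```python
# (1) Slambook rule: keep a match whose distance is below
#     max(factor * min_distance, floor).
def filter_by_distance(matches, factor=2.0, floor=30.0):
    th = max(factor * min(d for _, d in matches), floor)
    return [m for m in matches if m[1] <= th]

# (2) Lowe's ratio test: the best distance must be clearly smaller
#     than the second-best. nn_pairs: (best_match, second_match) pairs.
def filter_by_ratio(nn_pairs, ratio=0.7):
    return [best for best, second in nn_pairs if best[1] < ratio * second[1]]
```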

# 2. File Structure
## 2.1. Folders
* [include/](include/): C++ header files.
* [src/](src/): C++ definitions.
* [src_main/](src_main/): main script to run VO.
* [test/](test/): test scripts for C++ functions.

Main scripts and classes for VO are in [include/my_slam/](include/my_slam/). I referenced the [Slambook Chapter 9](https://github.com/gaoxiang12/slambook/tree/master/project/0.4) for setting this up.

## 2.2. Functions
Functions are declared in [include/](include/). Some of its folders contain a README. See the tree structure for overview:

```
include
├── my_basics
│   ├── basics.h
│   ├── config.h
│   ├── eigen_funcs.h
│   ├── io.h
│   ├── opencv_funcs.h
│   └── README.md
├── my_display
│   ├── pcl_display.h
│   └── pcl_display_lib.h
├── my_geometry
│   ├── camera.h
│   ├── common_include.h
│   ├── epipolar_geometry.h
│   ├── feature_match.h
│   └── motion_estimation.h
├── my_optimization
│   └── g2o_ba.h
└── my_slam
├── common_include.h
├── commons.h
├── frame.h
├── map.h
├── mappoint.h
├── README.md
└── vo.h
```
# 3. Dependencies
Requires OpenCV, Eigen, Sophus, and g2o. See details below:

**(1) OpenCV 4.0**
Tutorial for installing OpenCV 4.0: [link](https://www.pyimagesearch.com/2018/08/15/how-to-install-opencv-4-on-ubuntu/).

You need OpenCV 3.4.5 or newer, because I used the function `filterHomographyDecompByVisibleRefpoints`, which first appears in 3.4.5.

**(2) Eigen 3**
A library for matrix arithmetic. See its [official page]( http://eigen.tuxfamily.org/index.php?title=Main_Page). Install by:
> $ sudo apt-get install libeigen3-dev

(Note: Eigen has only header files; there is no ".so" or ".a" to link.)


**(3) Sophus**
It is based on Eigen and provides the SE3/SO3/se3/so3 data types.

Download here: https://github.com/strasdat/Sophus. Run cmake and make. Since I failed to "make install" it, I manually moved "/Sophus/sophus" to "/usr/include/sophus" and "libSophus.so" to "/usr/lib". Then, in my CMakeLists.txt, I use `set( THIRD_PARTY_LIBS libSophus.so )`.

**(4) g2o**
Download here: https://github.com/RainerKuemmerle/g2o. Check out the last version from 2017. Run cmake, make, and make install.

If the csparse library is not found during cmake, install the following package:
> $ sudo apt-get install libsuitesparse-dev
# 4. How to Run
> $ mkdir build && mkdir lib && mkdir bin
> $ cd build && cmake .. && make && cd ..

Then, set up things in [config/config.yaml](config/config.yaml), and run:
> $ bin/run_vo config/config.yaml
# 5. Results

I tested the current implementation on the [TUM](https://vision.in.tum.de/data/datasets/rgbd-dataset/download) fr1_desk and fr1_xyz datasets, but the performance on both is **bad**. I guess one of the **causes** is that there are too few high-quality keypoints, so the feature matching returns few matches. The **solution**, I guess, is to use ORB-SLAM's method of extracting enough uniformly distributed keypoints, and to do guided matching based on the estimated camera motion.

However, my program does work on the [New Tsukuba Stereo Database](http://cvlab.cs.tsukuba.ac.jp/), whose images and scenes are synthetic and have abundant high-quality keypoints. Though large errors still exist, the VO can roughly estimate the camera motion.
See the gif **at the beginning of this README**.

Here are two videos I recorded on my computer of running this VO program:
[1. VO video, with optimization on single frame](https://github.com/felixchenfy/Monocular-Visual-Odometry-Data/blob/master/result/vo_with_opti.mp4)
[2. VO video, no optimization](https://github.com/felixchenfy/Monocular-Visual-Odometry-Data/blob/master/result/vo_no_opti.mp4)
The sad thing is, with or without this single-frame optimization, the results are about the same.


# 6. Reference

**(1) Slambook**:
I read Dr. Xiang Gao's [Slambook](https://github.com/gaoxiang12/slambook) before writing code. The book provides both vSLAM theory and easy-to-read code examples in every chapter.

The framework of my program is based on Chapter 9 of the Slambook, which is an RGB-D visual odometry project. The classes declared in [include/my_slam/](include/my_slam/) are based on that chapter.

These files are mainly copied from Slambook and then modified:
* CMakeLists.txt
* [include/my_basics/config.h](include/my_basics/config.h) and its .cpp.
* [include/my_optimization/g2o_ba.h](include/my_optimization/g2o_ba.h) and its .cpp.
* [include/my_basics/config.h](include/my_basics/config.h).
* [include/my_optimization/g2o_ba.h](include/my_optimization/g2o_ba.h).

I also borrowed other code from the Slambook, but since those are small pieces and single lines, I don't list them here.

In short, the Slambook was a huge help for me and this project.

**(2) Matlab VO tutorial**:
[This](https://www.mathworks.com/help/vision/examples/monocular-visual-odometry.html?searchHighlight=visual%20odometry&s_tid=doc_srchtitle) is a Matlab tutorial on monocular visual odometry. Since the Slambook doesn't write much about monocular VO, I resorted to this Matlab tutorial for a solution. It helped me a lot in getting the whole workflow clear.

The dataset I used is also the same as in this Matlab tutorial: the [New Tsukuba Stereo Database](http://cvlab.cs.tsukuba.ac.jp/).

**(3) ORB-SLAM paper**

I borrowed its code of **the criteria for choosing Essential or Homography** for VO initialization. See the functions of `checkEssentialScore` and `checkHomographyScore` in [motion_estimation.h](include/my_geometry/motion_estimation.h).
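The homography half of that scoring can be sketched as a symmetric transfer error, following the ORB-SLAM paper (the chi-square threshold 5.991 is the paper's 2-DOF value; `H12` is assumed to be the inverse of `H21`, and this is an illustration, not the actual `checkHomographyScore` code):

```python
# Symmetric-transfer-error score for H. H21 maps image 1 to image 2;
# H12 maps back. Each direction's squared error is compared against
# the chi-square threshold, and inliers add (th - error) to the score.
def transfer(H, x, y):
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def homography_score(H21, H12, pts1, pts2, sigma=1.0, th=5.991):
    score, inv_s2 = 0.0, 1.0 / sigma ** 2
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        u, v = transfer(H21, x1, y1)                  # image 1 -> 2
        e2 = ((x2 - u) ** 2 + (y2 - v) ** 2) * inv_s2
        u, v = transfer(H12, x2, y2)                  # image 2 -> 1
        e1 = ((x1 - u) ** 2 + (y1 - v) ** 2) * inv_s2
        if e2 < th:
            score += th - e2
        if e1 < th:
            score += th - e1
    return score
```

The essential-matrix score is computed analogously, but with point-to-epipolar-line distances instead of transfer errors.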


Next, I will read more of this paper's code in order to keep improving my project.

# 7. To Do

**Bugs**
* In release mode, the program throws an error:
  > *** stack smashing detected ***: <unknown> terminated

  Please run in debug mode.

**Improvements**
* Build up the connections between keypoints and frames. Then, add bundle adjustment.

11 changes: 10 additions & 1 deletion include/my_basics/README.md
@@ -1,4 +1,13 @@

I use **namespace my** for this folder. It contains basic functions:

* config.h: reads key-value pairs from a .yaml file.
* opencv_funcs.h: simple operations for image access, datatype conversion, math operations, etc.
* eigen_funcs.h: functions for datatype conversion between Eigen, OpenCV, and Sophus.


4 changes: 1 addition & 3 deletions include/my_geometry/motion_estimation.h
@@ -41,7 +41,6 @@ void helperEvalEppiAndTriangErrors(
bool print_res);

// Estimate camera motion by Essential matrix.
// This utility is part of the helperEstimatePossibleRelativePosesByEpipolarGeometry
void helperEstiMotionByEssential(
const vector<KeyPoint> &keypoints_1,
const vector<KeyPoint> &keypoints_2,
@@ -66,10 +65,9 @@ void helperTriangulatePoints(
vector<Point3f> &pts_3d_in_curr
);


// Compute the score of estimated E/H matrix by the method in ORB-SLAM
double checkEssentialScore(const Mat &E21, const Mat &K, const vector<Point2f> &pts_img1, const vector<Point2f> &pts_img2,
vector<int> &inliers_index, double sigma=1.0);

double checkHomographyScore(const Mat &H21,const vector<Point2f> &pts_img1, const vector<Point2f> &pts_img2,
vector<int> &inliers_index, double sigma=1.0);

6 changes: 6 additions & 0 deletions include/my_slam/README.md
@@ -0,0 +1,6 @@

The folder includes:

* Components of the SLAM system: frame, map, mappoint, etc.

* Functions implementing VO: [vo.h](vo.h)
4 changes: 2 additions & 2 deletions include/my_slam/motion_funcs.h → include/my_slam/commons.h
@@ -1,6 +1,6 @@

#ifndef MOTION_FUNCS_H
#define MOTION_FUNCS_H
#ifndef COMMONS_H
#define COMMONS_H

#include "my_slam/common_include.h"
#include "my_slam/frame.h"
8 changes: 4 additions & 4 deletions include/my_slam/vo.h
@@ -22,11 +22,11 @@
#include "my_slam/frame.h"
#include "my_slam/map.h"
#include "my_slam/mappoint.h"
#include "my_slam/motion_funcs.h"
#include "my_slam/commons.h"

namespace my_slam
{
using namespace std;
using namespace cv;
using namespace my_geometry;

@@ -60,7 +60,6 @@ class VisualOdometry
vector<Point3f> matched_pts_3d_in_map_;
vector<int> matched_pts_2d_idx_;


public: // ------------------------------- Constructor
VisualOdometry();
void addFrame(my_slam::Frame::Ptr frame);
@@ -71,7 +70,7 @@ class VisualOdometry
void estimateMotionAnd3DPoints();
bool checkIfVoGoodToInit(const vector<KeyPoint> &init_kpts, const vector<KeyPoint> &curr_kpts, const vector<DMatch> &matches);
bool isInitialized();

public: // ------------------------------- Tracking -------------------------------
// void find3Dto2DCorrespondences()
bool checkLargeMoveForAddKeyFrame(Frame::Ptr curr, Frame::Ptr ref);
@@ -86,6 +85,7 @@ class VisualOdometry
vector<Mat> pushCurrPointsToMap();
double getViewAngle(Frame::Ptr frame, MapPoint::Ptr point);
};

} // namespace my_slam

#endif // FRAME_H
3 changes: 1 addition & 2 deletions src/CMakeLists.txt
@@ -29,10 +29,9 @@ add_library( my_slam SHARED
frame.cpp
vo.cpp
vo_addFrame.cpp
vo_motions.cpp
map.cpp
mappoint.cpp
motion_funcs.cpp
commons.cpp
)


2 changes: 1 addition & 1 deletion src/camera.cpp
@@ -34,6 +34,6 @@ Point2f cam2pixel(const Mat &p, const Mat &K)


// ---------------- Class ----------------

// Not used

} // namespace my_slam
2 changes: 1 addition & 1 deletion src/motion_funcs.cpp → src/commons.cpp
@@ -1,5 +1,5 @@

#include "my_slam/motion_funcs.h"
#include "my_slam/commons.h"

namespace my_slam
{