Skip to content

Commit

Permalink
Update README
Browse files Browse the repository at this point in the history
  • Loading branch information
felixchenfy committed Jan 30, 2019
1 parent 8542d39 commit 15ef2a2
Show file tree
Hide file tree
Showing 16 changed files with 266 additions and 40 deletions.
1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -17,5 +17,4 @@ test_tmp/
# Compile
lib/
build/
cmake_modules/
bin/
3 changes: 2 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ project ( my_slam )
# compile settings
set( CMAKE_CXX_COMPILER "g++" )
set( CMAKE_BUILD_TYPE "Debug" )
# set( CMAKE_BUILD_TYPE "Release" ) # *** stack smashing detected ***: <unknown> terminated
set( CMAKE_CXX_FLAGS "-std=c++11 -O3" )

# list( APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake_modules )
Expand Down Expand Up @@ -54,5 +55,5 @@ include_directories( ${PROJECT_SOURCE_DIR}/include )
add_subdirectory( src )
add_subdirectory( src_main )
add_subdirectory( test )
add_subdirectory( test_tmp )
# add_subdirectory( test_tmp )

123 changes: 101 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,44 +1,123 @@

# Monocular Visual Odometry
A simple Monocular Visual Odometry (VO) with initialization, tracking, local mapping, and bundle adjustment.
![link to gif]
Monocular Visual Odometry
=======================================

![](result_frame.png)
**Content:** A simple **Monocular Visual Odometry** (VO) with initialization, tracking, local map, and optimization on single frame.

**Video demo**: http://feiyuchen.com/wp-content/uploads/vo_with_opti.mp4
On the left, **white line** is the estimated camera trajectory, **green line** is ground truth. On the right, **green** are keypoints, and **red** are inlier matches with map points.

# Algorithm
![](https://github.com/felixchenfy/Monocular-Visual-Odometry-Data/raw/master/result/vo_with_opti.gif)

**Project purpose:** This is a practice after I read the [Slambook](https://github.com/gaoxiang12/slambook). This will also be my final project of the course EESC-432 Advanced Computer Vision, so this repo will be kept on updating until and completed by **March 15th**.

**Data files:** Please download from here: https://github.com/felixchenfy/Monocular-Visual-Odometry-Data

**Directory:**
<!-- TOC -->

- [Monocular Visual Odometry](#monocular-visual-odometry)
- [1. Algorithm](#1-algorithm)
- [1.1. Initialization](#11-initialization)
- [1.2. Tracking](#12-tracking)
- [1.3. Optimization](#13-optimization)
- [1.4. Local Map](#14-local-map)
- [1.5. Other details](#15-other-details)
- [2. Software Architecture](#2-software-architecture)
- [3. How to Run](#3-how-to-run)
- [4. Result](#4-result)
- [5. Reference](#5-reference)
- [6. To Do](#6-to-do)

<!-- /TOC -->




# 1. Algorithm
This VO is achieved by the following procedures/algorithms:

1. Initialization
## 1.1. Initialization

**When to initialize**:
Given a video, set the 1st frame(image) as reference, and do feature matching with the 2nd frame. If the average displacement in pixel between inlier matched keypoints exceeds a threshold, the initialization will be started. Otherwise, skip to the 3rd, 4th, etc frames until the criteria satisfies at the K_th frame. Then we estimate the relative camera pose between the 1st and K_th frame.

**Estimate relative camera pose**:
Compute the **Essential Matrix** (E) and **Homography Matrix** (H) between the two frames. Compute their **Symmetric Transfer Error** by method in [ORB-SLAM paper](https://arxiv.org/abs/1502.00956) and choose the better one (i.e., choose H if H/(E+H)>0.45). **Decompose E or H** into the relative pose of rotation (R) and translation (t). By using OpenCV, E gives 1 result, and H gives 2 results, satisfying the criteria that points are in front of camera. For E, only one result to choose; For H, choose the one that makes the image plane and world-points plane more parallel.

**Recover scale**:
Scale the translation t to be either: (1) Features points have average depth of 1m. Or (2) make it same scale as the corresponding groundth data so that I can draw and compare.

**Keyframe and local map**:
Insert both 1st and K_th frame as **keyframe**. **Triangulate** their inlier points to obtain the points' world positions. These points are called **map points** and are pushed to **local map**.

## 1.2. Tracking

Keep on estimating the next camera pose. First, find map points that are in the camera view. Do feature matching to find 2d-3d correspondance between 3d map points and 2d image keypoints. Estimate camera pose by RANSAC and PnP.

## 1.3. Optimization

Apply optimization to this single frame : Using the inlier 3d-2d corresponding from PnP, we can compute the sum of reprojection error of each point pair to form the cost function. By computing the deriviate wrt (1) points 3d pos and (2) camera pose, we can solve the optimization problem using Gauss-Newton Method and its variants. These are done by **g2o** and its built-in datatypes of "VertexSBAPointXYZ", "VertexSE3Expmap", and "EdgeProjectXYZ2UV". See Slambook Chapter 4 and Chapter 7.8.2 for more details.

Then the camera pose and inlier points' 3d pos are updated, at a level of about 0.0001 meter. (Though my final result shows that this optimization doesn't make much difference. Maybe I need to optimize more frames and keypoints)

(TODO: Apply optimization to multiple frames, and then I call make it a real bundle adjustment.)

## 1.4. Local Map

**Insert keyframe:** If the relative pose between current frame and previous keyframe is large enough, with a translation or rotation larger than the threshold, insert current frame as a keyframe. Triangulate 3d points and push to local map.

**Clean up local map:** Remove map points that are: (1) not in current view, (2) whose view_angle is larger than threshold, (3) rarely be matched as inlier point. (See Slambook Chapter 9.4.)

## 1.5. Other details

**Image features**:
Extract ORB keypoints and features. Then, a simple grid sampling on keypoint's pixel pos is applied to retain uniform keypoints.

**Feature matching**:
Two methods are implemented, where good match is:
(1) Feature's distance is smaller than threshold, described in Slambook.
(2) Ratio of smallest and second smallest distance is smaller than threshold, proposed in Prof. Lowe's 2004 SIFT paper.
The first one is adopted, which generates fewer error matches.

When to initialize: Given a video, set the 1st frame(image) as reference, and do feature matching with the 2nd frame. If the average displacement between inlier matched keypoints exceeds a threshold, the initialization will be started. Otherwise, skip to the 3rd, 4th, etc frames until the criteria satisfies at the K_th frame.
# 2. Software Architecture
TODO

How to compute initial movement: Then, estimating the relative pose between 1st and K_th frame: Compute the Essential Matrix (E) and Homography Matrix (H) between the two frames. Compute their Symmetric Transfer Error by method in [ORB-SLAM paper](https://arxiv.org/abs/1502.00956) and choose the better one (i.e., choose H if H/(E+H)>0.45). Decompose E/H into the relative pose of rotation (R) and translation (t). By using OpenCV, E gives 1 result, and H gives 2 results, satisfying the criteria that points are in front of camera. For E, only one result to choose; For H, choose the one that makes the image plane and world-points plane more parallel.
# 3. How to Run
$ mkdir build && mkdir lib && mkdir bin
$ cd build && cmake .. && make && cd ..
$ bin/run_vo config/config.yaml

Recover scale: Scale the translation t to be either: (1) Features points have mean depth of 1m. Or (2) make it same scale as the corresponding groundth data so that I can draw and compare.
# 4. Result
TODO

Keyframe and local map: Insert both 1st and K_th frame as **keyframe**. Triangulate their inlier points to obtain the points' world positions. These points are called **map points** and are pushed to **local map**.
# 5. Reference

2. Tracking
**(1) Slambook**
I read this Dr. Xiang Gao's [slambook](https://github.com/gaoxiang12/slambook) before writing code. The book provides both vSLAM theory as well as easy-to-read code examples in every chapter.

Estimate new camera pose: For the following ith frame, find map points that are in the camera view. Do feature matching to find 2d-3d correspondance between 3d map points and 2d image keypoints. Estimate camera pose by RANSAC and PnP.
The framework of my program is based on Chapter 9 of Slambook, which is a RGB-D visual odometry project. Classes declared in [include/my_slam/](include/my_slam/) are using its structure.

3. Optimization
These files are mainly copied from Slambook and then modified:
* CMakeLists.txt
* [include/my_basics/config.h](include/my_basics/config.h) and its .cpp.
* [include/my_optimization/g2o_ba.h](include/my_optimization/g2o_ba.h) and its .cpp.

Apply bundle adjustment to this single frame: Using the inlier 3d-2d corresponding from PnP, we can compute the sum of reprojection error of each point pair to form the cost function. By computing the deriviate wrt (1) points 3d pos and (2) camera pose, we can solve the optimization problem using Gauss-Newton Method and its variants. These are done by **g2o** and its built-in datatypes of "VertexSBAPointXYZ", "VertexSE3Expmap", and "EdgeProjectXYZ2UV". See [Slambook](https://github.com/gaoxiang12/slambook) Chapter 4 and Chapter 7.8.2 for more details.
I also borrowed other codes from the slambook. But since they are small pieces and lines, I didn't list them here.

Then the camera pose and inlier points' 3d pos are updated, at a level of 0.0001 meter. (Though the final result show that this optimization doesn't make much difference.)
In short, the Slambook provides huge help for me and my this project.

4. Local Map
**(2) ORB-SLAM paper**

Insert keyframe: If the relative pose between current frame and previous keyframe is large enough, with a translation or rotation larger than the threshold, insert current frame as a keyframe. Triangulate 3d points and push to local map.
Slambook doesn't write a lot about monocular visual odometry, so I resorted to this paper for help. Besides learning import ideas for vSLAM, I borrowed its code of **the criteria for choosing Essential or Homography** for VO initialization.

Clean up local map: Remove map points that are: (1) not in current view, (2) view angle larger than threshold, (3) ratio of match/visible times smaller than threshold. (This reference Slambook Chapter 9.4.)
For my next stage, I will read more of this paper's code in order to keep on improving my project.

5. Other details

* Image features: ORB. Then, a simple grid sampling on keypoint's pixel pos is applied to avoid the keypoints being too dense.
* Feature matching: Two methods are implemented, where good match is: (1) Feature's distance is smaller than threshold, described in Slambook. (2) Ratio of smallest and second smallest distance is smaller than threshold, proposed in Prof. Lowe's 2004 SIFT paper. The first one is adopted, which generates fewer error matches.
# 6. To Do

# Software Architecture
* Bugs:
In release mode, the program throws an error:
> *** stack smashing detected ***: <unknown> terminated
Please run in debug mode.

25 changes: 25 additions & 0 deletions cmake_modules/FindCSparse.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Look for csparse; note the difference in the directory specifications!
FIND_PATH(CSPARSE_INCLUDE_DIR NAMES cs.h
PATHS
/usr/include/suitesparse
/usr/include
/opt/local/include
/usr/local/include
/sw/include
/usr/include/ufsparse
/opt/local/include/ufsparse
/usr/local/include/ufsparse
/sw/include/ufsparse
)

FIND_LIBRARY(CSPARSE_LIBRARY NAMES cxsparse
PATHS
/usr/lib
/usr/local/lib
/opt/local/lib
/sw/lib
)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(CSPARSE DEFAULT_MSG
CSPARSE_INCLUDE_DIR CSPARSE_LIBRARY)
113 changes: 113 additions & 0 deletions cmake_modules/FindG2O.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# Find the header files

FIND_PATH(G2O_INCLUDE_DIR g2o/core/base_vertex.h
${G2O_ROOT}/include
$ENV{G2O_ROOT}/include
$ENV{G2O_ROOT}
/usr/local/include
/usr/include
/opt/local/include
/sw/local/include
/sw/include
NO_DEFAULT_PATH
)

# Macro to unify finding both the debug and release versions of the
# libraries; this is adapted from the OpenSceneGraph FIND_LIBRARY
# macro.

MACRO(FIND_G2O_LIBRARY MYLIBRARY MYLIBRARYNAME)

FIND_LIBRARY("${MYLIBRARY}_DEBUG"
NAMES "g2o_${MYLIBRARYNAME}_d"
PATHS
${G2O_ROOT}/lib/Debug
${G2O_ROOT}/lib
$ENV{G2O_ROOT}/lib/Debug
$ENV{G2O_ROOT}/lib
NO_DEFAULT_PATH
)

FIND_LIBRARY("${MYLIBRARY}_DEBUG"
NAMES "g2o_${MYLIBRARYNAME}_d"
PATHS
~/Library/Frameworks
/Library/Frameworks
/usr/local/lib
/usr/local/lib64
/usr/lib
/usr/lib64
/opt/local/lib
/sw/local/lib
/sw/lib
)

FIND_LIBRARY(${MYLIBRARY}
NAMES "g2o_${MYLIBRARYNAME}"
PATHS
${G2O_ROOT}/lib/Release
${G2O_ROOT}/lib
$ENV{G2O_ROOT}/lib/Release
$ENV{G2O_ROOT}/lib
NO_DEFAULT_PATH
)

FIND_LIBRARY(${MYLIBRARY}
NAMES "g2o_${MYLIBRARYNAME}"
PATHS
~/Library/Frameworks
/Library/Frameworks
/usr/local/lib
/usr/local/lib64
/usr/lib
/usr/lib64
/opt/local/lib
/sw/local/lib
/sw/lib
)

IF(NOT ${MYLIBRARY}_DEBUG)
IF(MYLIBRARY)
SET(${MYLIBRARY}_DEBUG ${MYLIBRARY})
ENDIF(MYLIBRARY)
ENDIF( NOT ${MYLIBRARY}_DEBUG)

ENDMACRO(FIND_G2O_LIBRARY LIBRARY LIBRARYNAME)

# Find the core elements
FIND_G2O_LIBRARY(G2O_STUFF_LIBRARY stuff)
FIND_G2O_LIBRARY(G2O_CORE_LIBRARY core)

# Find the CLI library
FIND_G2O_LIBRARY(G2O_CLI_LIBRARY cli)

# Find the pluggable solvers
FIND_G2O_LIBRARY(G2O_SOLVER_CHOLMOD solver_cholmod)
FIND_G2O_LIBRARY(G2O_SOLVER_CSPARSE solver_csparse)
FIND_G2O_LIBRARY(G2O_SOLVER_CSPARSE_EXTENSION csparse_extension)
FIND_G2O_LIBRARY(G2O_SOLVER_DENSE solver_dense)
FIND_G2O_LIBRARY(G2O_SOLVER_PCG solver_pcg)
FIND_G2O_LIBRARY(G2O_SOLVER_SLAM2D_LINEAR solver_slam2d_linear)
FIND_G2O_LIBRARY(G2O_SOLVER_STRUCTURE_ONLY solver_structure_only)
FIND_G2O_LIBRARY(G2O_SOLVER_EIGEN solver_eigen)

# Find the predefined types
FIND_G2O_LIBRARY(G2O_TYPES_DATA types_data)
FIND_G2O_LIBRARY(G2O_TYPES_ICP types_icp)
FIND_G2O_LIBRARY(G2O_TYPES_SBA types_sba)
FIND_G2O_LIBRARY(G2O_TYPES_SCLAM2D types_sclam2d)
FIND_G2O_LIBRARY(G2O_TYPES_SIM3 types_sim3)
FIND_G2O_LIBRARY(G2O_TYPES_SLAM2D types_slam2d)
FIND_G2O_LIBRARY(G2O_TYPES_SLAM3D types_slam3d)

# G2O solvers declared found if we found at least one solver
SET(G2O_SOLVERS_FOUND "NO")
IF(G2O_SOLVER_CHOLMOD OR G2O_SOLVER_CSPARSE OR G2O_SOLVER_DENSE OR G2O_SOLVER_PCG OR G2O_SOLVER_SLAM2D_LINEAR OR G2O_SOLVER_STRUCTURE_ONLY OR G2O_SOLVER_EIGEN)
SET(G2O_SOLVERS_FOUND "YES")
ENDIF(G2O_SOLVER_CHOLMOD OR G2O_SOLVER_CSPARSE OR G2O_SOLVER_DENSE OR G2O_SOLVER_PCG OR G2O_SOLVER_SLAM2D_LINEAR OR G2O_SOLVER_STRUCTURE_ONLY OR G2O_SOLVER_EIGEN)

# G2O itself declared found if we found the core libraries and at least one solver
SET(G2O_FOUND "NO")
IF(G2O_STUFF_LIBRARY AND G2O_CORE_LIBRARY AND G2O_INCLUDE_DIR AND G2O_SOLVERS_FOUND)
SET(G2O_FOUND "YES")
ENDIF(G2O_STUFF_LIBRARY AND G2O_CORE_LIBRARY AND G2O_INCLUDE_DIR AND G2O_SOLVERS_FOUND)
2 changes: 1 addition & 1 deletion config/default.yaml → config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
# ===== Settings for displaying
MAX_NUM_IMAGES: 300
PCL_WAIT_FOR_KEY_PRESS: 0 # If 1, PCL Viewer will stop and wait for any of your keypress before continueing.
USE_BA: 1 # Use bundle adjustment for camera and points in single frame. 1 for true, 0 for false
USE_BA: 1 # Use bundle adjustment for camera and points in single frame. 1 for true, 0 for false
DRAW_GROUND_TRUTH_TRAJ: 1 # Ground truth traj's color is set as green. Estimated is set as white.
GROUND_TRUTH_TRAJ_FILENAME: /home/feiyu/Desktop/slam/my_vo/my2/data/test_data/cam_traj_truth.txt
STORE_CAM_TRAJ: /home/feiyu/Desktop/slam/my_vo/my2/data/test_data/cam_traj.txt
Expand Down
4 changes: 4 additions & 0 deletions include/my_basics/config.h
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
/* This script is mainly copied and then modified from Chapter 9 of Dr. Xiang Gao's book. Link is here:
https://github.com/gaoxiang12/slambook/blob/master/project/0.4/include/myslam/config.h
The modification is that in OpenCV 4.0, I have to use "new" to create a cv::FileStorage instance.
*/

#ifndef CONFIG_H
#define CONFIG_H
Expand Down
2 changes: 1 addition & 1 deletion include/my_geometry/camera.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Mat world2camera(const Point3f &p, const Mat &T_world_to_cam);


// ---------------- transformations (by Eigen && Sophus; For g2o optimization) ----------------
// ------------ !!! Copied from Dr. Xiang Gao's book !!! --------------
// ------------ !!! Copy from Dr. Xiang Gao's book !!! --------------
// TO DO

// ---------------- Class ----------------
Expand Down
4 changes: 2 additions & 2 deletions python_tools/rename_image_filenames.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@

# Run this script, to rename all the images in dataset_dir (specified by default.yaml) to the required filenames.
# Run this script, to rename all the images in dataset_dir (specified by config.yaml) to the required filenames.

import yaml # for loading .yaml file
import os # for getting/setting filenames

# Open config file
stream = open("config/default.yaml", "r")
stream = open("config/config.yaml", "r")

# !!!!!
# If the first sentence is "%YAML:1.0", skip it
Expand Down
17 changes: 11 additions & 6 deletions src/io.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -75,11 +75,11 @@ void writePoseToFile(const string filename, vector<cv::Mat> list_T)
double y = T.at<double>(1, 3);
double z = T.at<double>(2, 3);
fout << x << " " << y << " " << z << " ";
for (int i = 0; i < 3; i++)
for (int i = 0; i < 3; i++) // order: 1st column, 2nd column, 3rd column
{
for (int j = 0; j < 3; j++)
{
fout << T.at<double>(i, j) << " ";
fout << T.at<double>(j, i) << " ";
}
}
fout << '\n';
Expand All @@ -100,7 +100,7 @@ vector<cv::Mat> readPoseToFile(const string filename)
assert(fin.is_open()); // Fail to find the config file

// Read data
const int NUM_IN_ROW = 12; // x,y,z, 1st row of R, 2nd row of R, 3rd row of R
const int NUM_IN_ROW = 12; // x,y,z, 1st column of R, 2nd column of R, 3rd column of R
vector<double> pose(NUM_IN_ROW, 0.0);
double val;
int cnt = 0;
Expand All @@ -111,9 +111,14 @@ vector<cv::Mat> readPoseToFile(const string filename)
{
cnt = 0;
list_pose.push_back(pose);
cv::Mat T = (cv::Mat_<double>(4, 4) << pose[3], pose[4], pose[5], pose[0],
pose[6], pose[7], pose[8], pose[1],
pose[9], pose[10], pose[11], pose[2],
// cv::Mat T = (cv::Mat_<double>(4, 4) << pose[3], pose[4], pose[5], pose[0],
// pose[6], pose[7], pose[8], pose[1],
// pose[9], pose[10], pose[11], pose[2],
// 0, 0, 0, 1);
cv::Mat T = (cv::Mat_<double>(4, 4) <<
pose[3], pose[6], pose[9], pose[0],
pose[4], pose[7], pose[10], pose[1],
pose[5], pose[8], pose[11], pose[2],
0, 0, 0, 1);
list_T.push_back(T);
}
Expand Down
2 changes: 1 addition & 1 deletion src/pcl_display.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ PclViewer::PclViewer(double x, double y, double z,
double rot_axis_x, double rot_axis_y, double rot_axis_z)
{
// Set names
viewer_name_ = "viewer_name_";
viewer_name_ = "Camera trajectory: White is estimated; Green (Optional) is truth";
camera_frame_name_ = "camera_frame_name_";
truth_camera_frame_name_ = "truth_camera_frame_name_";

Expand Down
Loading

0 comments on commit 15ef2a2

Please sign in to comment.