Updating readme

felixchenfy · Jan 30, 2019 · 8542d39
1 parent a16a614
Showing 9 changed files with 57 additions and 24 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,15 +4,14 @@
 tmp.cpp
 source_this.bash
 *.out
-README*
+README_*
 
 .vscode/
 others/
 garbage/
 
 # Test_data
-dataset*/
-test_data/
+data/
 test_tmp/
 
 # Compile

diff --git a/README.md b/README.md
@@ -0,0 +1,44 @@
+
+# Monocular Visual Odometry
+A simple monocular visual odometry (VO) pipeline with initialization, tracking, local mapping, and bundle adjustment.
+![link to gif]
+
+![](result_frame.png)
+
+
+# Algorithm
+This VO pipeline consists of the following steps:
+
+1. Initialization
+
+    When to initialize: Given a video, set the 1st frame (image) as the reference and match features against the 2nd frame. If the average displacement between inlier matched keypoints exceeds a threshold, start the initialization. Otherwise, skip to the 3rd, 4th, etc. frames until the criterion is satisfied at the K_th frame.
+
+    How to compute initial movement: Estimate the relative pose between the 1st and K_th frames. Compute the Essential Matrix (E) and Homography Matrix (H) between the two frames. Compute their Symmetric Transfer Error using the method in the [ORB-SLAM paper](https://arxiv.org/abs/1502.00956) and choose the better model (i.e., choose H if H/(E+H)>0.45). Decompose E or H into the relative pose of rotation (R) and translation (t). With OpenCV, E gives 1 result and H gives 2 results that satisfy the criterion that points are in front of the camera. For E, there is only one result to choose; for H, choose the one that makes the image plane and the world-points plane more parallel.
+
+    Recover scale: Scale the translation t so that either (1) the feature points have a mean depth of 1 m, or (2) it has the same scale as the corresponding ground truth data, so that the two trajectories can be drawn and compared.
+
+    Keyframe and local map: Insert both the 1st and K_th frames as **keyframes**. Triangulate their inlier points to obtain the points' world positions. These points are called **map points** and are pushed to the **local map**.
+
+2. Tracking
+
+    Estimate new camera pose: For each following i-th frame, find the map points that are in the camera view. Do feature matching to find 2d-3d correspondences between 3d map points and 2d image keypoints. Estimate the camera pose by RANSAC and PnP.
+
+3. Optimization
+
+    Apply bundle adjustment to this single frame: Using the inlier 3d-2d correspondences from PnP, we can sum the reprojection error of each point pair to form the cost function. By computing the derivative wrt (1) the points' 3d positions and (2) the camera pose, we can solve the optimization problem using the Gauss-Newton method and its variants. This is done by **g2o** and its built-in datatypes "VertexSBAPointXYZ", "VertexSE3Expmap", and "EdgeProjectXYZ2UV". See [Slambook](https://github.com/gaoxiang12/slambook) Chapter 4 and Chapter 7.8.2 for more details.
+
+    The camera pose and the inlier points' 3d positions are then updated, with changes on the order of 0.0001 meter. (The final result shows that this optimization doesn't make much difference.)
+
+4. Local Map
+
+    Insert keyframe: If the relative pose between the current frame and the previous keyframe is large enough, i.e. its translation or rotation exceeds a threshold, insert the current frame as a keyframe. Triangulate 3d points and push them to the local map.
+
+    Clean up local map: Remove map points that (1) are not in the current view, (2) have a view angle larger than a threshold, or (3) have a ratio of match/visible times smaller than a threshold. (This follows Slambook Chapter 9.4.)
+
+5. Other details
+
+* Image features: ORB. A simple grid sampling on the keypoints' pixel positions is then applied to keep the keypoints from being too dense.
+* Feature matching: Two methods are implemented, where a good match is one whose: (1) feature distance is smaller than a threshold, as described in Slambook; or (2) ratio of smallest to second-smallest distance is smaller than a threshold, as proposed in Prof. Lowe's 2004 SIFT paper. The first one is adopted, since it generates fewer erroneous matches.
+
+# Software Architecture
+
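
The model-selection rule in the README's initialization step (choose H if H/(E+H)>0.45) can be sketched as follows. This is a minimal Python sketch, not the repository's C++ code: `truncated_score`, `choose_model`, and their parameters are hypothetical names, and the scoring form (each squared symmetric transfer error below an outlier threshold contributes `gamma - d2`, all others contribute 0) is an assumption modeled on the ORB-SLAM paper. The fallback to E when both scores are zero is also an assumption.

```python
def truncated_score(squared_errors, outlier_threshold, gamma):
    """Score a model from its squared symmetric transfer errors.

    Assumed scoring form: an error below the outlier threshold adds
    (gamma - d2); larger errors are treated as outliers and add 0.
    """
    return sum(gamma - d2 for d2 in squared_errors if d2 < outlier_threshold)


def choose_model(score_h, score_e, ratio_threshold=0.45):
    """Return "H" if score_h / (score_e + score_h) exceeds the threshold, else "E"."""
    total = score_h + score_e
    if total <= 0:
        # Hypothetical fallback when neither model has any inlier score.
        return "E"
    return "H" if score_h / total > ratio_threshold else "E"


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the repository.
    score_h = truncated_score([1.0, 2.0, 10.0], outlier_threshold=5.99, gamma=5.99)
    score_e = truncated_score([0.5, 3.0, 3.5], outlier_threshold=3.84, gamma=5.99)
    print(choose_model(score_h, score_e))
```

Higher scores mean a better fit, so a planar or low-parallax scene, where the homography explains the matches well, pushes the ratio above the threshold and selects H.
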
diff --git a/config/default.yaml b/config/default.yaml
@@ -5,30 +5,30 @@ MAX_NUM_IMAGES: 300
 PCL_WAIT_FOR_KEY_PRESS: 0 # If 1, PCL Viewer will stop and wait for any of your keypress before continueing.
 USE_BA: 1 # Use bundle adjustment for camera and points in single frame. 1 for true, 0 for false
 DRAW_GROUND_TRUTH_TRAJ: 1 # Ground truth traj's color is set as green. Estimated is set as white.
-GROUND_TRUTH_TRAJ_FILENAME: /home/feiyu/Desktop/slam/my_vo/my2/test_data/cam_traj_truth.txt
-STORE_CAM_TRAJ: /home/feiyu/Desktop/slam/my_vo/my2/test_data/cam_traj.txt
+GROUND_TRUTH_TRAJ_FILENAME: /home/feiyu/Desktop/slam/my_vo/my2/data/test_data/cam_traj_truth.txt
+STORE_CAM_TRAJ: /home/feiyu/Desktop/slam/my_vo/my2/data/test_data/cam_traj.txt
 
 # ===== Dataset and camera intrinsics
 
 # -- fr1_desk dataset
 # https://vision.in.tum.de/data/datasets/rgbd-dataset/file_formats
-# dataset_dir: /home/feiyu/Desktop/slam/my_vo/my2/dataset_images_fr1_desk
+# dataset_dir: /home/feiyu/Desktop/slam/my_vo/my2/data/dataset_images_fr1_desk
 # num_images: 100
 # camera_info.fx: 517.3
 # camera_info.fy: 516.5
 # camera_info.cx: 325.1
 # camera_info.cy: 249.7
 
 # -- fr1_xyz dataset
-# dataset_dir: /home/feiyu/Desktop/slam/my_vo/my2/dataset_images_fr1_xyz
+# dataset_dir: /home/feiyu/Desktop/slam/my_vo/my2/data/dataset_images_fr1_xyz
 # num_images: 100
 # camera_info.fx: 517.3
 # camera_info.fy: 516.5
 # camera_info.cx: 325.1
 # camera_info.cy: 249.7
 
 # -- New Tsukuba Stereo dataset used in matlab tutorial
-dataset_dir: /home/feiyu/Desktop/slam/my_vo/my2/dataset_images_matlab
+dataset_dir: /home/feiyu/Desktop/slam/my_vo/my2/data/dataset_images_matlab
 num_images: 150
 camera_info.fx: 615
 camera_info.fy: 615
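
The `camera_info.fx`, `camera_info.fy`, `camera_info.cx`, and `camera_info.cy` entries in this config are pinhole camera intrinsics. The sketch below shows, assuming an ideal pinhole model with no lens distortion, how they form the 3x3 intrinsic matrix (the layout whose first row appears as `camera_intrinsics` in `python_tools/undistort_all_images.py`) and how that matrix projects a camera-frame 3d point to pixel coordinates. The helper names `intrinsic_matrix` and `project` are hypothetical, and the values are the commented-out fr1 entries from this config.

```python
def intrinsic_matrix(fx, fy, cx, cy):
    """Build the 3x3 pinhole intrinsic matrix as nested lists."""
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


def project(K, point_cam):
    """Project a 3d point in the camera frame (x, y, z) to pixel (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


if __name__ == "__main__":
    # fr1 intrinsics as listed (commented out) in config/default.yaml.
    K = intrinsic_matrix(fx=517.3, fy=516.5, cx=325.1, cy=249.7)
    # A point on the optical axis projects to the principal point (cx, cy).
    print(project(K, (0.0, 0.0, 1.0)))
```
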

diff --git a/python_tools/calibrate_camera.py b/python_tools/calibrate_camera.py
@@ -6,7 +6,7 @@
 import glob # for getting files' names in the disk
 # import pickle # for saving variables to disk
 
-IMAGE_FOLDER='/home/feiyu/Desktop/slam/my_vo/my2/fr1_rgb_calibration/'
+IMAGE_FOLDER='/home/feiyu/Desktop/slam/my_vo/my2/data/fr1_rgb_calibration/'
 
 
 # termination criteria

diff --git a/python_tools/undistort_all_images.py b/python_tools/undistort_all_images.py
@@ -5,8 +5,8 @@
 import glob, os
 
 if 1: # fr1 dataset
-    INPUT_FOLDER = '/home/feiyu/Desktop/slam/my_vo/my2/dataset_images_fr1_xyz/'
-    OUTPUT_FOLDER = '/home/feiyu/Desktop/slam/my_vo/my2/undist/'
+    INPUT_FOLDER = '/home/feiyu/Desktop/slam/my_vo/my2/data/dataset_images_fr1_xyz/'
+    OUTPUT_FOLDER = '/home/feiyu/Desktop/slam/my_vo/my2/data/undist/'
 
     camera_intrinsics = np.array([
         [517.3, 0, 325.1],

diff --git a/src_main/run_vo_v1.cpp b/src_main/run_vo_v1.cpp
@@ -55,7 +55,7 @@ int main(int argc, char **argv)
     vector<string> image_paths;
     if (DEBUG_MODE)
     {
-        string folder = "/home/feiyu/Desktop/slam/my_vo/my2/test_data/";
+        string folder = "/home/feiyu/Desktop/slam/my_vo/my2/data/test_data/";
         vector<string> tmp{
             "image0001.jpg", "image0013.jpg", "image0015.jpg"};
         for (string &filename : tmp)

diff --git a/test/test_PnP.cpp b/test/test_PnP.cpp
@@ -21,7 +21,7 @@ int main ( int argc, char** argv )
 {
 
     cout << "program start" << endl;
-    string folder="test_data/";
+    string folder="data/test_data/";
     string img_file1="fr1_1_1.png";
     string img_file2="fr1_1_2.png";
     string img_file3="fr1_1_1_depth.png";

diff --git a/test/test_epipolor_geometry.cpp b/test/test_epipolor_geometry.cpp
@@ -31,7 +31,7 @@ int main(int argc, char **argv)
 
     // read in images
     string img_file1, img_file2;
-    string folder = "test_data/";
+    string folder = "data/test_data/";
     int IDX_TEST_CASE = 1;
 
     if (argc - 1 == 2)

diff --git a/test/tmp.cpp b/test/tmp.cpp