# Hierarchical Localization

This repository contains the training and deployment code used in our paper *Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization*. This work introduces **MobileNetVLAD**, a mobile-friendly image retrieval deep neural network that significantly improves the performance of classical 6-DoF visual localization through a hierarchical search.

<p align="center">
  <a href="https://www.youtube.com/user/aslteam" target="_blank">
    <img src="doc/video_thumbnail.png" width="60%" style="opacity:0.5; border:1px solid black"/>
    <br /><em>The approach is described in detail in our video (click to play).</em>
  </a>
</p>

We release two main components:
- The deployment code of MobileNetVLAD: `global-loc`, a C++ ROS/Catkin package that can
  - load any trained image retrieval model,
  - efficiently perform inference on GPU or CPU,
  - index a given map and save it as a protobuf,
  - and retrieve keyframes given a query image;
- The training code: `retrievalnet`, a modular Python+TensorFlow package that can train the model
  - on any target image domain,
  - using the supervision of any existing teacher network.

The modularity of our system makes it possible to train a model and index a map on a powerful workstation while performing the retrieval on a mobile platform. Our code has thus been extensively tested on an NVIDIA Jetson TX2, which is widely used in robotics research.

<p align="center">
  <a href="https://nbviewer.jupyter.org/github/ethz-asl/hierarchical_loc/blob/master/notebooks/tango_visualize_retrieval.ipynb">
    <img src="doc/zurich_query_1.png" width="70%"/>
    <img src="doc/zurich_query_2.png" width="70%"/>
  </a>
  <br /><em>Retrieval on our Zurich dataset: strong illumination and viewpoint changes.</em>
</p>

## Deployment

The package relies on map primitives provided by [maplab](https://github.com/ethz-asl/maplab), but can be easily adapted to other SLAM frameworks. We thus do not release the code performing the local matching. The trained MobileNetVLAD is provided in `global-loc/models/` and is loaded using [tensorflow_catkin](https://github.com/ethz-asl/tensorflow_catkin).

### Installation

Both Ubuntu 14.04 and 16.04 are supported. First install the [system packages](https://github.com/ethz-asl/maplab/wiki/Installation-Ubuntu#install-required-system-packages) required by maplab.
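For instance, a minimal sketch (a partial list only; the maplab wiki maintains the authoritative set of packages):
```bash
# On Ubuntu 14.04, use the ros-indigo-* packages instead.
sudo apt-get install ros-kinetic-desktop-full python-catkin-tools
```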

Then set up the Catkin workspace:
```bash
export ROS_VERSION=kinetic  # Ubuntu 16.04: kinetic, Ubuntu 14.04: indigo
export CATKIN_WS=~/maplab_ws
mkdir -p $CATKIN_WS/src
cd $CATKIN_WS
catkin init
catkin config --merge-devel  # Necessary for catkin_tools >= 0.4.
catkin config --extend /opt/ros/$ROS_VERSION
catkin config --cmake-args \
  -DCMAKE_BUILD_TYPE=Release \
  -DENABLE_TIMING=1 \
  -DENABLE_STATISTICS=1 \
  -DCMAKE_CXX_FLAGS="-fext-numeric-literals -msse3 -msse4.1 -msse4.2" \
  -DCMAKE_CXX_STANDARD=14
cd src
```
If you want to perform inference on the GPU (see the requirements of [tensorflow_catkin](https://github.com/ethz-asl/tensorflow_catkin)), add:
```bash
catkin config --append-args --cmake-args -DUSE_GPU=ON
```
Finally, clone the repository and build:
```bash
git clone https://github.com/ethz-asl/hierarchical_loc.git --recursive
# Do not build maplab's own Eigen and protobuf packages.
touch hierarchical_loc/catkin_dependencies/maplab_dependencies/3rd_party/eigen_catkin/CATKIN_IGNORE
touch hierarchical_loc/catkin_dependencies/maplab_dependencies/3rd_party/protobuf_catkin/CATKIN_IGNORE
cd $CATKIN_WS && catkin build global_loc
```
Run the test examples:
```bash
./devel/lib/global_loc/test_inference
./devel/lib/global_loc/test_query_index
```

### Indexing

Given a VI map in `global-loc/maps/`, an index of global descriptors can be created in `global-loc/data/`:
```bash
./devel/lib/global_loc/build_index \
    --map_name <map_name> \
    --model_name mobilenetvlad_depth-0.35 \
    --proto_name <index_name.pb>
```
As an example, we provide the [Zurich map](https://github.com/ethz-asl/hierarchical_loc/releases/download/1.0/lindenhof_afternoon-wet_aligned.tar.gz) used in our paper. Several indexing options are available in [place-retrieval.cc](global-loc/src/place-retrieval.cc), such as subsampling or mission selection.

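To try it out, download and extract the map (a sketch; the destination directory is an assumption, adjust to your setup):
```bash
wget https://github.com/ethz-asl/hierarchical_loc/releases/download/1.0/lindenhof_afternoon-wet_aligned.tar.gz
tar -xzf lindenhof_afternoon-wet_aligned.tar.gz -C global-loc/maps/
```
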
### Retrieval

An example query is provided in [test_query_index.cc](global-loc/test/test_query_index.cc). Descriptor indexes for the Zurich dataset are included in `global-loc/data/` and can be used to time the queries:
```bash
./devel/lib/global_loc/time_query \
    --map_name <map_name> \
    --model_name mobilenetvlad_depth-0.35 \
    --proto_name lindenhof_afternoon_aligned_mobilenet-d0.35.pb \
    --query_mission f6837cac0168580aa8a66be7bbb20805 \
    --use_pca --pca_dims 512 --max_num_queries 100
```

Use the same indexes to evaluate and visualize the retrieval: install [retrievalnet](#training), generate the [Python protobuf interface](notebooks/generate_proto_py.sh), and refer to [tango_evaluation.ipynb](https://nbviewer.jupyter.org/github/ethz-asl/hierarchical_loc/blob/master/notebooks/tango_evaluation.ipynb) and [tango_visualize_retrieval.ipynb](https://nbviewer.jupyter.org/github/ethz-asl/hierarchical_loc/blob/master/notebooks/tango_visualize_retrieval.ipynb).
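A minimal sketch of the protobuf step (the exact invocation is an assumption; see the script itself for details):
```bash
bash notebooks/generate_proto_py.sh
```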

## Training

We use distillation to compress the original NetVLAD model into a smaller MobileNetVLAD capable of real-time inference on mobile platforms.
<p align="center">
  <img src="doc/training_process.png" width="70%"/>
</p>

### Installation

Python 3.5 is required. We advise running the following installation commands within a virtual environment. You will be prompted for the path to a data folder (subsequently referred to as `$DATA_PATH`) containing the datasets and pre-trained models, and for the path to an experiment folder (`$EXPER_PATH`) that will contain the trained models, training logs, and exported descriptors for evaluation.
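For example, a minimal virtual-environment setup (a sketch; the environment location is arbitrary, and it assumes Python 3.5 is available as `python3`):
```bash
python3 -m venv ~/venvs/retrievalnet  # hypothetical location
source ~/venvs/retrievalnet/bin/activate
```
Then install the package: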
```bash
cd retrievalnet && make install
```

### Exporting the target descriptors

If you wish to train MobileNetVLAD on the Google Landmarks dataset as done in our paper, you first need to download [the index of images](https://github.com/ethz-asl/hierarchical_loc/releases/download/1.0/google_landmarks_index.csv) and then download the dataset itself with [download_google_landmarks.py](retrievalnet/downloading/download_google_landmarks.py). The [weights of the original NetVLAD model](http://rpg.ifi.uzh.ch/datasets/netvlad/vd16_pitts30k_conv5_3_vlad_preL2_intra_white.zip) are provided by [netvlad_tf_open](https://github.com/uzh-rpg/netvlad_tf_open) and should be extracted into `$DATA_PATH/weights/`.
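A sketch of these steps (file locations and the script invocation are assumptions; check the script for its expected inputs):
```bash
wget https://github.com/ethz-asl/hierarchical_loc/releases/download/1.0/google_landmarks_index.csv
python retrievalnet/downloading/download_google_landmarks.py
wget http://rpg.ifi.uzh.ch/datasets/netvlad/vd16_pitts30k_conv5_3_vlad_preL2_intra_white.zip
unzip vd16_pitts30k_conv5_3_vlad_preL2_intra_white.zip -d $DATA_PATH/weights/
```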

Finally, export the descriptors of Google Landmarks:
```bash
python export_descriptors.py config/netvlad_export_distill.yaml google_landmarks/descriptors --as_dataset
```

### Training MobileNetVLAD

Extract the MobileNet encoder [pre-trained on ImageNet](https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_224.tgz) into `$DATA_PATH/weights/`, for example (a sketch; the exact layout expected by the training config is an assumption):
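```bash
wget https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_0.35_224.tgz
mkdir -p $DATA_PATH/weights && tar -xzf mobilenet_v2_0.35_224.tgz -C $DATA_PATH/weights/
```
Then start the training: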
```bash
python train.py config/mobilenetvlad_train_distill.yaml mobilenetvlad
```
The training can be interrupted at any time using `Ctrl+C` and can be monitored with TensorBoard summaries saved in `$EXPER_PATH/mobilenetvlad/`. The weights are also saved there.
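For instance, to launch TensorBoard on these summaries:
```bash
tensorboard --logdir $EXPER_PATH/mobilenetvlad/
```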

### Exporting the model for deployment

The following command exports the model to `$EXPER_PATH/saved_models/mobilenetvlad/`:
```bash
python export_model.py config/mobilenetvlad_train_distill.yaml mobilenetvlad
```

### Evaluating on the NCLT dataset

Download the [NCLT sequences](http://robots.engin.umich.edu/nclt/) into `$DATA_PATH/nclt/`, along with the corresponding [pose files](https://github.com/ethz-asl/hierarchical_loc/releases/download/1.0/nclt_poses.zip) (generated with [nclt_generate_poses.ipynb](notebooks/nclt_generate_poses.ipynb)), for example (a sketch; the destination is an assumption):
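```bash
wget https://github.com/ethz-asl/hierarchical_loc/releases/download/1.0/nclt_poses.zip
unzip nclt_poses.zip -d $DATA_PATH/nclt/
```
Then export the NCLT descriptors, e.g. for MobileNetVLAD: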
```bash
python export_descriptors.py config/mobilenetvlad_export_nclt.yaml mobilenetvlad
```
These can be used to evaluate and visualize the retrieval (see [nclt_evaluation.ipynb](https://nbviewer.jupyter.org/github/ethz-asl/hierarchical_loc/blob/master/notebooks/nclt_evaluation.ipynb) and [nclt_visualize_retrieval.ipynb](https://nbviewer.jupyter.org/github/ethz-asl/hierarchical_loc/blob/master/notebooks/nclt_visualize_retrieval.ipynb)).

## Citation

Please consider citing the corresponding publication if you use this work in an academic context:
```
@article{hloc2018,
  title={Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization},
  author={Sarlin, P.-E. and Debraine, F. and Dymczyk, M. and Siegwart, R. and Cadena, C.},
  journal={arXiv:-},
  year={2018}
}
```