First install Eigen and pkg-config:
```
apt install libeigen3-dev pkg-config
```
Then clone this repo and install the `opengv` submodule with `git submodule update --init --recursive`. Then install Ouroboros by running `pip install .`

You can run the tests with
```
pytest --ignore=third_party --ignore=extra/ouroboros_ros
```
Individual components of the VLC pipeline may have extra dependencies. You probably want to install:
- LightGlue (https://github.com/cvg/LightGlue)
- `pytorch`, `pytorch-lightning`, `pytorch-metric-learning`, and `torchvision` via pip (needed for Salad, probably others)
See `examples/vlc_server_driver_example.py` for the best example of how to use the VLC Server abstraction.
Ouroboros is integrated with ROS in `extras/ouroboros_ros`, which is a valid ROS package. Once you have built `ouroboros_ros` in your ROS workspace, you can run the Ouroboros ROS node with `roslaunch ouroboros_ros vlc_server_node.launch`. When using the node with your datasets, you will need to properly remap the following topics:
- `~image_in`
- `~camera_info`
- `~depth_in`
You will also need to specify a couple of coordinate frames by rosparam:
- `fixed_frame` -- Frame to consider "fixed" for pose hints (mostly for GT debugging or baseline comparisons)
- `hint_body_frame` -- Frame to consider body for pose hints
- `body_frame` -- Robot body frame; the frame that loop closures are computed with respect to
- `camera_frame`
Ouroboros models the VLC pipeline as
```
Input Image --> Place Recognition --> Keypoint / Descriptor Extraction
    --> Query-Match Keypoint Association --> Pose Recovery --> Output Loop Closures
```
The place recognition, keypoint/descriptor extraction, association, and pose recovery methods can all be swapped out. Ouroboros uses a plugin-based system that relies on the provided configuration to decide which module to load for each of these parts of the pipeline. These plugins are also auto-discovered, meaning that it is possible to extend Ouroboros without needing to add any new code to the Ouroboros repo itself.
We will use the Salad place recognition module as an example of how to implement a custom module. It is implemented here. A plugin should be a class (here `SaladModel`) that takes a configuration struct as an argument in the constructor. It must implement a function `infer`, which will be called at the appropriate part of the VLC pipeline. For examples of the specific interface that `infer` must have for each part of the pipeline, refer to the "ground truth" example modules here. A plugin should also have a `load` method to create an instance of the class from a configuration.
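Putting that together, a minimal plugin might look like the following sketch. This is not the actual Salad implementation; the `infer` argument list is an assumption, and the exact interface depends on the pipeline stage (see the ground-truth modules mentioned above):
```python
class SaladModel:
    """Sketch of a place-recognition plugin; details are illustrative."""

    def __init__(self, config):
        # The constructor takes the plugin's configuration struct.
        self.config = config

    def infer(self, image, pose_hint=None):
        # Called at the place-recognition stage of the pipeline; should
        # return an embedding vector for the input image. The signature
        # here is an assumption -- each pipeline stage expects a
        # different `infer` interface.
        raise NotImplementedError

    @classmethod
    def load(cls, config):
        # Create an instance of the plugin from a configuration.
        return cls(config)
```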
Next, we need to define a configuration (here `SaladModelConfig`) which inherits from `ouroboros.config.Config`. This should be a dataclass with any configuration information that you would like to set from a file. It needs to use the `register_config` decorator to declare that it is a plugin of type `place_model`, with name `Salad`, and can be constructed into a class with constructor `SaladModel`. The resulting plugin can be loaded from a config file such as `vlc_driver_config.yaml` (see `VlcDriverConfig.load` in `vlc_server_driver_example.py`).
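Continuing the sketch above, the registration might look roughly like this. The decorator's keyword names and the config fields are assumptions; the plugin type, name, and constructor are as described in the text:
```python
from dataclasses import dataclass

from ouroboros.config import Config, register_config

@register_config("place_model", name="Salad", constructor=SaladModel)
@dataclass
class SaladModelConfig(Config):
    # Hypothetical fields; anything declared here can be set from a file.
    model_variant: str = "dinov2_salad"
    embedding_size: int = 8448
```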
Plugins are automatically discovered in any top-level imports from packages that start with `ouroboros_`, so you can use plugins with Ouroboros even without adding them to the Ouroboros codebase.
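For instance, a hypothetical external package could expose its plugin as follows; importing the registered config at the top level is enough for it to be discovered:
```python
# ouroboros_myplugins/__init__.py (hypothetical external package)
# Because the package name starts with "ouroboros_", its top-level imports
# are scanned and any registered plugins become available to Ouroboros.
from ouroboros_myplugins.salad_model import SaladModel, SaladModelConfig
```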
Note that it is possible to have recursive configurations; see e.g. `VlcDriverConfig` in `vlc_server_driver_example.py`.
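A recursive configuration is just a `Config` dataclass that holds other configs as fields; the sketch below is hypothetical and much smaller than the real `VlcDriverConfig`:
```python
from dataclasses import dataclass, field

from ouroboros.config import Config

@dataclass
class ExampleDriverConfig(Config):
    # A nested config: the place-model settings are loaded from the same
    # file as the rest of the driver configuration.
    place_model: SaladModelConfig = field(default_factory=SaladModelConfig)
```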
The ROS node implementations are the highest-level and most "out-of-the-box" solutions that we provide, but they are pretty thin wrappers around the `VlcServer` abstraction. There is also a lower-level VLC database abstraction that handles the storage and querying of images, embeddings, loop closures, etc. Currently this database abstraction is implemented in a very naive way, but the implementation can be improved while maintaining exactly the same interface.
The `VlcServer` encapsulates the Visual Loop Closure pipeline. First, a sensor needs to be registered with the `VlcServer` with
```python
session_id = vlc_server.register_camera(robot_id, camera_config, epoch_ns)
```
The `session_id` is a unique identifier for a camera in the current session. When new frames should be checked and stored for VLC, you can call `add_and_query_frame`:
```python
loop_closure = vlc_server.add_and_query_frame(session_id, image, epoch_ns)
```
For basic use cases, this is the only function you should need. However, the `VlcServer` also provides slightly lower-level functions for more advanced functionality, such as storing image embeddings without the actual image, which is a common need in distributed VLC systems.
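Putting the two calls together, a minimal online loop might look like the sketch below; `vlc_server`, `robot_id`, `camera_config`, and `image_stream` are placeholders for a configured server and your data source:
```python
# Register the camera once, then feed frames as they arrive.
session_id = vlc_server.register_camera(robot_id, camera_config, start_ns)
for image, epoch_ns in image_stream:
    loop_closure = vlc_server.add_and_query_frame(session_id, image, epoch_ns)
    if loop_closure is not None:  # assumption: no detection yields None
        print("Detected loop closure:", loop_closure)
```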
`VlcDb` provides an abstraction for storing and querying data related to sensors, images, and loop closures. Philosophically, `VlcDb` represents a relational database, although it is currently not implemented with a "real" database backend. There are four tables:
- Image Table
- Loop Closure Table
- Session Table
- Camera Table
The image table stores the information related to a single image frame. This includes the RGB(D) image and any byproducts like the embedding vector, keypoints, descriptors, etc. The image table is the most relevant place for making improvements -- we need to execute vector queries based on the embedding vectors in this table, and serializing these images to disk is not handled yet. `add_image` and `get_image` are the most relevant interfaces.
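For example (a hedged sketch; the exact signatures of `add_image` and `get_image` may differ from what is shown here):
```python
# Store a frame, then fetch it back by the returned key.
image_uuid = vlc_db.add_image(session_id, epoch_ns, image)
vlc_image = vlc_db.get_image(image_uuid)
```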
The loop closure table stores the history of loop closures that the robot has detected. This is pretty straightforward and not extensively used right now. `get_lc` and `iterate_lcs` are useful if you want to reason over the loop closures found so far.
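For example, a hedged sketch of iterating over the stored loop closures:
```python
# Assumes iterate_lcs yields the loop closures stored so far.
for lc in vlc_db.iterate_lcs():
    print(lc)  # e.g. inspect or re-verify each loop closure
```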
The session table tracks each session, which is specific to a camera and a continuous period of operation. See `add_session`.
The camera table tracks each camera, storing e.g. the intrinsics and which robot the camera is associated with. See `add_camera`.
The `VlcDb` provides an interface for several kinds of queries. The primary purpose of a query is to find images matching the supplied image embedding vector. However, there are certain filters that we might need to apply to these queries. For example, we often want the closest match that was observed more than N seconds ago. There are also cases where we would like to support batch queries (e.g. in a multi-session context) to increase speed.
The most VLC-specific query is probably `query_embeddings_max_time`, which queries for the single closest embedding with time at most T. This is a very common query when running an online VLC module. There is also `query_embeddings` to find the closest embedding across any time. There are several other query functions, which allow for batch queries or filtering based on a custom predicate function. Note that the batch queries might be faster than the single versions, although it depends on when we get around to optimizing the backend. The filter functions with a custom callable should be expected to be quite a bit slower than the versions that don't use a custom callable.
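As an illustration of the online pattern described above (a hedged sketch; the argument list and return value of `query_embeddings_max_time` are assumptions):
```python
# Find the closest stored embedding observed more than `min_lookback_ns`
# before the current frame.
max_time_ns = epoch_ns - min_lookback_ns
match = vlc_db.query_embeddings_max_time(embedding, max_time_ns)
```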