Update README #25

Merged · 2 commits · Feb 25, 2025
233 changes: 174 additions & 59 deletions README.md

## Installation

First, install Eigen and pkg-config:
```bash
apt install libeigen3-dev pkg-config
```

Next, clone this repo and initialize the `opengv` submodule with `git submodule
update --init --recursive`. Then install Ouroboros by running

```bash
pip install .
```

You can run the tests with
```bash
pytest --ignore=third_party --ignore=extra/ouroboros_ros
```

Individual components of the VLC pipeline may have extra dependencies. You
probably want to install:

1. `LightGlue` (https://github.com/cvg/LightGlue)
2. `pytorch`, `pytorch-lightning`, `pytorch-metric-learning`, and `torchvision` via pip (needed for Salad, and probably for other learned models); see the sketch below
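
A rough sketch of installing these extras (package names come from the projects above; exact versions and CUDA builds depend on your setup):

```bash
# Torch-based dependencies used by Salad (and likely other learned models)
pip install torch torchvision pytorch-lightning pytorch-metric-learning

# LightGlue is installed from its GitHub repository
pip install git+https://github.com/cvg/LightGlue.git
```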


See `examples/vlc_server_driver_example.py` for the best example of how to use
the VLC Server abstraction.

## ROS Integration

Ouroboros is integrated with ROS in `extra/ouroboros_ros`, which is a valid
ROS package. Once you have built `ouroboros_ros` in your ROS workspace, you can
run the Ouroboros ROS node with `roslaunch ouroboros_ros
vlc_server_node.launch`. When using the node with your datasets, you will need
to properly remap the following topics:
* `~image_in`
* `~camera_info`
* `~depth_in`

You will also need to specify a few coordinate frames via rosparam (see the example below):
* `fixed_frame` -- Frame to consider "fixed" for pose hints (mostly for GT debugging or baseline comparisons)
* `hint_body_frame` -- Frame to consider body for pose hints
* `body_frame` -- Robot body frame, the frame that loop closures are computed with respect to
* `camera_frame`
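
For example, `extra/ouroboros_ros/config/vlc_server_node.yaml` sets these frames for a Husky robot (your frame names will differ):

```yaml
fixed_frame: world
hint_body_frame: ground_truth/husky/base
body_frame: husky/base
camera_frame: husky/camera
```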

## Plugins and Configuration

Ouroboros models the VLC pipeline as
```
Input Image --> Place Recognition --> Keypoint / Descriptor Extraction -->
--> Query-Match Keypoint Association --> Pose Recovery --> Output Loop Closures
```

The recognition, keypoint/descriptor, association, and pose recovery methods
can all be swapped out. Ouroboros uses a plugin-based system that uses the
provided configuration to decide which module to load for each of these parts
of the pipeline. These plugins are auto-discovered, meaning that it is
possible to extend Ouroboros without needing to add any new code to the
Ouroboros repo itself.

See the example in `examples` to get an idea of how to use the interface so far.
We will use the Salad place recognition module as an example of how to
implement a custom module. It is implemented
[here](src/ouroboros_salad/salad_model.py). A plugin should be a class (here
`SaladModel`) that takes a configuration struct as an argument in the
constructor. It must implement a function `infer`, which will be called at the
appropriate part of the VLC pipeline. For examples of the specific interface
that `infer` must have for each part of the pipeline, refer to the
"ground truth" example modules [here](src/ouroboros_gt). A plugin should also
have a `load` method to create an instance of the class from a configuration.
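
A minimal sketch of the shape of such a plugin (the method signatures here are illustrative assumptions; see the Salad and ground-truth modules for the actual interfaces):

```python
class MyPlaceModel:
    """Illustrative place-recognition plugin: image in, embedding out."""

    def __init__(self, config):
        self.config = config

    def infer(self, image):
        # Called at the place-recognition stage of the VLC pipeline;
        # should return an embedding vector for the input image.
        raise NotImplementedError

    @classmethod
    def load(cls, config):
        # Create an instance of the plugin from a configuration.
        return cls(config)
```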

Next, we need to define a configuration (here `SaladModelConfig`) which
inherits from `ouroboros.config.Config`. This should be a dataclass with any
configuration information that you would like to set from a file. It needs to
use the `register_config` decorator to declare that it is a plugin of type
`place_model` with name `Salad`, and that it is built with the constructor
`SaladModel`. The resulting plugin can be loaded from a config file
such as [vlc\_driver\_config.yaml](examples/config/vlc_driver_config.yaml) (see
`VlcDriverConfig.load` in
[vlc\_server\_driver\_example.py](examples/vlc_server_driver_example.py)).
Plugins are automatically discovered in any top-level imports from packages
that start with `ouroboros_`, so you can use plugins with Ouroboros even
without adding them to the Ouroboros codebase.
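
A sketch of what this registration might look like (the decorator's exact argument names are assumptions based on the description above; see the Salad module for the real usage):

```python
from dataclasses import dataclass

from ouroboros.config import Config, register_config


@register_config("place_model", name="Salad", constructor=SaladModel)
@dataclass
class SaladModelConfig(Config):
    # Illustrative fields that would be populated from a YAML config file
    model_name: str = "dinov2_salad"
    embedding_size: int = 8448
```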

Note that it is possible to have recursive configurations, see e.g.
`VlcDriverConfig` in
[vlc\_server\_driver\_example.py](examples/vlc_server_driver_example.py).

# Library Building Blocks

The ROS node implementations are the highest-level and most "out-of-the-box"
solutions that we provide, but they are pretty thin wrappers around the
VlcServer abstraction. There is also a lower-level VLC database abstraction
that handles the storage and querying of images, embeddings, loop closures,
etc. Currently this database abstraction is implemented in a very naive way,
but the implementation can be improved while maintaining exactly the same
interface.


## VlcServer

The VlcServer encapsulates the Visual Loop Closure pipeline. First, a sensor
needs to be registered with the VlcServer:

```python
session_id = vlc_server.register_camera(robot_id, camera_config, epoch_ns)
```

The `session_id` is a unique identifier for a camera in the current session.
When new frames should be checked and stored for VLC, you can call
`add_and_query_frame`:

```python
loop_closure = vlc_server.add_and_query_frame(session_id, image, epoch_ns)
```

For basic use-cases, this is the only function you should need. However, the
VlcServer also provides slightly lower-level functions for more advanced
functionality, such as storing image embeddings without the actual image,
which is a common need in distributed VLC systems.
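
Putting the two calls together (a hedged sketch: `make_vlc_server`, `make_camera_config`, and `image_stream` are hypothetical placeholders; see `examples/vlc_server_driver_example.py` for a real setup):

```python
import time

robot_id = 0
vlc_server = make_vlc_server()        # hypothetical: build a VlcServer from config
camera_config = make_camera_config()  # hypothetical camera description

session_id = vlc_server.register_camera(robot_id, camera_config, time.time_ns())

for image, stamp_ns in image_stream():  # hypothetical image source
    loop_closure = vlc_server.add_and_query_frame(session_id, image, stamp_ns)
    if loop_closure is not None:
        print("Found loop closure:", loop_closure)
```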


## VlcDb

`VlcDb` provides an abstraction for storing and querying data related to
sensors, images, and loop closures. Philosophically, `VlcDb` represents a
relational database, although it is currently not implemented with a "real"
database backend. There are four tables:
* Image Table
* Loop Closure Table
* Session Table
* Camera Table

### Image Table

The image table stores the information related to a single image frame. This
includes the RGB(D) image and any byproducts such as the embedding vector,
keypoints, and descriptors. The Image Table is the most relevant place for
making improvements -- we need to execute vector queries based on the embedding
vectors in this table, and serializing these images to disk is not handled yet.

`add_image` and `get_image` are the most relevant interfaces.
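
For example (a sketch assuming a constructed `VlcDb` instance `vlc_db`; the exact signatures and the `ouroboros` import name are assumptions):

```python
import numpy as np

import ouroboros as ob  # assumed import name

rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in pixel data
image_uuid = vlc_db.add_image(session_id, stamp_ns, ob.SparkImage(rgb=rgb))
vlc_image = vlc_db.get_image(image_uuid)  # returns the stored VlcImage "row"
```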

### Loop Closure Table

The loop closure table stores the history of loop closures that the robot has
detected. This is pretty straightforward and not extensively used right now.

`get_lc` and `iterate_lcs` are useful if you want to reason over loop closures found so far.

### Session Table

This tracks each session, which is specific to a camera and a continuous period of operation.

See `add_session`.

### Camera Table

This tracks each camera, storing for example its intrinsics and which robot the camera is associated with.

See `add_camera`.
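
A sketch of these two tables in use (both signatures are assumptions):

```python
camera_uuid = vlc_db.add_camera(camera_config)  # assumed signature
session_uuid = vlc_db.add_session(camera_uuid)  # assumed signature
```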

### Queries

The VlcDb provides an interface to several kinds of queries. The primary
purpose of a query is to find images matching the supplied image embedding
vector. However, there are certain filters that we might need to apply to these
queries. For example, we often want the closest match that was observed more
than N seconds ago. There are also cases where we would like to support batch
queries (e.g., in a multisession context) to increase speed.

The most VLC-specific query is probably `query_embeddings_max_time`, which
queries for the single closest embedding with time at most T. This is a very
common query when running an online VLC module. There is also
`query_embeddings` to find the closest embedding across any time. There are
several other query functions which allow for batch queries or filtering based
on a custom predicate function. Note that the batch queries *might* be faster
than the single versions, although it depends on when we get around to
optimizing the backend. The filter functions with a custom callable should be
expected to be quite a bit slower than versions that don't use a custom
callable.
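
For example, an online VLC module might query like this (a sketch; the exact signature and return type of `query_embeddings_max_time` are assumptions):

```python
lockout_ns = int(20 * 1e9)  # ignore frames from the last 20 seconds
match, distance = vlc_db.query_embeddings_max_time(embedding, stamp_ns - lockout_ns)
```
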
23 changes: 13 additions & 10 deletions extra/ouroboros_ros/config/gt_vlc_server_config.yaml
The ground-truth VLC server config is restructured: each pipeline stage
(`place_method`, `keypoint_method`, `descriptor_method`, `match_method`,
`pose_method`) is set to `type: ground_truth`, the display flags
(`display_place_matches`, `display_keypoint_matches`,
`display_inlier_keypoint_matches`) are grouped under a new `display_method`
section with `type: ros`, and the lockout and threshold parameters
(`lc_distance_threshold`, `lc_recent_lockout_s`, `lc_frame_lockout_s`,
`place_match_threshold`, `strict_keypoint_evaluation`) are retuned.
2 changes: 1 addition & 1 deletion extra/ouroboros_ros/config/vlc_server_node.yaml
```yaml
robot_id: 0
config_path: $(find ouroboros_ros)/config/vlc_server_config.yaml

fixed_frame: world
hint_body_frame: ground_truth/husky/base

body_frame: husky/base
camera_frame: husky/camera
```