Update README #25

Merged · 2 commits · Feb 25, 2025
233 changes: 174 additions & 59 deletions README.md

## Installation

First, install Eigen and pkg-config:
```bash
apt install libeigen3-dev pkg-config
```

Next, clone this repo and initialize the `opengv` submodule with `git submodule
update --init --recursive`. Then install Ouroboros by running

```bash
pip install .
```

You can run the tests with
```bash
pytest --ignore=third_party --ignore=extra/ouroboros_ros
```

Individual components of the VLC pipeline may have extra dependencies. You
probably want to install:

1. `LightGlue` (https://github.com/cvg/LightGlue)
2. `pytorch`, `pytorch-lightning`, `pytorch-metric-learning`, and `torchvision` via pip (needed for Salad, and probably for other learned models); see the sketch below
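
A rough sketch of installing these extras (package names come from the projects above; exact versions and CUDA builds depend on your setup):

```bash
# Torch-based dependencies used by Salad (and likely other learned models)
pip install torch torchvision pytorch-lightning pytorch-metric-learning

# LightGlue is installed from its GitHub repository
pip install git+https://github.com/cvg/LightGlue.git
```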


See `examples/vlc_server_driver_example.py` for the best example of how to use
the VLC Server abstraction.

## ROS Integration

Ouroboros is integrated with ROS in `extra/ouroboros_ros`, which is a valid
ROS package. Once you have built `ouroboros_ros` in your ROS workspace, you can
run the Ouroboros ROS node with `roslaunch ouroboros_ros
vlc_server_node.launch`. When using the node with your datasets, you will need
to properly remap the following topics:
* `~image_in`
* `~camera_info`
* `~depth_in`

You will also need to specify a few coordinate frames via rosparam (see the example below):
* `fixed_frame` -- Frame to consider "fixed" for pose hints (mostly for GT debugging or baseline comparisons)
* `hint_body_frame` -- Frame to consider body for pose hints
* `body_frame` -- Robot body frame, the frame that loop closures are computed with respect to
* `camera_frame`
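
For example, `extra/ouroboros_ros/config/vlc_server_node.yaml` sets these frames for a Husky robot (your frame names will differ):

```yaml
fixed_frame: world
hint_body_frame: ground_truth/husky/base
body_frame: husky/base
camera_frame: husky/camera
```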

## Plugins and Configuration

Ouroboros models the VLC pipeline as
```
Input Image --> Place Recognition --> Keypoint / Descriptor Extraction -->
--> Query-Match Keypoint Association --> Pose Recovery --> Output Loop Closures
```

The recognition, keypoint/descriptor, association, and pose recovery methods
can all be swapped out. Ouroboros uses a plugin-based system that uses the
provided configuration to decide which module to load for each of these parts
of the pipeline. These plugins are auto-discovered, meaning that it is
possible to extend Ouroboros without needing to add any new code to the
Ouroboros repo itself.

See the example in `examples` to get an idea of how to use the interface so far.
We will use the Salad place recognition module as an example of how to
implement a custom module. It is implemented
[here](src/ouroboros_salad/salad_model.py). A plugin should be a class (here
`SaladModel`) that takes a configuration struct as an argument in the
constructor. It must implement a function `infer`, which will be called at the
appropriate part of the VLC pipeline. For examples of the specific interface
that `infer` must have for each part of the pipeline, refer to the
"ground truth" example modules [here](src/ouroboros_gt). A plugin should also
have a `load` method to create an instance of the class from a configuration.
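
A minimal sketch of the shape of such a plugin (the method signatures here are illustrative assumptions; see the Salad and ground-truth modules for the actual interfaces):

```python
class MyPlaceModel:
    """Illustrative place-recognition plugin: image in, embedding out."""

    def __init__(self, config):
        self.config = config

    def infer(self, image):
        # Called at the place-recognition stage of the VLC pipeline;
        # should return an embedding vector for the input image.
        raise NotImplementedError

    @classmethod
    def load(cls, config):
        # Create an instance of the plugin from a configuration.
        return cls(config)
```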

Next, we need to define a configuration (here `SaladModelConfig`) which
inherits from `ouroboros.config.Config`. This should be a dataclass with any
configuration information that you would like to set from a file. It needs to
use the `register_config` decorator to declare that it is a plugin of type
`place_model` with name `Salad`, and that it is built with the constructor
`SaladModel`. The resulting plugin can be loaded from a config file
such as [vlc\_driver\_config.yaml](examples/config/vlc_driver_config.yaml) (see
`VlcDriverConfig.load` in
[vlc\_server\_driver\_example.py](examples/vlc_server_driver_example.py)).
Plugins are automatically discovered in any top-level imports from packages
that start with `ouroboros_`, so you can use plugins with Ouroboros even
without adding them to the Ouroboros codebase.
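
A sketch of what this registration might look like (the decorator's exact argument names are assumptions based on the description above; see the Salad module for the real usage):

```python
from dataclasses import dataclass

from ouroboros.config import Config, register_config


@register_config("place_model", name="Salad", constructor=SaladModel)
@dataclass
class SaladModelConfig(Config):
    # Illustrative fields that would be populated from a YAML config file
    model_name: str = "dinov2_salad"
    embedding_size: int = 8448
```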

Note that it is possible to have recursive configurations, see e.g.
`VlcDriverConfig` in
[vlc\_server\_driver\_example.py](examples/vlc_server_driver_example.py).

# Library Building Blocks

The ROS node implementations are the highest-level and most "out-of-the-box"
solutions that we provide, but they are pretty thin wrappers around the
VlcServer abstraction. There is also a lower-level VLC database abstraction
that handles the storage and querying of images, embeddings, loop closures,
etc. Currently this database abstraction is implemented in a very naive way,
but the implementation can be improved while maintaining exactly the same
interface.


## VlcServer

The VlcServer encapsulates the Visual Loop Closure pipeline. First, a sensor
needs to be registered with the VlcServer:

```python
session_id = vlc_server.register_camera(robot_id, camera_config, epoch_ns)
```

The `session_id` is a unique identifier for a camera in the current session.
When new frames should be checked and stored for VLC, you can call
`add_and_query_frame`:

```python
loop_closure = vlc_server.add_and_query_frame(session_id, image, epoch_ns)
```

For basic use-cases, this is the only function you should need. However, the
VlcServer also provides slightly lower-level functions for more advanced
functionality, such as storing image embeddings without the actual image,
which is a common need in distributed VLC systems.
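
Putting the two calls together (a hedged sketch: `make_vlc_server`, `make_camera_config`, and `image_stream` are hypothetical placeholders; see `examples/vlc_server_driver_example.py` for a real setup):

```python
import time

robot_id = 0
vlc_server = make_vlc_server()        # hypothetical: build a VlcServer from config
camera_config = make_camera_config()  # hypothetical camera description

session_id = vlc_server.register_camera(robot_id, camera_config, time.time_ns())

for image, stamp_ns in image_stream():  # hypothetical image source
    loop_closure = vlc_server.add_and_query_frame(session_id, image, stamp_ns)
    if loop_closure is not None:
        print("Found loop closure:", loop_closure)
```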


## VlcDb

`VlcDb` provides an abstraction for storing and querying data related to
sensors, images, and loop closures. Philosophically, `VlcDb` represents a
relational database, although it is currently not implemented with a "real"
database backend. There are four tables:
* Image Table
* Loop Closure Table
* Session Table
* Camera Table

### Image Table

The image table stores the information related to a single image frame. This
includes the RGB(D) image and any byproducts such as the embedding vector,
keypoints, and descriptors. The Image Table is the most relevant place for
making improvements -- we need to execute vector queries based on the embedding
vectors in this table, and serializing these images to disk is not handled yet.

`add_image` and `get_image` are the most relevant interfaces.
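
For example (a sketch assuming a constructed `VlcDb` instance `vlc_db`; the exact signatures and the `ouroboros` import name are assumptions):

```python
import numpy as np

import ouroboros as ob  # assumed import name

rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in pixel data
image_uuid = vlc_db.add_image(session_id, stamp_ns, ob.SparkImage(rgb=rgb))
vlc_image = vlc_db.get_image(image_uuid)  # returns the stored VlcImage "row"
```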

### Loop Closure Table

The loop closure table stores the history of loop closures that the robot has
detected. This is pretty straightforward and not extensively used right now.

`get_lc` and `iterate_lcs` are useful if you want to reason over loop closures found so far.

### Session Table

This tracks each session, which is specific to a camera and a continuous period of operation.

See `add_session`.

### Camera Table

This tracks each camera, storing for example its intrinsics and which robot the camera is associated with.

See `add_camera`.
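
A sketch of these two tables in use (both signatures are assumptions):

```python
camera_uuid = vlc_db.add_camera(camera_config)  # assumed signature
session_uuid = vlc_db.add_session(camera_uuid)  # assumed signature
```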

### Queries

The VlcDb provides an interface to several kinds of queries. The primary
purpose of a query is to find images matching the supplied image embedding
vector. However, there are certain filters that we might need to apply to these
queries. For example, we often want the closest match that was observed more
than N seconds ago. There are also cases where we would like to support batch
queries (e.g., in a multisession context) to increase speed.

The most VLC-specific query is probably `query_embeddings_max_time`, which
queries for the single closest embedding with time at most T. This is a very
common query when running an online VLC module. There is also
`query_embeddings` to find the closest embedding across any time. There are
several other query functions which allow for batch queries or filtering based
on a custom predicate function. Note that the batch queries *might* be faster
than the single versions, although it depends on when we get around to
optimizing the backend. The filter functions with a custom callable should be
expected to be quite a bit slower than versions that don't use a custom
callable.
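
For example, an online VLC module might query like this (a sketch; the exact signature and return type of `query_embeddings_max_time` are assumptions):

```python
lockout_ns = int(20 * 1e9)  # ignore frames from the last 20 seconds
match, distance = vlc_db.query_embeddings_max_time(embedding, stamp_ns - lockout_ns)
```
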
23 changes: 13 additions & 10 deletions extra/ouroboros_ros/config/gt_vlc_server_config.yaml
The ground-truth VLC server config is restructured: each pipeline stage
(`place_method`, `keypoint_method`, `descriptor_method`, `match_method`,
`pose_method`) is set to `type: ground_truth`, the display flags
(`display_place_matches`, `display_keypoint_matches`,
`display_inlier_keypoint_matches`) are grouped under a new `display_method`
section with `type: ros`, and the lockout and threshold parameters
(`lc_distance_threshold`, `lc_recent_lockout_s`, `lc_frame_lockout_s`,
`place_match_threshold`, `strict_keypoint_evaluation`) are retuned.
2 changes: 1 addition & 1 deletion extra/ouroboros_ros/config/vlc_server_node.yaml
```yaml
robot_id: 0
config_path: $(find ouroboros_ros)/config/vlc_server_config.yaml

fixed_frame: world
hint_body_frame: ground_truth/husky/base

body_frame: husky/base
camera_frame: husky/camera
```