Image indexing #65

Open
1 of 4 tasks
metazool opened this issue Jan 28, 2025 · 0 comments

metazool commented Jan 28, 2025

We tried an intake catalog at one point and did not get much benefit from it (#3).

By switching from chromadb (which is itself sqlite-based) to sqlite-vec, we gained a relational database with minimal metadata, used for search purposes (#44).
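A minimal sketch of that layout, using only the stdlib `sqlite3` module (table and column names are hypothetical; the sqlite-vec virtual-table DDL is shown in a comment, as issuing it requires the extension to be loaded):

```python
import sqlite3

# In-memory database for illustration; the real index lives on disk.
db = sqlite3.connect(":memory:")

# Regular table holding the minimal metadata used in search results.
db.execute(
    """CREATE TABLE images (
        id INTEGER PRIMARY KEY,
        s3_key TEXT NOT NULL,
        captured_at TEXT
    )"""
)

# With the sqlite-vec extension loaded (sqlite_vec.load(db)), the
# embeddings would live in a vec0 virtual table sharing the same rowid:
#   CREATE VIRTUAL TABLE image_embeddings USING vec0(embedding float[512]);
# and a KNN query would join back to `images` on rowid.

db.execute(
    "INSERT INTO images (s3_key, captured_at) VALUES (?, ?)",
    ("bucket/plankton/0001.jpg", "2025-01-28T10:00:00"),
)
row = db.execute("SELECT s3_key FROM images WHERE id = 1").fetchone()
print(row[0])  # bucket/plankton/0001.jpg
```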

At one stage there was an intention to add indexing during the image processing pipeline. There isn't much benefit to doing that: it adds moving parts, and it slows things down if it involves one or more deep learning models. We also can't guarantee that it will always run in the same place with the same code, and we want the index to reflect what's in storage, not the processes that put it there.
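One way to keep the index tied to storage rather than to any particular pipeline run is to reconcile the two sets of keys periodically. A minimal sketch under that assumption (function and key names hypothetical):

```python
def reconcile(storage_keys, indexed_keys):
    """Compare what's in object storage against what's indexed.

    Returns keys to (re)index and stale index entries to prune, so the
    index converges on the contents of storage regardless of which
    process uploaded the images.
    """
    storage = set(storage_keys)
    indexed = set(indexed_keys)
    return sorted(storage - indexed), sorted(indexed - storage)


to_index, to_prune = reconcile(
    ["img/001.jpg", "img/002.jpg", "img/003.jpg"],
    ["img/001.jpg", "img/999.jpg"],
)
print(to_index)  # ['img/002.jpg', 'img/003.jpg']
print(to_prune)  # ['img/999.jpg']
```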

There's a nice-to-have future in which each image goes into a message queue or broker as it's uploaded to s3 storage, and different indexers can pick it up and annotate it, but we're nowhere near that at the moment.
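That fan-out could eventually look something like the sketch below, using a stdlib `queue.Queue` as a stand-in for a real broker (all names are hypothetical; a real deployment would use something like SQS or RabbitMQ with per-consumer delivery):

```python
import queue

# Stand-in for a message broker: each upload is published as a message.
uploads = queue.Queue()
for key in ["plankton/a.jpg", "plankton/b.jpg"]:
    uploads.put(key)


def embedding_indexer(key):
    """Hypothetical indexer that would compute an image embedding."""
    return {"key": key, "annotation": "embedding"}


def exif_indexer(key):
    """Hypothetical indexer that would extract EXIF metadata."""
    return {"key": key, "annotation": "exif"}


# Each message fans out to every registered indexer independently.
annotations = []
while not uploads.empty():
    key = uploads.get()
    for indexer in (embedding_indexer, exif_indexer):
        annotations.append(indexer(key))

print(len(annotations))  # 4
```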

We do (should?) have spatio-temporal EXIF headers corresponding to sample location, which is helpful!
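EXIF stores GPS coordinates as (degrees, minutes, seconds) rationals plus a hemisphere reference, so a small conversion helper is useful before those headers can feed a spatio-temporal index. A sketch, with a hypothetical function name and example coordinates:

```python
def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) and a hemisphere
    reference ('N'/'S'/'E'/'W') to signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


# e.g. a sample location at 51°30'0"N, 0°7'39.6"W
lat = dms_to_decimal((51, 30, 0), "N")
lon = dms_to_decimal((0, 7, 39.6), "W")
print(lat, lon)
```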

  • Fix the sqlite-vec database based on what we learned from the fdri_phenocam prototype (the virtual table is really only for an index of the embeddings, and needs a regular table alongside it for other metadata)
  • Extend or repair the current DVC pipeline to index everything in each bucket, and do so
  • Packaged way of adding a detritus classifier to image processing (#32): create a clustering model based on image embeddings for each collection, and package that up as a Label Studio ML backend
  • Make some recommendations for handling larger volumes: at what point will the number of image objects in storage (hundreds of thousands? millions?) start to become painful to work with? Review s3-specific advice. This wants to be its own issue
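On the volume question, one widely-given piece of s3-specific advice is to spread object keys across many prefixes, since request-rate limits apply per prefix. A hedged sketch of hash-based key sharding (the key layout here is an assumption, not our current scheme):

```python
import hashlib


def sharded_key(collection, filename, shards=256):
    """Prefix each object key with a short, stable hash so objects
    spread evenly across `shards` prefixes instead of piling up
    under a single hot prefix."""
    digest = hashlib.sha256(filename.encode()).hexdigest()
    shard = int(digest[:8], 16) % shards
    return f"{collection}/{shard:02x}/{filename}"


key = sharded_key("plankton", "sample_0001.jpg")
print(key)
```

Because the shard is derived from the filename, the same object always maps to the same prefix, so lookups stay deterministic.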
Status: In Progress