ML. Send smaller IIIF Images to the ML learning endpoints + don't index empties/no detections #102

DiegoPino · 2025-01-28T21:33:03Z

What?

Now that we are using real data and large collections, sending a full/full image to the backend is just too process-intensive. We were sending full/full because images smaller than what the model needs (e.g. InsightFace needs 640x640) can be upscaled by Python and still produce a good-enough vector, so I could be lazy and not check the size at the processor level. But I should not be lazy anymore. Some portrait images we processed are 6 million pixels, and sending them moves gigabytes of data between the backend and the frontend just to get a 640x640 representation.
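A minimal sketch of the size check, assuming a IIIF Image API endpoint and a square model input such as the 640x640 InsightFace expects. The helper names (`fit_within`, `iiif_size_url`), the 10% default margin and the URL layout are illustrative assumptions, not the module's actual code. The sketch computes the largest size whose longer side fits the model box plus margin, never exceeds the original, and requests that width with the `w,` size form instead of `full`.

```python
from typing import Tuple


def fit_within(orig_w: int, orig_h: int, box: int, margin: float = 1.1) -> Tuple[int, int]:
    """Scale (orig_w, orig_h) so the longer side fits box * margin.

    Never upscales: if the original already fits, its own size is returned.
    """
    if orig_w <= 0 or orig_h <= 0:
        raise ValueError("image dimensions must be positive")
    limit = round(box * margin)
    scale = min(1.0, limit / max(orig_w, orig_h))
    # Round to whole pixels, keeping at least one pixel per side.
    return max(1, round(orig_w * scale)), max(1, round(orig_h * scale))


def iiif_size_url(base: str, identifier: str, width: int) -> str:
    """Build a IIIF Image API URL that requests a specific width.

    The "w," size form keeps the aspect ratio and exists in IIIF 2.x and 3.0.
    `identifier` is assumed to be URL-encoded already.
    """
    return f"{base.rstrip('/')}/{identifier}/full/{width},/0/default.jpg"


if __name__ == "__main__":
    # A 2000x3000 portrait is 6 million pixels.
    w, h = fit_within(2000, 3000, box=640)
    print(w, h, w * h)  # 469 704 330176
    print(iiif_size_url("https://example.org/iiif/3", "portrait.jpg", w))
```

For that example portrait this requests about 18 times fewer pixels than full/full, while still giving the model a little more than its 640 pixel input on the longer side.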

Also new: each post processor will get two extra methods (added to the interface): validateForIndex and validateForChaining. The base implementation can simply return TRUE, but some processors should return FALSE, e.g. when the output is not what we need. ML models that produce empty vectors (and thus empty OCR) should not fill the Solr index with nothing.
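A minimal sketch of the proposed hooks, written in Python for illustration only. `PostProcessor`, `MLVectorProcessor` and `process` are hypothetical names; `validate_for_index` and `validate_for_chaining` mirror the proposed validateForIndex and validateForChaining. The base versions return True as described, and the toy ML processor rejects an empty vector (no detection) for both indexing and chaining.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class PostProcessor(ABC):
    """Hypothetical stand-in for the post processor plugin interface."""

    @abstractmethod
    def run(self, data: Any) -> Any:
        """Produce this processor's output."""

    def validate_for_index(self, output: Any) -> bool:
        # Base implementation: every output may be indexed.
        return True

    def validate_for_chaining(self, output: Any) -> bool:
        # Base implementation: every output may feed the next processor.
        return True


class MLVectorProcessor(PostProcessor):
    """Toy ML processor that wraps a (possibly empty) embedding vector."""

    def run(self, data: Any) -> Any:
        return {"vector": list(data)}

    def validate_for_index(self, output: Any) -> bool:
        # No detection means an empty vector: nothing worth putting in Solr.
        return bool(output.get("vector"))

    def validate_for_chaining(self, output: Any) -> bool:
        # Downstream processors gain nothing from an empty result either.
        return self.validate_for_index(output)


def process(processor: PostProcessor, data: Any, index: list) -> Optional[Any]:
    """Run a processor, index only valid output, return output only if it may be chained."""
    output = processor.run(data)
    if processor.validate_for_index(output):
        index.append(output)
    return output if processor.validate_for_chaining(output) else None
```

Keeping the two checks separate would let a processor decide independently whether its output is indexed and whether it is passed on; the issue does not say which processors, if any, need that split.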

This would also let OCR that yields nothing (e.g. failed OCR) skip its index entry.
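For OCR, a failed or empty run could be detected before indexing. This sketch assumes the OCR output is MiniOCR XML (`<ocr>`, `<p>`, `<b>`, `<l>` and `<w>` elements) and treats unparseable input, or a document without any non-blank word, as not indexable. `miniocr_has_words` and `OCRProcessor` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def miniocr_has_words(miniocr: str) -> bool:
    """Return True if a MiniOCR document contains at least one non-blank word.

    Empty or unparseable input is treated as failed OCR.
    """
    if not miniocr or not miniocr.strip():
        return False
    try:
        root = ET.fromstring(miniocr)
    except ET.ParseError:
        return False
    # MiniOCR nests words as <w> elements inside <p>/<b>/<l>.
    return any((w.text or "").strip() for w in root.iter("w"))


class OCRProcessor:
    """Hypothetical OCR post processor; only the validation hook is shown."""

    def validate_for_index(self, output: str) -> bool:
        # No words, no Solr document.
        return miniocr_has_words(output)
```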

So what now

Check the original size.
Depending on the model, send a larger size than needed (e.g. for image segmentation) so we don't lose details (like a person standing in the background).
For other models, send just a few percent extra.
For images smaller than the model needs, add a checkbox that lets people skip ADOs that don't provide the best data.
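The plan above could be expressed as a small per-model policy. In this sketch `ModelSizePolicy` and `requested_longest_side` are hypothetical, the margins (2.0 for segmentation, 1.1 for a face embedding) are illustrative values rather than tuned ones, and treating an image as "too small" when its longer side is below the model input is an assumption. The function returns the longer side to request, capped at the original, or None when the skip checkbox applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSizePolicy:
    """Hypothetical per-model sizing settings."""
    input_size: int        # side of the square input the model expects, e.g. 640
    margin: float          # > 1.0 requests extra pixels beyond input_size
    skip_if_smaller: bool  # the proposed "skip" checkbox


def requested_longest_side(orig_w: int, orig_h: int, policy: ModelSizePolicy) -> Optional[int]:
    """Return the longer side to request from IIIF, or None to skip this image."""
    longest = max(orig_w, orig_h)
    if longest < policy.input_size and policy.skip_if_smaller:
        # Too small for the model and the user chose to skip such ADOs.
        return None
    # Never ask for more than the original has.
    return min(longest, round(policy.input_size * policy.margin))


# Illustrative values only: segmentation keeps more detail, embeddings a bit extra.
SEGMENTATION = ModelSizePolicy(input_size=640, margin=2.0, skip_if_smaller=False)
FACE_EMBEDDING = ModelSizePolicy(input_size=640, margin=1.1, skip_if_smaller=True)
```

The returned value could then be sent as a IIIF best-fit size (`!n,n`), which keeps the aspect ratio.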

@alliomeria for your radar


DiegoPino self-assigned this Jan 28, 2025

DiegoPino added the enhancement, Solr Indexing, Post processor Plugins and ML labels Jan 28, 2025

DiegoPino added this to the 0.9.0 milestone Jan 28, 2025
