ML. Send smaller IIIF Images to the ML learning endpoints + don't index empties/no detections #102
Labels
enhancement
New feature or request
ML
Post processor Plugins
The ones with a ->run() method
Solr Indexing
Putting things where they can be found
What?
Now that we are using real data and large collections, sending a `full/full` image to the back end is just too processing-intensive. The reason we were sending `full/full` was that images smaller than what the model needs (e.g. Insightface needs 640x640) could be upscaled by Python and still give a good enough vector, so I could be lazy and not actually check the size at the processor level. But I should not be lazy anymore. Some portrait images we processed are 6 million pixels, and that just moves gigabytes of data between back end and front end for a 640 representation.

Also, new: each Post processor (adding this to the interface) will have two extra methods: validateForIndex and validateForChaining. The base implementation can just return TRUE. But some processors should return FALSE if, e.g., the output is not what we need. ML models with empty vectors (and thus empty OCR) should not fill the Solr index with nothing.
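On the image-size side, a rough sketch of the check the processor could do before calling the ML endpoint. This is Python for illustration only, not the plugin code; `iiif_size_param`, `iiif_image_url`, the example base URL and the default target of 640 are all assumptions. It relies on the IIIF Image API size syntax `!w,h` (best fit inside a w x h box) and only asks for a downscale when the source is larger than the target, so small images are still requested whole and upscaled on the Python side as before.

```python
from urllib.parse import quote


def iiif_size_param(width: int, height: int, target: int = 640,
                    full_keyword: str = "full") -> str:
    """Return the IIIF size segment for an image of width x height pixels.

    Assumption: `target` is the longest side the model needs (640 here).
    `full_keyword` is "full" for IIIF Image API 2.x; pass "max" for 3.0.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")
    if width <= target and height <= target:
        # Already small enough: request as-is, never ask the server to upscale.
        return full_keyword
    # Best fit inside a target x target box, aspect ratio preserved.
    return f"!{target},{target}"


def iiif_image_url(base_url: str, identifier: str, width: int, height: int,
                   target: int = 640) -> str:
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    size = iiif_size_param(width, height, target)
    # Identifiers must be URL-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/full/{size}/0/default.jpg"


# Hypothetical 3000x2000 (6 million pixel) source image.
print(iiif_image_url("https://example.org/iiif/2", "a/b.jp2", 3000, 2000))
```

The width and height would come from wherever the processor already has them (e.g. the image's `info.json`); that lookup is left out here.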
This would also let OCR that returns nothing (e.g. failed OCR) end up with no index entry.
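A minimal sketch of how the two validation methods could behave, again in Python for illustration rather than the real interface. The class names, the `ProcessorOutput` shape and the snake_case method names (standing in for validateForIndex and validateForChaining) are hypothetical. Only the index check is overridden; chaining keeps the base default because what should happen there is not decided yet.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorOutput:
    """Hypothetical container for what a post processor produced."""
    vector: list[float] = field(default_factory=list)
    text: str = ""


class PostProcessor:
    """Base behaviour: everything is valid unless a processor says otherwise."""

    def validate_for_index(self, output: ProcessorOutput) -> bool:
        return True

    def validate_for_chaining(self, output: ProcessorOutput) -> bool:
        return True


class EmbeddingProcessor(PostProcessor):
    """ML processor: an empty vector (no detection) should not be indexed."""

    def validate_for_index(self, output: ProcessorOutput) -> bool:
        return len(output.vector) > 0


class OcrProcessor(PostProcessor):
    """OCR processor: empty or whitespace-only text should not be indexed."""

    def validate_for_index(self, output: ProcessorOutput) -> bool:
        return bool(output.text.strip())


empty = ProcessorOutput()
print(EmbeddingProcessor().validate_for_index(empty))  # False
print(OcrProcessor().validate_for_index(empty))        # False
print(PostProcessor().validate_for_index(empty))       # True
```

The caller would check `validate_for_index` before pushing to Solr and `validate_for_chaining` before handing the output to the next processor.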
So what now
@alliomeria for your radar