ML. Send smaller IIIF Images to the ML learning endpoints + don't index empties/no detections #102

Open
DiegoPino opened this issue Jan 28, 2025 · 0 comments
Labels
enhancement New feature or request ML Post processor Plugins The ones with a ->run() method Solr Indexing Putting things where they can be found

DiegoPino commented Jan 28, 2025

What?

Now that we are using real data and large collections, sending a full/full image to the back end is just too processing-intensive. The reason we were sending full/full was that images smaller than what the model needs (e.g. InsightFace needs 640x640) could be upscaled by Python and still yield a good enough vector, so I could be lazy and not actually check the size at the processor level. But I should not be lazy anymore. Some portrait images we processed have 6 million pixels, and we end up moving gigabytes of data between back end and front end for a 640 representation.
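As a rough illustration of the difference between a full/full request and a bounded one, here is a minimal Python sketch that builds IIIF Image API URLs following the `{id}/{region}/{size}/{rotation}/{quality}.{format}` pattern. The base URL and the helper names are hypothetical and not taken from the actual processor code; the `!w,h` size form asks the image server for a best fit inside a box while keeping the aspect ratio.

```python
def iiif_image_url(base_id: str, size: str = "full", region: str = "full",
                   rotation: str = "0", quality: str = "default",
                   fmt: str = "jpg") -> str:
    """Build a IIIF Image API URL: {id}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{base_id.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


def bounded_size(max_side: int) -> str:
    """IIIF '!w,h' size: best fit inside a max_side x max_side box."""
    return f"!{max_side},{max_side}"


# Hypothetical image identifier, for illustration only.
base = "https://example.org/iiif/2/some-image-id"

# What was being sent so far: the whole image at full resolution.
print(iiif_image_url(base))

# A bounded request sized for a model that needs 640x640 input.
print(iiif_image_url(base, size=bounded_size(640)))
```

With a bounded size the downscaling happens on the image server, so only the small derivative travels to the ML endpoint.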

Also, new: each Post processor (I am adding this to the interface) will have two extra methods, validateForIndex and validateForChaining. The base implementation can simply return TRUE, but some processors should return FALSE if, e.g., the output is not what we need. ML models with empty vectors (and thus empty OCR) should not fill the Solr index with nothing.

This would also allow OCR that yields nothing (e.g. failed OCR) to have no index entry.
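The two methods could look roughly like the sketch below. It is written in Python purely for illustration (the real interface lives in the module's own code), and the class names, the `ProcessorOutput` shape, and the snake_case method names are all hypothetical. It also assumes that validateForChaining decides whether the output is handed to the next processor in a chain, which the description above does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessorOutput:
    """Hypothetical container for what a processor produced."""
    vector: List[float] = field(default_factory=list)
    text: str = ""


class PostProcessor:
    """Base behaviour: accept everything, as proposed for the default."""

    def validate_for_index(self, output: ProcessorOutput) -> bool:
        return True

    def validate_for_chaining(self, output: ProcessorOutput) -> bool:
        return True


class MLVectorProcessor(PostProcessor):
    """An ML processor that refuses to index or chain empty vectors."""

    def validate_for_index(self, output: ProcessorOutput) -> bool:
        return len(output.vector) > 0

    def validate_for_chaining(self, output: ProcessorOutput) -> bool:
        # Assumption: nothing useful to pass on if there was no detection.
        return len(output.vector) > 0


class OcrProcessor(PostProcessor):
    """An OCR processor that skips indexing when no text was extracted."""

    def validate_for_index(self, output: ProcessorOutput) -> bool:
        return bool(output.text.strip())


empty = ProcessorOutput()
print(PostProcessor().validate_for_index(empty))      # base accepts anything
print(MLVectorProcessor().validate_for_index(empty))  # empty vector is rejected
print(OcrProcessor().validate_for_index(ProcessorOutput(text="some text")))
```

The caller would check `validate_for_index` before writing to Solr, so processors that found nothing leave no entry behind.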

So what now

  • Check the original size.
  • Depending on the model, send a larger-than-needed size (e.g. for image segmentation) so we don't lose details (like the person standing in the back).
  • For other models, send just a few extra percent.
  • For images smaller than what the model wants, add a checkbox allowing people to "skip" ADOs that don't provide the best data.
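The plan above could be sketched as a small sizing function. Everything here is an assumption made for illustration: the `ModelProfile` type, the percentage values, and the choice to compare the longest side of the original against the model input size. None of it is taken from the actual processors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model sizing settings."""
    input_side: int     # side the model needs, e.g. 640 for 640x640
    extra_percent: int  # how much larger than needed to request


def plan_request_side(width: int, height: int, profile: ModelProfile,
                      skip_small: bool = False) -> Optional[int]:
    """Return the bounding-box side to request, or None to skip the image."""
    longest = max(width, height)
    if longest < profile.input_side:
        # Smaller than the model wants: skip if the checkbox is set,
        # otherwise send what we have and let the back end upscale.
        return None if skip_small else longest
    # Integer ceiling of input_side * (100 + extra_percent) / 100.
    wanted = (profile.input_side * (100 + profile.extra_percent) + 99) // 100
    # Never ask for more than the original provides.
    return min(wanted, longest)


# Illustrative values only: a few extra percent vs. a generous margin.
face_model = ModelProfile(input_side=640, extra_percent=5)
segmentation_model = ModelProfile(input_side=640, extra_percent=100)

print(plan_request_side(2000, 3000, face_model))
print(plan_request_side(2000, 3000, segmentation_model))
print(plan_request_side(300, 400, face_model, skip_small=True))
```

The returned side can be used as a IIIF `!w,h` bounding box, so the image server does the downscaling and the aspect ratio is kept.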

@alliomeria for your radar

@DiegoPino DiegoPino self-assigned this Jan 28, 2025
@DiegoPino DiegoPino added enhancement New feature or request Solr Indexing Putting things where they can be found Post processor Plugins The ones with a ->run() method ML labels Jan 28, 2025
@DiegoPino DiegoPino added this to the 0.9.0 milestone Jan 28, 2025