Add Fast Image Processor for Idefics3 #37045
base: main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
Hi @rootonchair, thanks for contributing this, overall it looks really good! There is an ongoing discussion on what to do with Lanczos resampling, but in the meantime I'm curious what diff you get with Bicubic when testing against the slow processor (which uses Lanczos) in test_slow_fast_equivalence?
@@ -296,7 +296,7 @@ def __init__(
     do_convert_rgb: bool = True,
     do_resize: bool = True,
     size: Dict[str, int] = None,
-    resample: PILImageResampling = PILImageResampling.LANCZOS,
+    resample: PILImageResampling = PILImageResampling.BICUBIC,
We shouldn't change the default resampling method in the slow processor. I hadn't seen a processor with Lanczos as the default before, and it turns out it is not supported in torch/torchvision. I'd say the best course of action for now is to keep Lanczos for the slow image processor, use Bicubic in the fast one, and add a warning_once in the fast processor telling users to fall back to the slow processor for exact processing. You can follow the ongoing discussion on this here: #35206 (comment)
src/transformers/models/idefics3/image_processing_idefics3_fast.py
@yonigozlan thanks for your thorough review. I ran a benchmark measuring the MAE between LANCZOS and the other resamplers. As the results show, BICUBIC is the closest approximation of the alternatives.
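For context, here is a minimal sketch of how such an MAE comparison between resamplers can be run with Pillow and NumPy; the input image path and target size below are placeholders, not the actual benchmark setup:

# Illustrative sketch: compare each resampler against a LANCZOS reference via MAE.
# The image path and target size are placeholders, not the real benchmark inputs.
import numpy as np
from PIL import Image

image = Image.open("sample.jpg").convert("RGB")  # placeholder input image
target_size = (980, 980)  # hypothetical resize target

reference = np.asarray(image.resize(target_size, Image.Resampling.LANCZOS), dtype=np.float32)

for name in ("NEAREST", "BILINEAR", "BICUBIC"):
    resampler = getattr(Image.Resampling, name)
    candidate = np.asarray(image.resize(target_size, resampler), dtype=np.float32)
    mae = np.mean(np.abs(candidate - reference))
    print(f"{name}: MAE vs LANCZOS = {mae:.4f}")

Running this over a handful of representative images gives a rough sense of how far each resampler drifts from the Lanczos reference.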
Very cool, thanks for that. Looks like Bicubic is the way to go indeed. It's not ideal to have non-negligible differences in the processing, but I guess it's the best we can do for now.
Thanks for iterating! Only thing left is adding a warning_once before forcing Bicubic resample, then LGTM!
Edit: the slow_fast_equivalence tests don't seem to pass on the CI. We might need a higher threshold for this specific case.
Edit 2: we're also missing adding Idefics3ImageProcessorFast to docs/source/en/model_doc/idefics3.md, looks like something went wrong with the transformers-cli script.
def _assertEquivalence(self, a, b):
    # Element-wise check: every value should be within 0.1 of the reference
    self.assertTrue(torch.allclose(a, b, atol=1e-1))
    # Global check: the mean absolute error should stay below 1e-3
    self.assertLessEqual(torch.mean(torch.abs(a - b)).item(), 1e-3)
Nice 🤗
# For an example of a fast image processor requiring more complex augmentations, see `LlavaNextImageProcessorFast`.

# Default values should be checked against the slow image processor
# None values left after checking can be removed
resample = PILImageResampling.BICUBIC
Let's keep Lanczos here, and override the resize function to force Bicubic if resample is Lanczos, with a warning_once so users are aware of the issue. See my comment here: #37140 (comment)
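For reference, a rough sketch of what that override could look like; the base class, method signature, and warning text here are assumptions rather than the merged implementation:

# Rough sketch (not the merged implementation): override resize so that a LANCZOS
# request falls back to BICUBIC with a one-time warning, since torchvision has no
# Lanczos interpolation for tensors. Signature and base class are assumptions.
from transformers.image_processing_utils_fast import BaseImageProcessorFast
from transformers.image_utils import PILImageResampling
from transformers.utils import logging

logger = logging.get_logger(__name__)


class Idefics3ImageProcessorFast(BaseImageProcessorFast):
    def resize(self, image, size, interpolation=None, **kwargs):
        if interpolation == PILImageResampling.LANCZOS:
            logger.warning_once(
                "LANCZOS resampling is not supported by the fast (torchvision-based) image "
                "processor. Falling back to BICUBIC; use the slow image processor for exact "
                "LANCZOS results."
            )
            interpolation = PILImageResampling.BICUBIC
        return super().resize(image, size, interpolation=interpolation, **kwargs)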
I have added the warning to the resize method. If all is good, I will apply the same changes to the Flava PR.
Hi @rootonchair! I'm currently working on a refactor of the processing in the idefics2/idefics3/smolvlm family which will impact this PR; here's the PR in question: #37291
Basically, it changes the processing of batched images to fully flattened images, so it should simplify this PR. It looks like you're already handling image processing this way though, so you may not have much to change.
In any case, I advise waiting for this refactor to be merged before iterating on this PR. I'll ping here when it's done!
What does this PR do?
Related #36978
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.