
Add Fast Image Processor for Idefics3 #37045


Open: wants to merge 6 commits into main

Conversation

rootonchair (Contributor)

What does this PR do?

Related #36978

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions github-actions bot marked this pull request as draft March 27, 2025 16:25
github-actions (bot)

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@rootonchair rootonchair marked this pull request as ready for review March 27, 2025 16:32
@github-actions github-actions bot requested review from ydshieh and yonigozlan March 27, 2025 16:33
@yonigozlan left a comment (Member)

Hi @rootonchair , thanks for contributing this, overall looks really good! There is an ongoing discussion on what to do with Lanczos resampling, but in the meantime I'm curious what diff you get with Bicubic when testing against the slow processor (using Lanczos) in test_slow_fast_equivalence and test_slow_fast_equivalence_batched?

```diff
@@ -296,7 +296,7 @@ def __init__(
         do_convert_rgb: bool = True,
         do_resize: bool = True,
         size: Dict[str, int] = None,
-        resample: PILImageResampling = PILImageResampling.LANCZOS,
+        resample: PILImageResampling = PILImageResampling.BICUBIC,
```
@yonigozlan (Member)

We shouldn't change the default resampling method in the slow processor. I hadn't seen a processor with Lanczos as default before, and I realize that it is not supported in torch/torchvision. I'd say the best course of action for now is to keep using Lanczos for slow image processing, use Bicubic for fast, and add a warning_once in the fast processor telling users to fall back to slow for exact processing. You can follow the ongoing discussion on this here: #35206 (comment)


rootonchair commented Apr 1, 2025

> Hi @rootonchair , thanks for contributing this, overall looks really good! There is an ongoing discussion on what to do with Lanczos resampling, but in the meantime I'm curious what diff you get with Bicubic when testing against the slow processor (using Lanczos) in test_slow_fast_equivalence and test_slow_fast_equivalence_batched?

@yonigozlan thanks for your thorough reviews. So I did a benchmark that measures MAE between LANCZOS and other resamplers. As you can see, BICUBIC is the most suitable approximation among others

| Fast resampler | test_slow_fast_equivalence | test_slow_fast_equivalence_batched |
| --- | --- | --- |
| BICUBIC | 0.018636604771018028 | 0.05995668098330498 |
| BILINEAR | 0.05067451298236847 | 0.13565397262573242 |
| NEAREST | 0.2290235459804535 | 0.5087724924087524 |
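For context, the figures above are the mean absolute error (MAE) between the pixel outputs of the slow (LANCZOS) processor and the fast processor with each candidate resampler. A minimal sketch of the metric itself, using made-up pixel values (the real benchmark flattens full image tensors from both processors):

```python
def mean_absolute_error(a, b):
    """MAE over two equal-length flat sequences of pixel values."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical pixel values, for illustration only.
slow_pixels = [0.10, 0.50, 0.90]   # slow processor (LANCZOS)
fast_pixels = [0.12, 0.48, 0.91]   # fast processor (BICUBIC)
mae = mean_absolute_error(slow_pixels, fast_pixels)
```

A lower MAE means the fast resampler's output is closer to the slow Lanczos reference, which is why BICUBIC wins in the table above.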

@yonigozlan (Member)

> @yonigozlan thanks for your thorough reviews. So I did a benchmark that measures MAE between LANCZOS and other resamplers. As you can see, BICUBIC is the most suitable approximation among others

Very cool, thanks for that. Looks like Bicubic is the way to go indeed. It's not ideal to have non-negligible differences in the processing, but I guess it's the best we can do for now.

@yonigozlan left a comment (Member)

Thanks for iterating! Only thing left is adding a warning_once before forcing Bicubic resample, then LGTM!

Edit: the slow_fast_equivalence tests don't seem to pass on the CI. We might need a higher threshold for this specific case.

Edit2: we're also missing adding Idefics3ImageProcessorFast to docs/source/en/model_doc/idefics3.md, looks like something went wrong with the transformers-cli script.

Comment on lines 292 to 296:

```python
def _assertEquivalence(self, a, b):
    self.assertTrue(torch.allclose(a, b, atol=1e-1))
    self.assertLessEqual(torch.mean(torch.abs(a - b)).item(), 1e-3)
```
@yonigozlan (Member)
Nice 🤗

```python
# For an example of a fast image processor requiring more complex augmentations, see `LlavaNextImageProcessorFast`.

# Default values should be checked against the slow image processor
# None values left after checking can be removed
resample = PILImageResampling.BICUBIC
```
@yonigozlan (Member)

Let's keep Lanczos here, and override the resize function to force Bicubic if resample is Lanczos, with a warning_once so users are aware of the issue. See my comment here: #37140 (comment)

@rootonchair (Contributor, Author)

I have added the warning to the resize method. If all is good, I will apply the same changes to the Flava PR.

@yonigozlan left a comment (Member)

Hi @rootonchair ! I'm currently working on a refactor of the processing in idefics2/idefics3/smolvlm family which will impact this PR, here's the PR in question: #37291

Basically it will change the processing of batched images to fully flattened images, so it should simplify this PR. It looks like you're already handling image processing this way though, so maybe you won't have much to change.

In any case, I advise waiting for this refactor to be merged before iterating on this PR, I'll ping here when it's done!
