Add D-FINE to KerasHub #2318


Open · wants to merge 9 commits into master

Conversation

@harshaljanjani (Collaborator) commented on Jul 9, 2025

Description of the change

Welcome D-FINE to the KerasHub family of models!
D-FINE, a powerful real-time object detector, sets a new state-of-the-art benchmark for object detection on KerasHub. It achieves outstanding localization precision by redefining the bounding box regression task in DETR models. Additionally, it incorporates lightweight optimizations in computationally intensive modules and operations, striking a better balance between speed and accuracy. Specifically, D-FINE-L/X achieves 54.0%/55.8% AP on the COCO dataset at 124/78 FPS on an NVIDIA T4 GPU. When pretrained on Objects365, D-FINE-L/X attains 57.1%/59.3% AP, surpassing all existing real-time detectors.
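To make the "redefining the bounding box regression task" point concrete, here is a minimal, hypothetical sketch (not code from this PR) of the underlying idea: each box edge offset is predicted as the expectation over a discrete distribution of candidate offsets rather than as a single scalar, which is what D-FINE's fine-grained distribution refinement iteratively sharpens across decoder layers.

```python
import numpy as np

# Hypothetical illustration only: distribution-based box regression.
num_bins = 16
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, num_bins))  # one distribution per box edge
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax
candidates = np.linspace(-0.5, 0.5, num_bins)  # candidate normalized offsets
edge_offsets = (probs * candidates).sum(axis=-1)  # expected offset per edge
print(edge_offsets.shape)  # (4,)
```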

Closes the second half of, and thus completes, issue #2271.

Results in action: KerasHub's D-FINE

Model predictions and numerics matching (visit the Colab notebook for the complete results).

Colab notebook: D-FINE: Complete Workflow with Predictions and Numerics Matching

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@harshaljanjani self-assigned this on Jul 9, 2025
@harshaljanjani (Collaborator, Author) commented:

The tests fail due to pending reviews on the HGNetV2 dependency. Once it is merged, the D-FINE PR will be open for review, and the tests will pass, as demonstrated in the notebook.

@divyashreepathihalli (Collaborator) commented:

/gemini review

@gemini-code-assist (bot) left a comment:

Code Review

This pull request introduces the D-FINE model to KerasHub, including its architecture, layers, tests, and a checkpoint conversion script. The implementation is comprehensive and well-structured. I've provided a few suggestions to improve code clarity, maintainability, and correctness. Overall, this is a solid contribution.

@harshaljanjani marked this pull request as ready for review on July 12, 2025, 19:25
@harshaljanjani (Collaborator, Author) commented on Jul 13, 2025:

@divyashreepathihalli @mattdangerw D-FINE is ready for its first round of reviews!
As discussed, we're leaving the task model out of the scope of this PR given the sheer volume of code; the task model is a 1000+ LOC effort in itself. The Colab notebook linked in the PR description covers not only the numerics check but also the examples we'd add to the quickstart notebook on Kaggle once merged!

@mattdangerw (Member) left a comment:

Thanks! Nice work. Just some initial comments.

In general, now that this is up and working, let's see if we can find anywhere to cut complexity. Anything we can do to save lines of code (without playing code golf) will probably help keep this maintainable in the future.


```python
def __init__(
    self,
    decoder_in_channels,
```
@mattdangerw (Member) commented:

This is a pretty massive list of arguments.

First, let's try to whittle this down to the stuff we absolutely need. For some of the _scale/_temperature/etc. args, it might be fine to leave them as default args on the relevant component layers and not expose them if they won't vary in any of our presets (we can always add them later if someone asks!).

Second, we could consider taking in sub-models directly. We did this for SD3, which was facing a similar arg explosion. Maybe not for everything, but could consider this for the encoder/decoder and HGNetV2Backbone. Take a look at how SD3 does this https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/models/stable_diffusion_3/stable_diffusion_3_backbone.py#L641-L654. Anything that we take in as an argument must be exposed as a public symbol.

But overall, let's try to simplify this arg list as much as we can. We want a practical list of things users might actually change, not an exhaustive list of everything that could possibly change.

@harshaljanjani (Collaborator, Author) commented on Jul 16, 2025:

Agreed!

I've kept in mind the "practical list from the user's perspective" approach you mentioned above. I'll walk you through my thought process a bit so you can evaluate my approach; I'd love to know your thoughts.

As you mentioned, HGNetV2Backbone can be passed as an argument, which I've implemented based on the SD3 pattern you shared, including the serialization and deserialization paradigm. Having said that, I think the following makes it unnecessary to do the same for the encoder and the decoder:

a. I checked for invariants across the 13 KH presets for D-FINE and noticed many low-level arguments that a user wouldn't practically need to vary, such as the {}_prob, {}_scale, and {}_temperature args, which I've removed.

b. However, I've retained a select few arguments that a user may want to configure, even though they are invariant across this set of presets, for example, lqe_hidden_dim. Likewise, the denoising box_noise_scale and label_noise_ratio are seemingly invariant, but a user might want to tune them depending on their training scenario, so I've kept them in.

This way, we don't play code golf, and users don't have to instantiate the encoder and decoder separately, which helps maintain symmetry (otherwise it might feel inconsistent, e.g., why instantiate only the encoder manually?). At the same time, this reduces the number of arguments; we're able to trim ~30 just by doing this.
I also rechecked for numerics consistency, inference, preset loading, etc., and everything is maintained, so we're good.

I'd love to know your thoughts on the same, thanks!
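For reference, a minimal sketch of the SD3-style pattern discussed above, under assumed names (DFINEBackbone and its argument list are illustrative, not the PR's actual code): the image encoder is taken as a sub-model argument and round-tripped through Keras layer serialization so save/load works.

```python
import keras

class DFINEBackbone(keras.Model):
    # Hypothetical signature; the real backbone takes many more args.
    def __init__(
        self,
        image_encoder,  # e.g. an HGNetV2Backbone instance
        box_noise_scale=1.0,
        label_noise_ratio=0.5,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.image_encoder = image_encoder
        self.box_noise_scale = box_noise_scale
        self.label_noise_ratio = label_noise_ratio

    def get_config(self):
        config = super().get_config()
        config.update({
            # Serialize the sub-model so it survives save/load round trips.
            "image_encoder": keras.layers.serialize(self.image_encoder),
            "box_noise_scale": self.box_noise_scale,
            "label_noise_ratio": self.label_noise_ratio,
        })
        return config

    @classmethod
    def from_config(cls, config):
        config["image_encoder"] = keras.layers.deserialize(
            config["image_encoder"]
        )
        return cls(**config)
```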

@@ -0,0 +1,147 @@
```python
# Metadata for loading pretrained model weights.
backbone_presets = {
```
@mattdangerw (Member) commented:

Probably easiest to leave this empty and only add entries once we have the actual Kaggle handles.

@harshaljanjani (Collaborator, Author) commented:

I’ve done it, but may I ask why we can’t have the presets with placeholders for the Kaggle handles just yet? Thanks!
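For context, a hedged sketch of what a filled-in entry would typically look like once a real Kaggle handle exists; the preset name, parameter count, and handle below are placeholders, not values from this PR:

```python
backbone_presets = {
    "dfine_large_coco": {
        "metadata": {
            "description": "D-FINE-L model pretrained on the COCO dataset.",
            "params": 31000000,  # placeholder count
            "path": "d_fine",
        },
        # Placeholder handle; a real one points at an uploaded Kaggle model.
        "kaggle_handle": "kaggle://keras/d-fine/keras/dfine_large_coco/1",
    },
}
```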

@harshaljanjani (Collaborator, Author) commented:

Thanks for the reviews, @mattdangerw. Yeah, let's definitely cut down the complexity wherever possible for maintainability; I'll look into it!

@harshaljanjani (Collaborator, Author) commented:

@mattdangerw Could you please check whether all your comments have been addressed when you have the time? Thanks a lot!

@sachinprasadhs moved this to In Progress in KerasHub on Jul 16, 2025