Differential Binarization model #2095


Open · wants to merge 88 commits into base: master

Conversation

@mehtamansi29 (Collaborator) commented Feb 12, 2025

@sachinprasadhs sachinprasadhs added the WIP Pull requests which are work in progress and not ready yet for review. label Apr 11, 2025
@sachinprasadhs (Collaborator) left a comment


Took a high-level pass and left some comments.
Also, make all the file names follow the same format as the other files, for db_utils and losses.py.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the Differential Binarization (DiffBin) model, including the model's backbone, text detector task, loss function, preprocessor, and associated utilities and tests. Additionally, there's a widespread cleanup of import statements across the API, removing redundant aliases, which improves code readability.

The implementation of the new DiffBin model is a great start, but there are several areas that need attention to improve correctness, maintainability, and completeness.

Comment on lines 32 to 48
```python
def call(self, y_true, y_pred):
    prob_map_true = y_true[..., 0:1]  # Channel 0
    binary_map_true = y_true[..., 1:2]  # Channel 1
    thresh_map_true = y_true[..., 2:3]  # Channel 2
    dilated_mask = y_true[..., 3:4]  # Channel 3

    prob_map_pred = y_pred[..., 0:1]  # Channel 0 - probability maps
    thresh_map_pred = y_pred[..., 1:2]  # Channel 1 - threshold maps
    binary_map_pred = y_pred[..., 2:3]

    ls = self.hard_negative_mining_bce(prob_map_true, prob_map_pred)
    lb = self.hard_negative_mining_bce(thresh_map_true, thresh_map_pred)
    lt = self.threshold_map_loss(
        binary_map_true, binary_map_pred, dilated_mask
    )
    total_loss = ls + (self.alpha * lb) + (self.beta * lt)
    return total_loss
```


high

The implementation of the loss function is confusing and seems to deviate from the original "Differentiable Binarization" paper, which could lead to incorrect behavior or difficulty in maintenance.

Specifically:

  • Variable Naming: The variable names lb and lt seem to be swapped compared to their purpose. lb is calculated on threshold maps, while lt is calculated on binary maps. The paper uses L_b for binary map loss and L_t for threshold map loss.
  • Hyperparameter Roles: The roles of alpha and beta are also swapped compared to the paper's formula L = L_s + alpha * L_b + beta * L_t. In the code, alpha weights the threshold map loss (lb) and beta weights the binary map loss (lt).
  • Loss Functions: The paper suggests L1 loss for the threshold map and Dice loss for the binary map. This implementation uses BCE for the threshold map and L1 for the binary map.

While deviations from the paper can be valid, the current implementation is hard to follow due to the inconsistencies in naming. I recommend refactoring for clarity.
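To make the recommended naming concrete, here is a minimal numpy sketch of the combination the paper describes, with Dice loss for the binary map and masked L1 for the threshold map. The helper names and default weights below are illustrative, not the PR's actual API:

```python
import numpy as np

def dice_loss(y_true, y_pred, eps=1e-6):
    # Paper's choice for the (approximate) binary map: Dice loss.
    intersection = np.sum(y_true * y_pred)
    return 1.0 - 2.0 * intersection / (np.sum(y_true) + np.sum(y_pred) + eps)

def masked_l1_loss(y_true, y_pred, mask, eps=1e-6):
    # Paper's choice for the threshold map: L1, restricted to the
    # dilated text mask.
    return np.sum(np.abs(y_true - y_pred) * mask) / (np.sum(mask) + eps)

def total_loss(l_s, binary_true, binary_pred, thresh_true, thresh_pred,
               dilated_mask, alpha=1.0, beta=10.0):
    # L = L_s + alpha * L_b + beta * L_t, matching the paper's formula:
    # alpha weights the binary-map loss, beta the threshold-map loss.
    l_b = dice_loss(binary_true, binary_pred)
    l_t = masked_l1_loss(thresh_true, thresh_pred, dilated_mask)
    return l_s + alpha * l_b + beta * l_t
```

With this naming, `l_b` is computed on binary maps and `l_t` on threshold maps, so the variable names line up with the paper's symbols.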

Comment on lines +46 to +52
`map_output` now holds an 8x224x224x3 tensor, where the last dimension
corresponds to the model's probability map, threshold map, and binary map
outputs. Use `postprocess_to_polygons()` to obtain a polygon
representation:
```python
detector.postprocess_to_polygons(map_output[...,0])
```


high

The example in the docstring demonstrates a call to detector.postprocess_to_polygons(map_output[...,0]). However, the postprocess_to_polygons method is not implemented in the DiffBinTextDetector class. This method should be implemented, or the docstring should be updated to reflect the available functionality.
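For illustration only, a crude numpy sketch of what such a method could do: binarize the probability map, find connected regions with a flood fill, and return each region's axis-aligned bounding box as a 4-point polygon. This is a hypothetical stand-in, not the KerasHub API; the paper's actual postprocessing uses contour extraction plus polygon unclipping:

```python
import numpy as np
from collections import deque

def postprocess_to_polygons(prob_map, threshold=0.3):
    # Hypothetical sketch: threshold the probability map, group
    # 4-connected foreground pixels via BFS flood fill, and emit the
    # bounding box of each group as a clockwise 4-point polygon.
    binary = prob_map > threshold
    seen = np.zeros_like(binary, dtype=bool)
    polygons = []
    for y, x in zip(*np.nonzero(binary)):
        if seen[y, x]:
            continue
        queue = deque([(y, x)])
        seen[y, x] = True
        y0 = y1 = y
        x0 = x1 = x
        while queue:
            cy, cx = queue.popleft()
            y0, y1 = min(y0, cy), max(y1, cy)
            x0, x1 = min(x0, cx), max(x1, cx)
            for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                           (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                        and binary[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        polygons.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1)])
    return polygons
```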

Comment on lines 31 to 36
```python
self,
image_encoder,
fpn_channels=256,
head_kernel_list=[3, 2, 2],
image_shape=(640, 640, 3),
dtype=None,
```


medium

Using mutable default arguments like lists is a common pitfall in Python and can lead to unexpected behavior, since the default object is created once and shared across all calls. It's a best practice to default to None and then assign the mutable object inside the function body.

Consider adding the following logic at the beginning of __init__:

```python
if head_kernel_list is None:
    head_kernel_list = [3, 2, 2]
if image_shape is None:
    image_shape = (640, 640, 3)
```
Suggested change
```diff
 self,
 image_encoder,
 fpn_channels=256,
-head_kernel_list=[3, 2, 2],
-image_shape=(640, 640, 3),
+head_kernel_list=None,
+image_shape=None,
 dtype=None,
```
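A two-line demonstration of why the list default is risky: the default list is created once at function definition time and mutated by every call (`append_item` is a made-up name for illustration):

```python
def append_item(item, items=[]):  # one list object, created at def time
    items.append(item)
    return items

first = append_item("a")
second = append_item("b")
# Both calls returned the same shared list, now ["a", "b"].
```

Defaulting to `None` and creating the list inside the body gives each call its own fresh list instead.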


You can keep `image_shape=(None, None, 3)` for consistency.

Comment on lines 120 to 126
```python
# lateral_p5 = layers.Conv2D(
#     out_channels,
#     kernel_size=1,
#     use_bias=False,
#     name="neck_lateral_p5",
#     dtype=dtype,
# )(inputs["P5"])
```


medium

This block of commented-out code should be removed to improve code clarity and maintainability.

```python
)(topdown_p2)
featuremap_p4 = layers.UpSampling2D((4, 4), dtype=dtype)(featuremap_p4)
featuremap_p3 = layers.UpSampling2D((2, 2), dtype=dtype)(featuremap_p3)
featuremap_p2 = layers.UpSampling2D((1, 1), dtype=dtype)(featuremap_p2)
```


medium

This upsampling layer with a size of (1, 1) is a no-op and can be removed to improve code clarity and avoid an unnecessary operation.
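For reference, nearest-neighbor upsampling (UpSampling2D's default) amounts to repeating pixels along the spatial axes, so a (1, 1) factor repeats each pixel once and returns the input unchanged. A quick numpy stand-in illustrates this (`upsample_nearest` is illustrative, not the Keras implementation):

```python
import numpy as np

def upsample_nearest(x, size):
    # Nearest-neighbor upsampling of a (batch, height, width, channels)
    # array: repeat rows by size[0] and columns by size[1].
    h, w = size
    return np.repeat(np.repeat(x, h, axis=1), w, axis=2)

x = np.arange(24.0).reshape(1, 2, 3, 4)
assert upsample_nearest(x, (2, 2)).shape == (1, 4, 6, 4)
assert np.array_equal(upsample_nearest(x, (1, 1)), x)  # (1, 1) is a no-op
```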

Comment on lines 19 to 20
```python
loss = self.loss_fn(y_true, y_pred)
self.assertGreaterEqual(loss.numpy(), 0.0)
```


medium

The assertion self.assertGreaterEqual(loss.numpy(), 0.0) is very weak for this test case. When the predicted values are identical to the true values, the loss should be zero (or very close to it). A more precise assertion would make this test more meaningful and robust.

Suggested change
```diff
 loss = self.loss_fn(y_true, y_pred)
-self.assertGreaterEqual(loss.numpy(), 0.0)
+self.assertAllClose(loss.numpy(), 0.0, atol=1e-6)
```

Comment on lines 55 to 56
```python
if y is None:
    return self.image_converter(x)
```


medium

There is a redundant call to self.image_converter(x). The variable x has already been updated with the result of self.image_converter(x) on line 54. This line can be simplified to return x.

```python
return x
```

@sachinprasadhs (Collaborator)

Rebase the code onto the latest master and resolve the Gemini-suggested comments.

@sachinprasadhs sachinprasadhs moved this to In Progress in KerasHub Jul 16, 2025
@mehtamansi29 mehtamansi29 marked this pull request as ready for review July 17, 2025 04:21
@mehtamansi29 mehtamansi29 changed the title [WIP] Differential Binarization model Differential Binarization model Jul 21, 2025
mehtamansi29 and others added 23 commits July 22, 2025 11:44
The inputs to `generate` are `"prompts"`, not `"text"`.

Fixes keras-team#1685
* routine HF sync

* code reformat
Bumps the python group with 2 updates: torch and torchvision.


Updates `torch` from 2.6.0+cu126 to 2.7.0+cu126

Updates `torchvision` from 0.21.0+cu126 to 0.22.0+cu126

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.0+cu126
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: torchvision
  dependency-version: 0.22.0+cu126
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Modify TransformerEncoder masking documentation

* Added space before parenthesis
* Fix Mistral conversion script

This commit addresses several issues in the Mistral checkpoint conversion script:

- Adds `dropout` to the model initialization to match the Hugging Face model.
- Replaces `requests.get` with `hf_hub_download` for more reliable tokenizer downloads.
- Adds support for both `tokenizer.model` and `tokenizer.json` to handle different Mistral versions.
- Fixes a `TypeError` in the `save_to_preset` function call.

* address format issues

* adopted to latest hub style

* address format issues

---------

Co-authored-by: laxmareddyp <laxmareddyp@laxma-n2-highmem-256gbram.us-central1-f.c.gtech-rmi-dev.internal>
Updates the requirements on [tensorflow-cpu](https://github.com/tensorflow/tensorflow), [tensorflow](https://github.com/tensorflow/tensorflow), [tensorflow-text](https://github.com/tensorflow/text), torch, torchvision and [tensorflow[and-cuda]](https://github.com/tensorflow/tensorflow) to permit the latest version.

Updates `tensorflow-cpu` to 2.19.0
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v2.18.1...v2.19.0)

Updates `tensorflow` to 2.19.0
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v2.18.1...v2.19.0)

Updates `tensorflow-text` to 2.19.0
- [Release notes](https://github.com/tensorflow/text/releases)
- [Commits](tensorflow/text@v2.18.0...v2.19.0)

Updates `torch` from 2.7.0+cu126 to 2.7.1+cu126

Updates `torchvision` from 0.22.0+cu126 to 0.22.1+cu126

Updates `tensorflow[and-cuda]` to 2.19.0
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v2.18.0...v2.19.0)

---
updated-dependencies:
- dependency-name: tensorflow-cpu
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
- dependency-name: tensorflow
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
- dependency-name: tensorflow-text
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
- dependency-name: torch
  dependency-version: 2.7.1+cu126
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: torchvision
  dependency-version: 0.22.1+cu126
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: tensorflow[and-cuda]
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* init

* update

* bug fixes

* add qwen causal lm test

* fix qwen3 tests
* support flash-attn at torch backend

* fix

* fix

* fix

* fix conflit

* fix conflit

* fix conflit

* fix conflit

* fix conflit

* fix conflit

* format
* init: Add initial project structure and files

* bug: Small bug related to weight loading in the conversion script

* finalizing: Add TIMM preprocessing layer

* incorporate reviews: Consolidate stage configurations and improve API consistency

* bug: Unexpected argument error in JAX with Keras 3.5

* small addition for the D-FINE to come: No changes to the existing HGNetV2

* D-FINE JIT compile: Remove non-essential conditional statement

* refactor: Address reviews and fix some nits
* Register qwen3 presets

* fix format
Labels
WIP Pull requests which are work in progress and not ready yet for review.
Projects
Status: In Progress