[WIP] PARSeq Model #2089

sineeli · 2025-02-10T22:36:45Z

PARSeq Model

Description of the Change

This PR adds an end-to-end scene text recognition model, PARSeq, to KerasHub. PARSeq is a ViT-based OCR model that enables iterative decoding for robust text recognition in natural scenes.

Closes the first half of #<issue_number>

Reference

For details, see Scene Text Recognition with Permuted Autoregressive Sequence Models (PARSeq paper). The model and configuration are based on the official paper and open-source implementation.

Colab Notebook

Usage and numerics matching Colab:

Checklist

I have added all the necessary unit tests for my change.
I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
My PR is based on the latest changes of the main branch (if unsure, rebase the code).
I have followed the Keras Hub Model contribution guidelines in making these changes.
I have followed the Keras Hub API design guidelines in making these changes.
I have signed the Contributor License Agreement.

keras_hub/src/models/parseq/parseq_tokenizer.py

abheesht17 · 2025-02-20T16:01:27Z

@sineeli - which parts of the PR are ready for review? Asking because it's still marked as draft

sineeli · 2025-02-20T18:52:00Z

Sure @abheesht17

I think the preprocessing and tokenizer parts are ready for review first, as they are the primary steps:

keras_hub/src/models/parseq/parseq_tokenizer.py
keras_hub/src/models/text_recognition_preprocessor.py

abheesht17

Thanks for the PR! Left some comments on the tokeniser. Will take a look at the text recognition preprocessor soon.

Sorry for the delay in reviewing

abheesht17 · 2025-02-25T01:41:14Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+        "keras_hub.models.PARSeqTokenizer",
+    ]
+)
+class PARSeqTokenizer(tokenizer.Tokenizer):


Please add a doc-string here, with examples. Makes it easier to review when we have examples :P

Let's add unit tests as well

Yes, will add them

keras_hub/src/models/parseq/parseq_tokenizer.py

abheesht17 · 2025-02-25T02:24:03Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+        self.char_to_id = tf.lookup.StaticHashTable(
+            initializer=tf.lookup.KeyValueTensorInitializer(
+                keys=list(self._stoi.keys()),
+                values=list(self._stoi.values()),
+                key_dtype=tf.string,
+                value_dtype=tf.int32,
+            ),
+            default_value=0,
+        )
+        self.id_to_char = tf.lookup.StaticHashTable(
+            initializer=tf.lookup.KeyValueTensorInitializer(
+                keys=list(self._stoi.values()),
+                values=list(self._stoi.keys()),
+                key_dtype=tf.int32,
+                value_dtype=tf.string,
+            ),
+            default_value=self.pad_token,
+        )


The defaults don't match. EOS is the 0th token, and pad is the len(vocabulary) - 1th token

I noticed the same in the original code. It seems they use EOS -> 0 and BOS -> len(vocabulary), but when padding they add BOS first and then EOS at the end.
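
To make the layout under discussion concrete, here is a minimal sketch in plain Python. It assumes EOS at index 0 with BOS and PAD appended after the characters; the charset, the token spellings, and the helper names `build_vocab` and `encode` are hypothetical, and the exact indices in the PR may differ. It uses plain dicts rather than `tf.lookup.StaticHashTable`.

```python
# Hypothetical three-character charset; the real PARSeq charset is larger.
CHARSET = "abc"
EOS, BOS, PAD = "[E]", "[B]", "[P]"  # assumed token spellings


def build_vocab(charset):
    # EOS first (id 0), then the characters, then BOS and PAD at the end.
    itos = [EOS] + list(charset) + [BOS, PAD]
    stoi = {token: i for i, token in enumerate(itos)}
    return itos, stoi


def encode(label, stoi, max_label_length):
    # BOS + characters + EOS, right-padded with PAD to max_label_length + 2.
    ids = [stoi[BOS]] + [stoi[c] for c in label] + [stoi[EOS]]
    ids += [stoi[PAD]] * (max_label_length + 2 - len(ids))
    return ids


itos, stoi = build_vocab(CHARSET)
encoded = encode("ab", stoi, max_label_length=3)
print(encoded)  # [4, 1, 2, 0, 5]
```

With a layout like this, a `default_value=0` on the forward table would map unknown characters to EOS, which is one way to read the mismatch flagged above.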

abheesht17 · 2025-02-25T02:24:23Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+            ),
+            default_value=0,
+        )
+        self.id_to_char = tf.lookup.StaticHashTable(


Do we need this? We aren't using it anywhere

It would be helpful in case a user wants to bulk-convert token ids to characters.
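
The bulk conversion mentioned here could look roughly like the following sketch. It is plain Python rather than a `tf.lookup` table, and the vocabulary, the special-token ids, and the `detokenize` helper are assumptions made for illustration only.

```python
# Hypothetical vocabulary: EOS at 0, characters, then BOS and PAD.
ITOS = ["[E]", "a", "b", "c", "[B]", "[P]"]
EOS_ID, BOS_ID, PAD_ID = 0, 4, 5


def detokenize(batch_ids):
    texts = []
    for ids in batch_ids:
        chars = []
        for token_id in ids:
            if token_id == EOS_ID:
                break  # everything after EOS is ignored
            if token_id in (BOS_ID, PAD_ID):
                continue  # special tokens carry no text
            chars.append(ITOS[token_id])
        texts.append("".join(chars))
    return texts


decoded = detokenize([[4, 1, 2, 0, 5], [4, 3, 0, 5, 5]])
print(decoded)  # ['ab', 'c']
```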

keras_hub/src/models/parseq/parseq_tokenizer.py

abheesht17 · 2025-02-25T02:29:14Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+            label = tf.strings.upper(label)
+
+        label = tf.strings.regex_replace(label, self.unsupported_regex, "")
+        label = tf.strings.substr(label, 0, self.max_label_length)


Why are we truncating the input to 25 characters?

When preparing the dataset, the original preprocessing simply drops any datapoint whose label is longer than 25 characters. Instead, I truncate the label so that we can still add the start and end tokens.

Ref: https://github.com/baudm/parseq/blob/1902db043c029a7e03a3818c616c06600af574be/strhub/data/dataset.py#L112
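
The difference between the two behaviours described in this reply can be sketched in plain Python. The helper names `drop_long` and `truncate_long` are hypothetical, and the sketch ignores the case folding and unsupported-character removal that the tokenizer applies first.

```python
MAX_LABEL_LENGTH = 25


def drop_long(labels, max_len=MAX_LABEL_LENGTH):
    # Dataset-side behaviour as described above: skip over-long labels.
    return [label for label in labels if len(label) <= max_len]


def truncate_long(labels, max_len=MAX_LABEL_LENGTH):
    # Tokenizer-side behaviour in this PR: keep every label, cut to max_len.
    return [label[:max_len] for label in labels]


labels = ["hello", "x" * 30]
kept = drop_long(labels)
truncated = truncate_long(labels)
print(len(kept), [len(label) for label in truncated])  # 1 [5, 25]
```

Truncation keeps the batch size unchanged, at the cost of training on a label that no longer matches the full text in the image.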

keras_hub/src/models/parseq/parseq_tokenizer.py

sineeli · 2025-05-30T21:09:54Z

@sachinprasadhs, @abheesht17, @mattdangerw

Can you take a look at the PR when you get some time, thank you!

sachinprasadhs

Thanks, I added some comments.
Could you please add a PR description following the recent PR description template, which includes a Colab notebook link with an end-to-end working demo and numerics verification?
Also add the original implementation reference to the PR description.

keras_hub/src/models/parseq/parseq_backbone.py

sachinprasadhs · 2025-06-09T21:05:12Z

keras_hub/src/models/parseq/parseq_backbone.py

+        dropout_rate: float. The dropout rate. Defaults to `0.1`.
+        attention_dropout: float. The dropout rate for the attention weights.
+        Defaults to `0.1`.
+        dtype: str. The dtype used for layers.


Follow the same arg description for dtype that we use for other models.

sachinprasadhs · 2025-06-09T22:24:15Z

keras_hub/src/models/parseq/parseq_backbone.py

+        Defaults to `0.1`.
+        dtype: str. The dtype used for layers.
+        **kwargs: Additional keyword arguments passed to the base
+            `keras.Model` constructor.


Add an Examples section demonstrating sample usage of the backbone

Adding it in the causal_lm file rather than here. It's more suitable there.

keras_hub/src/models/parseq/parseq_decoder.py

sachinprasadhs · 2025-06-09T22:32:57Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+            type (e.g., "int32") or a string type ("string").
+            Defaults to `"int32"`.
+        **kwargs: Additional keyword arguments passed to the base
+            `keras.layers.Layer` constructor.


Add an Examples section as well. Unit tests are still pending, I guess?

The preprocessor tests cover both the image converter and the tokenizer.

divyashreepathihalli · 2025-07-11T00:05:34Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the PARSeq model, a ViT-based OCR model, to KerasHub. I've identified a few issues, including two critical bugs related to model serialization and tokenizer functionality that must be addressed. I've also found a couple of medium-severity issues regarding a typo in a layer name and a docstring example that should be corrected for clarity and maintainability.

gemini-code-assist · 2025-07-11T00:07:03Z

keras_hub/src/models/parseq/parseq_causal_lm_preprocessor.py

+    def get_config(self):
+        config = super().get_config()
+        config.update(
+            {
+                "sequence_length": self.sequence_length,
+                "add_start_token": self.add_start_token,
+                "add_end_token": self.add_end_token,
+            }
+        )
+        return config


The image_converter layer is not being serialized in get_config(). This will cause an error when saving and loading a model that uses this preprocessor, as the image_converter will be missing upon deserialization.

You should serialize image_converter in get_config() and also add a from_config() classmethod to handle its deserialization.

Suggested change

-    def get_config(self):
-        config = super().get_config()
-        config.update(
-            {
-                "sequence_length": self.sequence_length,
-                "add_start_token": self.add_start_token,
-                "add_end_token": self.add_end_token,
-            }
-        )
-        return config
+    def get_config(self):
+        config = super().get_config()
+        config.update(
+            {
+                "image_converter": keras.layers.serialize(self.image_converter),
+                "sequence_length": self.sequence_length,
+                "add_start_token": self.add_start_token,
+                "add_end_token": self.add_end_token,
+            }
+        )
+        return config
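
The suggestion covers only `get_config()`. The matching `from_config()` mentioned in the review could follow the round-trip pattern below, sketched without Keras so that it runs anywhere. `ToyConverter` and `ToyPreprocessor` are hypothetical stand-ins, not KerasHub classes; in the real preprocessor the nested layer would presumably go through `keras.layers.serialize` and `keras.layers.deserialize` instead of the direct calls used here.

```python
class ToyConverter:
    """Stand-in for an image converter layer (hypothetical)."""

    def __init__(self, image_size=(32, 128)):
        self.image_size = tuple(image_size)

    def get_config(self):
        return {"image_size": list(self.image_size)}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


class ToyPreprocessor:
    """Stand-in for the preprocessor that owns a nested converter."""

    def __init__(self, image_converter=None, sequence_length=25):
        self.image_converter = image_converter
        self.sequence_length = sequence_length

    def get_config(self):
        converter = self.image_converter
        return {
            # Serialize the nested object instead of silently dropping it.
            "image_converter": None if converter is None else converter.get_config(),
            "sequence_length": self.sequence_length,
        }

    @classmethod
    def from_config(cls, config):
        config = dict(config)  # do not mutate the caller's dict
        converter_config = config.pop("image_converter", None)
        converter = None
        if converter_config is not None:
            converter = ToyConverter.from_config(converter_config)
        return cls(image_converter=converter, **config)


original = ToyPreprocessor(ToyConverter((32, 128)), sequence_length=25)
restored = ToyPreprocessor.from_config(original.get_config())
print(restored.image_converter.image_size)  # (32, 128)
```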

gemini-code-assist · 2025-07-11T00:07:04Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+
+    def get_vocabulary(self):
+        """Get the tokenizer vocabulary as a list of strings tokens."""
+        return self.vocabulary.keys()


self.vocabulary is a string, so calling .keys() on it will raise an AttributeError. To get the vocabulary as a list of characters, you should use list(self.vocabulary) instead.

Suggested change

-        return self.vocabulary.keys()
+        return list(self.vocabulary)
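
A quick check of the behaviour this comment relies on, assuming (as the review states) that the vocabulary is stored as a plain string of characters. The three-character value is a hypothetical example.

```python
vocabulary = "abc"  # hypothetical vocabulary stored as a string

try:
    vocabulary.keys()
    error_name = None
except AttributeError as err:
    # str has no .keys(), so this branch is taken.
    error_name = type(err).__name__

tokens = list(vocabulary)
print(error_name, tokens)  # AttributeError ['a', 'b', 'c']
```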

gemini-code-assist · 2025-07-11T00:07:04Z

keras_hub/src/models/parseq/parseq_causal_lm.py

+    preprocessor = keras_hub.models.PARSeqPreprocessor(
+        image_converter=image_converter,
+        tokenizer=tokenizer,
+    )


There's a typo in the class name within the docstring example. PARSeqPreprocessor should be PARSeqCausalLMPreprocessor to match the actual class name and ensure the example is runnable.

Suggested change

-    preprocessor = keras_hub.models.PARSeqPreprocessor(
-        image_converter=image_converter,
-        tokenizer=tokenizer,
-    )
+    preprocessor = keras_hub.models.PARSeqCausalLMPreprocessor(
+        image_converter=image_converter,
+        tokenizer=tokenizer,
+    )

gemini-code-assist · 2025-07-11T00:07:04Z

keras_hub/src/models/parseq/parseq_decoder.py

+            num_heads=self.num_heads,
+            key_dim=self.key_dim,
+            dropout=self.attention_dropout,
+            name="corss_attention",


There is a typo in the layer name. corss_attention should be cross_attention for clarity and consistency.

Suggested change

-            name="corss_attention",
+            name="cross_attention",

sineeli added 13 commits January 31, 2025 11:11

Base for parseq model

528d3a4

make it vit compatiable with diff height and width sizes

3bf11cd

correct vit conv scripts

a8fb177

make class token optional in backbone by default its included

6f4363a

add flags to adjust vit network

d1cece0

add test case for without class_token

92b2745

Merge branch 'master' into parseq

ed00b73

decoder file

25f661c

parseq tokenizer base

f97fab1

add api for parseq tokenizer

d424210

Add missing arg max_label_length.

3f3ad0d

nit

bb4457e

Merge branch 'master' into parseq

68829f8

sineeli commented Feb 10, 2025

sineeli added 5 commits February 11, 2025 15:28

add missing normalization step using tf_text

1bde466

add missing config for preprocessor

e6c5379

add default start, pad and end tokens

5b08c93

nit

49260ef

correct special token order

b4150ed

abheesht17 self-assigned this Feb 18, 2025

divyashreepathihalli requested a review from abheesht17 February 18, 2025 17:20

sineeli added 3 commits February 18, 2025 10:33

return padding mask as well

ed8b9d7

use proper keras ops

4e4511c

nit

9222331

abheesht17 requested changes Feb 25, 2025

sineeli added 3 commits March 3, 2025 11:42

add decoder for parseq

78a07a0

Build unbuilt layers for model validation

decc12c

fix forward pass and decoder

7aa2b67

sineeli added 11 commits May 15, 2025 09:17

fix input format and add causal lm testing

0e7cbbd

use numpy random images

a87ae57

fix jax backend issue when reduction set to "mean_with_sample_weight"

7c1fe2c

remove redudant classes and use causal lm base calsses itself.

58917dd

nit

3cf997c

fix decoder_head_dim usage

f3f3cef

fix preprocessing issues

eb5d4ef

Merge branch 'master' into parseq

f5e21ed

add checkpoint convertion script

b6b7a26

add missing flag

8c6f14c

validate convertion outputs

e89398b

sineeli marked this pull request as ready for review May 19, 2025 17:50

nit

764a204

sineeli requested review from abheesht17 and mattdangerw May 19, 2025 21:11

sineeli and others added 2 commits May 20, 2025 12:11

fix training for permutation logic

180774d

Merge branch 'master' into parseq

4201d0b

sineeli requested a review from sachinprasadhs May 30, 2025 21:10

sachinprasadhs reviewed Jun 9, 2025

sineeli added 3 commits June 18, 2025 22:56

add example usage for backbone and causal lm

751b0a8

nit

3860843

Merge remote-tracking branch 'upstream/master' into parseq

6f5f093

sachinprasadhs added kokoro:force-run Runs Tests on GPU and removed WIP Pull requests which are work in progress and not ready yet for review. labels Jun 23, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jun 23, 2025

gemini-code-assist bot reviewed Jul 11, 2025

sachinprasadhs added this to KerasHub Jul 16, 2025

sachinprasadhs moved this to In Progress in KerasHub Jul 16, 2025
