[WIP] PARSeq Model #2089

Open: wants to merge 82 commits into master

sineeli
Collaborator

@sineeli sineeli commented Feb 10, 2025

PARSeq Model

Description of the Change

This PR adds an end-to-end scene text recognition model, PARSeq, to KerasHub. PARSeq is a ViT-based OCR model trained with permuted autoregressive sequence modeling, which enables iterative decoding for robust text recognition in natural scenes.

Closes the first half of #<issue_number>

Reference

For details, see the PARSeq paper, Scene Text Recognition with Permuted Autoregressive Sequence Models. The model and configuration follow the official paper and its open-source implementation.

Colab Notebook

Usage and numerics matching Colab:

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@abheesht17
Collaborator

@sineeli, which parts of the PR are ready for review? Asking because it's still marked as a draft.

@sineeli
Collaborator Author

sineeli commented Feb 20, 2025

Sure @abheesht17

I think the preprocessing and tokenizer parts are ready for review first, since they are the primary steps:

  1. keras_hub/src/models/parseq/parseq_tokenizer.py
  2. keras_hub/src/models/text_recognition_preprocessor.py

Collaborator

@abheesht17 abheesht17 left a comment

Thanks for the PR! Left some comments on the tokeniser. Will take a look at the text recognition preprocessor soon.

Sorry for the delay in reviewing.

```python
        "keras_hub.models.PARSeqTokenizer",
    ]
)
class PARSeqTokenizer(tokenizer.Tokenizer):
```

Collaborator

Please add a docstring here with examples. It's easier to review when we have examples :P

Collaborator

Let's add unit tests as well.

Collaborator Author

Yes, will add them

Comment on lines 64 to 81
```python
self.char_to_id = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(self._stoi.keys()),
        values=list(self._stoi.values()),
        key_dtype=tf.string,
        value_dtype=tf.int32,
    ),
    default_value=0,
)
self.id_to_char = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(self._stoi.values()),
        values=list(self._stoi.keys()),
        key_dtype=tf.int32,
        value_dtype=tf.string,
    ),
    default_value=self.pad_token,
)
```

Collaborator

The defaults don't match: EOS is token 0, and pad is token `len(vocabulary) - 1`.

Collaborator Author

I noticed the same in the original code. It seems to use EOS -> 0 and BOS -> len(vocabulary), but when building the padded sequences it puts BOS first and EOS at the end.
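To make the ordering concrete, here is a minimal pure-Python sketch of the layout discussed above, assuming it mirrors the original baudm/parseq tokenizer: EOS at index 0, then the charset, then BOS and PAD, so PAD is the last index of the full token list. Encoding puts BOS first and EOS last, then pads with PAD. The helpers `build_vocab` and `encode` and the special-token strings are hypothetical, not the PR's API, and padding to a fixed length is an assumption (the original pads to the longest label in the batch).

```python
EOS, BOS, PAD = "[E]", "[B]", "[P]"  # hypothetical special-token strings


def build_vocab(charset):
    """Return (itos, stoi) with EOS first and BOS, PAD last."""
    itos = [EOS] + list(charset) + [BOS, PAD]
    stoi = {token: i for i, token in enumerate(itos)}
    return itos, stoi


def encode(labels, stoi, max_length):
    """Encode each label as [BOS] + chars + [EOS], right-padded with PAD.

    Assumes labels only contain charset characters and already fit
    within max_length.
    """
    seq_len = max_length + 2  # room for BOS and EOS
    batch = []
    for label in labels:
        ids = [stoi[BOS]] + [stoi[c] for c in label] + [stoi[EOS]]
        ids += [stoi[PAD]] * (seq_len - len(ids))
        batch.append(ids)
    return batch


itos, stoi = build_vocab("0123456789")
print(stoi[EOS], stoi[BOS], stoi[PAD])  # 0 11 12
print(encode(["42"], stoi, max_length=4))  # [[11, 5, 3, 0, 12, 12]]
```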

```python
    ),
    default_value=0,
)
self.id_to_char = tf.lookup.StaticHashTable(
```

Collaborator

Do we need this? We aren't using it anywhere.

Collaborator Author

It would be helpful if a user wants to convert token IDs back to characters in bulk.
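To illustrate the bulk case, here is a plain-Python sketch of turning a batch of token IDs back into strings with an id-to-char table, assuming the same layout as the original implementation (EOS = 0, then the charset, then BOS and PAD). It skips BOS and PAD and stops at the first EOS. The PR builds this mapping with `tf.lookup.StaticHashTable`; the dict-based `decode_batch` below is a hypothetical stand-in, not the PR's code.

```python
EOS, BOS, PAD = "[E]", "[B]", "[P]"  # hypothetical special-token strings
itos = [EOS] + list("0123456789") + [BOS, PAD]
id_to_char = dict(enumerate(itos))


def decode_batch(batch_ids):
    """Map each row of token IDs back to a string."""
    texts = []
    for ids in batch_ids:
        chars = []
        for i in ids:
            token = id_to_char.get(i, PAD)  # unknown IDs fall back to PAD
            if token == EOS:
                break  # nothing after EOS belongs to the label
            if token in (BOS, PAD):
                continue
            chars.append(token)
        texts.append("".join(chars))
    return texts


print(decode_batch([[11, 5, 3, 0, 12, 12], [11, 8, 0, 12, 12, 12]]))  # ['42', '7']
```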

```python
label = tf.strings.upper(label)

label = tf.strings.regex_replace(label, self.unsupported_regex, "")
label = tf.strings.substr(label, 0, self.max_label_length)
```

Collaborator

Why are we truncating the input to 25 characters?

Collaborator Author

While preparing the dataset, the original preprocessing simply skips any datapoint whose label is longer than 25 characters. Instead, I truncate the label so that we can still add the start and end tokens.

Ref: https://github.com/baudm/parseq/blob/1902db043c029a7e03a3818c616c06600af574be/strhub/data/dataset.py#L112
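The two policies can be compared in a small plain-Python sketch, assuming a maximum label length of 25 as in the discussion. `filter_labels` drops over-long labels, as the referenced `dataset.py` line does, while `truncate_labels` keeps every sample and cuts it to the limit, similar to the `tf.strings.substr` call in the PR. Both function names are hypothetical, and the sketch leaves out the uppercasing and unsupported-character removal the PR applies first.

```python
MAX_LABEL_LENGTH = 25


def filter_labels(labels, max_len=MAX_LABEL_LENGTH):
    """Drop samples whose label exceeds max_len (original dataset behavior)."""
    return [label for label in labels if len(label) <= max_len]


def truncate_labels(labels, max_len=MAX_LABEL_LENGTH):
    """Keep every sample, cutting labels to at most max_len characters."""
    return [label[:max_len] for label in labels]


labels = ["SHORT", "A" * 30]
print(filter_labels(labels))  # ['SHORT']
print([len(x) for x in truncate_labels(labels)])  # [5, 25]
```

One trade-off to keep in mind: a truncated label no longer matches all of the text in the image.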

@sineeli sineeli marked this pull request as ready for review May 19, 2025 17:50
@sineeli sineeli requested review from abheesht17 and mattdangerw May 19, 2025 21:11
@sineeli
Collaborator Author

sineeli commented May 30, 2025

@sachinprasadhs, @abheesht17, @mattdangerw

Could you take a look at the PR when you get some time? Thank you!

@sineeli sineeli requested a review from sachinprasadhs May 30, 2025 21:10
Collaborator

@sachinprasadhs sachinprasadhs left a comment

Thanks, I added some comments. Could you please add a PR description following the recent PR description template, which includes a Colab notebook link with an end-to-end working demo and numerics verification?
Also, please add the original implementation reference to the PR description.

```
dropout_rate: float. The dropout rate. Defaults to `0.1`.
attention_dropout: float. The dropout rate for the attention weights.
    Defaults to `0.1`.
dtype: str. The dtype used for layers.
```

Collaborator

Follow the same `dtype` argument description that we use for other models.

```
    Defaults to `0.1`.
dtype: str. The dtype used for layers.
**kwargs: Additional keyword arguments passed to the base
    `keras.Model` constructor.
```

Collaborator

Add an Examples section demonstrating sample usage of the backbone

Collaborator Author

I'm adding it in the causal_lm file rather than here, since it's more suitable there.

```
    type (e.g., "int32") or a string type ("string").
    Defaults to `"int32"`.
**kwargs: Additional keyword arguments passed to the base
    `keras.layers.Layer` constructor.
```

Collaborator

Add an Examples section as well. Also, I guess the unit tests are still pending?

Collaborator Author

The preprocessor tests cover both the image converter and the tokenizer.

@sachinprasadhs sachinprasadhs added kokoro:force-run Runs Tests on GPU and removed WIP Pull requests which are work in progress and not ready yet for review. labels Jun 23, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jun 23, 2025
@divyashreepathihalli
Collaborator

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the PARSeq model, a ViT-based OCR model, to KerasHub. I've identified a few issues, including two critical bugs related to model serialization and tokenizer functionality that must be addressed. I've also found a couple of medium-severity issues regarding a typo in a layer name and a docstring example that should be corrected for clarity and maintainability.

Comment on lines +159 to +168
```python
def get_config(self):
    config = super().get_config()
    config.update(
        {
            "sequence_length": self.sequence_length,
            "add_start_token": self.add_start_token,
            "add_end_token": self.add_end_token,
        }
    )
    return config
```

critical

The image_converter layer is not being serialized in get_config(). This will cause an error when saving and loading a model that uses this preprocessor, as the image_converter will be missing upon deserialization.

You should serialize image_converter in get_config() and also add a from_config() classmethod to handle its deserialization.

Suggested change

```diff
 def get_config(self):
     config = super().get_config()
     config.update(
         {
+            "image_converter": keras.layers.serialize(self.image_converter),
             "sequence_length": self.sequence_length,
             "add_start_token": self.add_start_token,
             "add_end_token": self.add_end_token,
         }
     )
     return config
```
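The comment also asks for a `from_config()` classmethod, which the suggested change does not show. Here is a minimal sketch of the full round trip using `keras.layers.serialize` and `keras.layers.deserialize`. `ToyPreprocessor` is a hypothetical stand-in for the PR's preprocessor (a plain `keras.layers.Layer`), and `keras.layers.Rescaling` stands in for the image converter; a real layer would normally also be registered as serializable.

```python
import keras


class ToyPreprocessor(keras.layers.Layer):
    """Hypothetical layer that owns a nested image_converter layer."""

    def __init__(self, image_converter, sequence_length=25, **kwargs):
        super().__init__(**kwargs)
        self.image_converter = image_converter
        self.sequence_length = sequence_length

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                # Nested layers must be stored as a serialized dict.
                "image_converter": keras.layers.serialize(self.image_converter),
                "sequence_length": self.sequence_length,
            }
        )
        return config

    @classmethod
    def from_config(cls, config):
        config = dict(config)
        # Rebuild the nested layer before calling the constructor.
        config["image_converter"] = keras.layers.deserialize(
            config["image_converter"]
        )
        return cls(**config)


original = ToyPreprocessor(keras.layers.Rescaling(scale=1.0 / 255))
restored = ToyPreprocessor.from_config(original.get_config())
print(type(restored.image_converter).__name__)  # Rescaling
```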


```python
def get_vocabulary(self):
    """Get the tokenizer vocabulary as a list of strings tokens."""
    return self.vocabulary.keys()
```

critical

self.vocabulary is a string, so calling .keys() on it will raise an AttributeError. To get the vocabulary as a list of characters, you should use list(self.vocabulary) instead.

Suggested change

```diff
-    return self.vocabulary.keys()
+    return list(self.vocabulary)
```
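A quick plain-Python check of this reasoning, assuming (as the comment states) that `self.vocabulary` holds the charset as a single string. `CharTokenizer` is a hypothetical stand-in, not the PR's class.

```python
class CharTokenizer:
    """Hypothetical tokenizer that stores its charset as one string."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def get_vocabulary(self):
        # A str has no .keys(); list() splits it into single characters.
        return list(self.vocabulary)


tokenizer = CharTokenizer("abc")
print(tokenizer.get_vocabulary())  # ['a', 'b', 'c']
print(hasattr(tokenizer.vocabulary, "keys"))  # False
```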

Comment on lines +78 to +81
```python
preprocessor = keras_hub.models.PARSeqPreprocessor(
    image_converter=image_converter,
    tokenizer=tokenizer,
)
```

medium

There's a typo in the class name within the docstring example. PARSeqPreprocessor should be PARSeqCausalLMPreprocessor to match the actual class name and ensure the example is runnable.

Suggested change

```diff
-preprocessor = keras_hub.models.PARSeqPreprocessor(
+preprocessor = keras_hub.models.PARSeqCausalLMPreprocessor(
     image_converter=image_converter,
     tokenizer=tokenizer,
 )
```

```python
    num_heads=self.num_heads,
    key_dim=self.key_dim,
    dropout=self.attention_dropout,
    name="corss_attention",
```

medium

There is a typo in the layer name. corss_attention should be cross_attention for clarity and consistency.

Suggested change

```diff
-    name="corss_attention",
+    name="cross_attention",
```

@sachinprasadhs sachinprasadhs moved this to In Progress in KerasHub Jul 16, 2025
Labels: None yet
Project status: In Progress
6 participants