[WIP] PARSeq Model #2089
base: master
Conversation
@sineeli - which parts of the PR are ready for review? Asking because it's still marked as draft.
Sure, @abheesht17. The preprocessing and tokenizer parts are ready for review, I think, as they are the primary steps.
Thanks for the PR! Left some comments on the tokeniser. Will take a look at the text recognition preprocessor soon.
Sorry for the delay in reviewing.
        "keras_hub.models.PARSeqTokenizer",
    ]
)
class PARSeqTokenizer(tokenizer.Tokenizer):
Please add a doc-string here, with examples. Makes it easier to review when we have examples :P
Let's add unit tests as well
Yes, will add them
self.char_to_id = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(self._stoi.keys()),
        values=list(self._stoi.values()),
        key_dtype=tf.string,
        value_dtype=tf.int32,
    ),
    default_value=0,
)
self.id_to_char = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(self._stoi.values()),
        values=list(self._stoi.keys()),
        key_dtype=tf.int32,
        value_dtype=tf.string,
    ),
    default_value=self.pad_token,
)
The defaults don't match. EOS is the 0th token, and pad is the `len(vocabulary) - 1`th token.
I noticed the same thing in the original code, but it seems they use EOS -> 0 and BOS -> len(vocabulary); while padding, though, they put BOS first and EOS at the end.
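For reviewers less familiar with `tf.lookup.StaticHashTable`, a minimal sketch of the lookup-table pattern above. The vocabulary and the EOS-at-0 layout here are illustrative assumptions for the sake of the example, not the PR's actual charset:

```python
import tensorflow as tf

# Hypothetical layout mirroring the discussion: EOS maps to 0 and
# BOS/PAD sit at the end of the vocabulary.
stoi = {"[EOS]": 0, "a": 1, "b": 2, "c": 3, "[BOS]": 4, "[PAD]": 5}

char_to_id = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(stoi.keys()),
        values=list(stoi.values()),
        key_dtype=tf.string,
        value_dtype=tf.int32,
    ),
    # Unknown characters fall back to the default, which here collides
    # with EOS's id -- the point the review comment is raising.
    default_value=0,
)

ids = char_to_id.lookup(tf.constant(["a", "c", "z"]))
print(ids.numpy())  # "z" is not in the table, so it maps to the default
```

The collision between "unknown character" and a real special token is why the choice of `default_value` matters here.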
    ),
    default_value=0,
)
self.id_to_char = tf.lookup.StaticHashTable(
Do we need this? We aren't using it anywhere
But it will be helpful in case a user wants to bulk-convert token IDs back to characters.
label = tf.strings.upper(label)
label = tf.strings.regex_replace(label, self.unsupported_regex, "")
label = tf.strings.substr(label, 0, self.max_label_length)
Why are we truncating the input to 25 characters?
While preparing the dataset, the original preprocessing simply drops any datapoint whose label is longer than 25 characters. Instead, I truncate the label, and we can add start and end tokens instead.
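The label-cleaning chain being discussed can be run standalone. A minimal sketch, where the regex and the 25-character limit are illustrative stand-ins for the PR's `self.unsupported_regex` and `self.max_label_length`:

```python
import tensorflow as tf

# Assumed settings: uppercase alphanumerics only, labels capped at 25 chars.
max_label_length = 25
unsupported_regex = r"[^0-9A-Z]"

label = tf.constant("Hello, world! this string is longer than twenty-five chars")
label = tf.strings.upper(label)                                  # normalize case
label = tf.strings.regex_replace(label, unsupported_regex, "")   # strip unsupported chars
label = tf.strings.substr(label, 0, max_label_length)            # truncate, don't drop
print(label.numpy().decode())  # -> "HELLOWORLDTHISSTRINGISLON"
```

Truncation keeps the datapoint in the training set, at the cost of a partially cut-off target string for long labels.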
@sachinprasadhs, @abheesht17, @mattdangerw Can you take a look at the PR when you get some time? Thank you!
Thanks, added some comments.
Could you please add a PR description following the recent PR description template, including a Colab notebook link with an end-to-end working demo and numerics verification?
Also add the original implementation reference in the PR description.
dropout_rate: float. The dropout rate. Defaults to `0.1`.
attention_dropout: float. The dropout rate for the attention weights.
    Defaults to `0.1`.
dtype: str. The dtype used for layers.
Follow the same arg description we use for `dtype` in other models.
    Defaults to `0.1`.
dtype: str. The dtype used for layers.
**kwargs: Additional keyword arguments passed to the base
    `keras.Model` constructor.
Add an Examples section demonstrating sample usage of the backbone
I'm adding it in the causal_lm file rather than here; it's more suitable there.
    type (e.g., "int32") or a string type ("string").
    Defaults to `"int32"`.
**kwargs: Additional keyword arguments passed to the base
    `keras.layers.Layer` constructor.
Add an Examples section here as well. Unit tests are still pending, I guess?
The preprocessor tests cover both the image converter and the tokenizer.
/gemini review
Code Review
This pull request introduces the PARSeq model, a ViT-based OCR model, to KerasHub. I've identified a few issues, including two critical bugs related to model serialization and tokenizer functionality that must be addressed. I've also found a couple of medium-severity issues regarding a typo in a layer name and a docstring example that should be corrected for clarity and maintainability.
def get_config(self):
    config = super().get_config()
    config.update(
        {
            "sequence_length": self.sequence_length,
            "add_start_token": self.add_start_token,
            "add_end_token": self.add_end_token,
        }
    )
    return config
The `image_converter` layer is not being serialized in `get_config()`. This will cause an error when saving and loading a model that uses this preprocessor, as the `image_converter` will be missing upon deserialization.
You should serialize `image_converter` in `get_config()` and also add a `from_config()` classmethod to handle its deserialization.
Suggested change:

def get_config(self):
    config = super().get_config()
    config.update(
        {
            "image_converter": keras.layers.serialize(self.image_converter),
            "sequence_length": self.sequence_length,
            "add_start_token": self.add_start_token,
            "add_end_token": self.add_end_token,
        }
    )
    return config
def get_vocabulary(self):
    """Get the tokenizer vocabulary as a list of string tokens."""
    return self.vocabulary.keys()
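One thing worth flagging in `get_vocabulary` as written: `dict.keys()` returns a view object, not the list the docstring promises. A minimal illustration in plain Python, with no KerasHub dependency:

```python
vocab = {"a": 1, "b": 2}

keys_view = vocab.keys()
# A dict_keys view is not a list and doesn't support indexing.
print(isinstance(keys_view, list))  # -> False

# Converting explicitly yields the list of string tokens.
print(list(keys_view))  # -> ['a', 'b']
```

Wrapping the return value in `list(...)` would make the method match its docstring.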
preprocessor = keras_hub.models.PARSeqPreprocessor(
    image_converter=image_converter,
    tokenizer=tokenizer,
)
There's a typo in the class name within the docstring example. `PARSeqPreprocessor` should be `PARSeqCausalLMPreprocessor` to match the actual class name and ensure the example is runnable.
Suggested change:

preprocessor = keras_hub.models.PARSeqCausalLMPreprocessor(
    image_converter=image_converter,
    tokenizer=tokenizer,
)
num_heads=self.num_heads,
key_dim=self.key_dim,
dropout=self.attention_dropout,
name="corss_attention",
PARSeq Model
Description of the Change
This PR adds an end-to-end scene text recognition model, PARSeq, to KerasHub. PARSeq is a ViT-based OCR model that enables iterative decoding for robust text recognition in natural scenes.
Closes the first half of #<issue_number>
Reference
For details, see Scene Text Recognition with Permuted Autoregressive Sequence Models (the PARSeq paper). The model and configuration are based on the official paper and open-source implementation.
Colab Notebook
Usage and numerics matching Colab:


Checklist