Added LayoutLMv3 #2178
base: master
Conversation
@carrycooldude Thank you for the PR. The code structure does not match KerasHub style.
@@ -0,0 +1,152 @@
"""Tests for LayoutLMv3 backbone."""
Remove this docstring at the start of the file.
Adding general code structuring comments. Refer to any of the existing model implementations here: https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models. The test cases should also follow the template we use in the models.
I have added a few comments; most of them are general practices we follow. Incorporate the suggested changes across all the files.
Also remove the files and directories that are not required, such as the env directory.
@@ -0,0 +1 @@
Remove this directory and file
This still needs to be removed
keras_hub/src/models/__init__.py
Outdated
@@ -0,0 +1,4 @@
"""LayoutLMv3 document classifier."""
This file needs to be empty; all the imports are handled in the keras_hub/api directory and will be automatically generated whenever you run `git commit -m "<message>"`. Make sure you run `pre-commit install` the first time.
pending
@@ -0,0 +1,15 @@
from keras_hub.src.models.layoutlmv3.layoutlmv3_backbone import LayoutLMv3Backbone
This file is mainly to register presets; follow other models to understand the format we follow.
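For illustration, a minimal sketch of the presets-file format other KerasHub models use; every value below is a placeholder, not a real preset:

```python
"""LayoutLMv3 preset configurations."""

# Placeholder entries for illustration only; real presets follow the format in
# other keras_hub/src/models/*/*_presets.py files and point at uploaded assets.
backbone_presets = {
    "layoutlmv3_base": {
        "metadata": {
            "description": "LayoutLMv3 base model for document AI tasks.",
            "params": 125000000,  # placeholder parameter count
            "path": "layoutlmv3",
        },
        "kaggle_handle": "kaggle://placeholder/layoutlmv3/keras/layoutlmv3_base/1",
    },
}
```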
pending
def __init__(
    self,
    vocab_size: int = 30522,
Remove type annotations everywhere; we don't use type annotations in KerasHub.
Type annotations still need to be removed.
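A before/after sketch of what removing the annotations looks like (argument name taken from the diff above):

```python
# Before (not KerasHub style):
# def __init__(self, vocab_size: int = 30522, **kwargs):

# After (plain defaults; types are documented in the Args docstring instead):
def __init__(self, vocab_size=30522, **kwargs):
    ...
```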
References:
- [LayoutLMv3 Paper](https://arxiv.org/abs/2204.08387)
- [LayoutLMv3 GitHub](https://github.com/microsoft/unilm/tree/master/layoutlmv3)
"""
This entire docstring needs to be inside the Backbone class.
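In other words, roughly the following placement (class body elided, sketch only):

```python
class LayoutLMv3Backbone(Backbone):
    """LayoutLMv3 backbone network.

    References:
    - [LayoutLMv3 Paper](https://arxiv.org/abs/2204.08387)
    - [LayoutLMv3 GitHub](https://github.com/microsoft/unilm/tree/master/layoutlmv3)
    """
```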
""" | ||
|
||
import os | ||
from typing import Dict, List, Optional, Tuple, Union |
Remove this once the type annotations are removed.
from .layoutlmv3_tokenizer import LayoutLMv3Tokenizer
from .layoutlmv3_presets import backbone_presets
from .layoutlmv3_transformer import LayoutLMv3TransformerLayer
Change from relative imports to absolute imports everywhere.
Change it from relative imports to absolute imports; we don't use `from . import abc`.
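The same imports rewritten the KerasHub way (module paths taken from this PR's file layout):

```python
from keras_hub.src.models.layoutlmv3.layoutlmv3_presets import backbone_presets
from keras_hub.src.models.layoutlmv3.layoutlmv3_tokenizer import (
    LayoutLMv3Tokenizer,
)
from keras_hub.src.models.layoutlmv3.layoutlmv3_transformer import (
    LayoutLMv3TransformerLayer,
)
```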
maintaining spatial relationships in documents.

Args:
    vocab_size: int, defaults to 30522. Size of the vocabulary.
The format for Args we follow is:
vocab_size: int. Size of the vocabulary. Defaults to 30522.
This format should be followed for all arguments, and make sure each entry conveys the complete required information.
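Applied to the snippet above, the Args section would read as follows (the second argument is an invented example to show the pattern):

```python
"""
Args:
    vocab_size: int. Size of the vocabulary. Defaults to 30522.
    hidden_size: int. Dimensionality of the encoder layers. Defaults to 768.
"""
```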
```
"""

presets = backbone_presets
No need of this here.
You can keep the example, but we don't need `presets = backbone_presets`.
self.use_rel_pos = use_rel_pos
self.rel_pos_bins = rel_pos_bins
self.max_rel_pos = max_rel_pos
self.spatial_embedding_dim = spatial_embedding_dim
This should come last. You can follow the order below:
# === Layers ===
# === Functional Model ===
# === Config ===
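A skeletal `__init__` following that order, in the style of other KerasHub backbones; the layer and input names here are illustrative, not the PR's actual architecture:

```python
import keras
from keras_hub.src.models.backbone import Backbone

class LayoutLMv3BackboneSketch(Backbone):
    def __init__(self, vocab_size=30522, hidden_size=768, **kwargs):
        # === Layers ===
        self.token_embedding = keras.layers.Embedding(vocab_size, hidden_size)

        # === Functional Model ===
        token_ids = keras.Input(shape=(None,), dtype="int32", name="token_ids")
        sequence_output = self.token_embedding(token_ids)
        super().__init__(
            inputs={"token_ids": token_ids},
            outputs=sequence_output,
            **kwargs,
        )

        # === Config ===
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
```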
@sachinprasadhs any updates on this one?
The review comments are still not addressed; could you please fix those before I suggest any more changes?
I think I fixed it; can you tell me which ones?
Pointed out the comments where previous reviews were not addressed. Also, remove the layoutmv3_env directory.
```
"""

presets = backbone_presets
You can keep the example, but we don't need `presets = backbone_presets`.
def __init__(
    self,
    vocab_size: int = 30522,
Type annotations still need to be removed.
keras_hub/src/models/__init__.py
Outdated
@@ -0,0 +1,4 @@
"""LayoutLMv3 document classifier."""
pending
@@ -0,0 +1,15 @@
from keras_hub.src.models.layoutlmv3.layoutlmv3_backbone import LayoutLMv3Backbone
pending
# Copyright 2024 The Keras Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
Remove this.
"""LayoutLMv3 tokenizer implementation. | ||
|
||
This tokenizer inherits from WordPieceTokenizer and adds LayoutLMv3-specific | ||
functionality for document understanding tasks. | ||
|
||
Example: | ||
```python | ||
# Initialize the tokenizer | ||
tokenizer = LayoutLMv3Tokenizer.from_preset("layoutlmv3_base") | ||
|
||
# Tokenize text | ||
tokens = tokenizer("Hello world!") | ||
``` | ||
""" | ||
|
Remove this, move the example inside LayoutLMv3Tokenizer if necessary.
"""Tests for LayoutLMv3 tokenizer.""" | ||
|
Remove this
from ..layoutlmv3.layoutlmv3_tokenizer import LayoutLMv3Tokenizer
No relative imports
"""LayoutLMv3 transformer layer implementation. | ||
|
||
This module implements the transformer layer used in the LayoutLMv3 model. | ||
""" | ||
|
Remove this
from typing import Dict, Optional
No need of this
This PR is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
Hi, let us know once this PR is ready for review again. Thanks.
@sachinprasadhs can you check this
@sachinprasadhs Just made some changes
I still see a lot of previous comments that were not addressed; I couldn't go through all the files, since all the comments need to be addressed to save everyone's time.
I would request you to go through each of the comments made in this PR so far, address each of them, and mark them as resolved once you have made the necessary changes.
If you have trouble understanding any of the comments, let me know; I would be happy to clarify them for you.
Also, add the necessary test cases; I see many empty files.
And maintain consistency across all the files.
from keras_hub.src.models.layoutlmv3.layoutlmv3_tokenizer import (
    LayoutLMv3Tokenizer,
)
from keras_hub.src.utils.preset_utils import register_presets
The LayoutLMv3Tokenizer import is not required here.
__all__ = [
    "LayoutLMv3Backbone",
    "LayoutLMv3Tokenizer",
    "LayoutLMv3TransformerLayer",
]
You can remove these lines
from .layoutlmv3_tokenizer import LayoutLMv3Tokenizer
from .layoutlmv3_presets import backbone_presets
from .layoutlmv3_transformer import LayoutLMv3TransformerLayer
Change it from relative imports to absolute imports; we don't use `from . import abc`.
pad_token_id: int = 0,
position_embedding_type: str = "absolute",
use_cache: bool = True,
classifier_dropout: Optional[float] = None,
Remove the Optional type annotation; just keep the default None value.
""" | ||
LayoutLMv3 backbone model implementation. | ||
|
||
This module implements the LayoutLMv3 model architecture as described in | ||
"LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking" | ||
(https://arxiv.org/abs/2204.08387). | ||
|
||
The LayoutLMv3 model is a multimodal transformer that combines text, layout, | ||
and visual information for document understanding tasks. It uses a unified | ||
architecture to process both text and image inputs, with special attention to | ||
spatial relationships in documents. | ||
|
||
Example: | ||
```python | ||
# Initialize backbone from preset | ||
backbone = LayoutLMv3Backbone.from_preset("layoutlmv3_base") | ||
|
||
# Process document image and text | ||
outputs = backbone({ | ||
"input_ids": input_ids, # Shape: (batch_size, seq_length) | ||
"bbox": bbox, # Shape: (batch_size, seq_length, 4) | ||
"attention_mask": attention_mask, # Shape: (batch_size, seq_length) | ||
"image": image # Shape: (batch_size, height, width, channels) | ||
}) | ||
``` | ||
|
||
References: | ||
- [LayoutLMv3 Paper](https://arxiv.org/abs/2204.08387) | ||
- [LayoutLMv3 GitHub](https://github.com/microsoft/unilm/tree/master/layoutlmv3) | ||
""" | ||
|
Move this docstring inside the backbone class along with the args; remove the rest of the docstring, as it is redundant.
presets = backbone_presets
Remove this line
/gemini review
Code Review
This PR introduces the LayoutLMv3 model, including its backbone, tokenizer, and a checkpoint conversion script. The implementation requires further work to ensure correctness and functionality. Key areas needing attention include the model's call method, the transformer layer implementation, and the checkpoint conversion script.
# Process through transformer layers
hidden_states = [embeddings]
for layer in self.transformer_layers:
@register_keras_serializable()
class LayoutLMv3TransformerLayer(layers.Layer):
    def __init__(
        self,
        hidden_size,
        num_attention_heads,
        intermediate_size,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        qkv_bias=True,
        use_rel_pos=True,
        rel_pos_bins=32,
        max_rel_pos=128,
        name=None,
        **kwargs,
    ):
        super().__init__(name=name, **kwargs)
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_act = hidden_act
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.qkv_bias = qkv_bias
        self.use_rel_pos = use_rel_pos
        self.rel_pos_bins = rel_pos_bins
        self.max_rel_pos = max_rel_pos

    def call(self, hidden_states, attention_mask=None, **kwargs):
        # Minimal stub: just return hidden_states unchanged
        return hidden_states
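The `call` above is a stub that discards all of the configured layers. For reference, a minimal sketch of a non-stub encoder block using standard Keras layers, ignoring the relative position bias; this is an illustration, not the PR's implementation:

```python
from keras import layers

class TransformerLayerSketch(layers.Layer):
    """Illustrative encoder block: post-LN self-attention + feed-forward."""

    def __init__(self, hidden_size, num_attention_heads, intermediate_size,
                 dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.attention = layers.MultiHeadAttention(
            num_heads=num_attention_heads,
            key_dim=hidden_size // num_attention_heads,
            dropout=dropout,
        )
        self.attention_norm = layers.LayerNormalization(epsilon=1e-12)
        self.intermediate = layers.Dense(intermediate_size, activation="gelu")
        self.output_dense = layers.Dense(hidden_size)
        self.output_norm = layers.LayerNormalization(epsilon=1e-12)
        self.dropout = layers.Dropout(dropout)

    def call(self, hidden_states, attention_mask=None):
        # Self-attention sub-block with residual connection and layer norm.
        attention_output = self.attention(
            hidden_states, hidden_states, attention_mask=attention_mask
        )
        hidden_states = self.attention_norm(
            hidden_states + self.dropout(attention_output)
        )
        # Feed-forward sub-block with residual connection and layer norm.
        ffn_output = self.output_dense(self.intermediate(hidden_states))
        return self.output_norm(hidden_states + self.dropout(ffn_output))
```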
# Create projection matrices based on actual weight shapes
x_proj = np.random.normal(
    0, 0.02, (spatial_embedding_dim, hf_config.hidden_size)
)
y_proj = np.random.normal(
    0, 0.02, (spatial_embedding_dim, hf_config.hidden_size)
)
h_proj = np.random.normal(
    0, 0.02, (spatial_embedding_dim, hf_config.hidden_size)
)
w_proj = np.random.normal(
    0, 0.02, (spatial_embedding_dim, hf_config.hidden_size)
)

# Set weights for projection layers
keras_model.x_proj.set_weights([x_proj, np.zeros(hf_config.hidden_size)])
keras_model.y_proj.set_weights([y_proj, np.zeros(hf_config.hidden_size)])
keras_model.h_proj.set_weights([h_proj, np.zeros(hf_config.hidden_size)])
keras_model.w_proj.set_weights([w_proj, np.zeros(hf_config.hidden_size)])
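Random initialization here means the converted checkpoint cannot reproduce the source model's outputs. A generic sketch of copying a matching weight when one exists in the HF state dict; the key names are checkpoint-specific and left as assumptions:

```python
import numpy as np

def port_dense(keras_layer, hf_state_dict, weight_key, bias_key=None):
    """Copy a torch Linear weight into a Keras Dense layer (illustrative)."""
    # HF/torch stores Linear weights as (out_features, in_features);
    # Keras Dense expects (in_features, out_features), hence the transpose.
    kernel = hf_state_dict[weight_key].numpy().T
    if bias_key is not None:
        bias = hf_state_dict[bias_key].numpy()
    else:
        bias = np.zeros(kernel.shape[1])
    keras_layer.set_weights([kernel, bias])
```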
def call(self, text, bbox=None, **kwargs):
    """Tokenize text and process bounding boxes.

    Args:
        text: A string or list of strings to tokenize.
        bbox: Optional list of bounding box coordinates for each token. If
            provided, should be a list of lists of [x0, y0, x1, y1]
            coordinates.
        **kwargs: Additional keyword arguments passed to the parent class.

    Returns:
        A dictionary containing:
        - token_ids: Tensor of shape (batch_size, sequence_length)
          containing token IDs
        - padding_mask: Tensor of shape (batch_size, sequence_length)
          containing padding mask
        - attention_mask: Tensor of shape (batch_size, sequence_length)
          containing attention mask
        - bbox: Tensor of shape (batch_size, sequence_length, 4)
          containing bounding box coordinates (if provided)
    """
    # Tokenize input text
    token_ids, padding_mask = super().call(text)

    # Add [CLS] token at the beginning
    batch_size = backend.shape(token_ids)[0]
    cls_token_ids = (
        backend.ones((batch_size, 1), dtype="int32") * self.cls_token_id
    )
    cls_token_mask = (
        backend.ones((batch_size, 1), dtype="int32") * self.cls_token_mask
    )

    token_ids = backend.concatenate([cls_token_ids, token_ids], axis=1)
    padding_mask = backend.concatenate(
        [cls_token_mask, padding_mask], axis=1
    )

    # Add [SEP] token at the end
    sep_token_ids = (
        backend.ones((batch_size, 1), dtype="int32") * self.sep_token_id
    )
    sep_token_mask = (
        backend.ones((batch_size, 1), dtype="int32") * self.sep_token_mask
    )

    token_ids = backend.concatenate([token_ids, sep_token_ids], axis=1)
    padding_mask = backend.concatenate(
        [padding_mask, sep_token_mask], axis=1
    )

    # Create attention mask
    attention_mask = backend.cast(padding_mask, dtype="int32")

    # Process bounding boxes
    if bbox is not None:
        bbox_tensor = backend.stack(bbox, axis=1)
    else:
        bbox_tensor = None

    return {
        "token_ids": token_ids,
        "padding_mask": padding_mask,
        "attention_mask": attention_mask,
        "bbox": bbox_tensor,
    }
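For context, a hypothetical call to the method above; the preset name and the one-box-per-token convention are assumptions, not confirmed by this PR:

```python
tokenizer = LayoutLMv3Tokenizer.from_preset("layoutlmv3_base")  # hypothetical preset
outputs = tokenizer(
    ["Invoice number: 12345"],
    bbox=[[[48, 84, 102, 96], [110, 84, 160, 96]]],  # [x0, y0, x1, y1] per token
)
# token_ids includes the prepended [CLS] and appended [SEP]; note the bbox is
# not padded for those special tokens above, which reviewers may want to check.
print(outputs["token_ids"].shape)
```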
Description
This PR fixes the LayoutLMv3 checkpoint conversion script to properly handle different spatial embedding dimensions between the base and large models. The base model uses 128 dimensions for all spatial embeddings, while the large model uses 171 dimensions for x/y coordinates and 170 dimensions for height/width.
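Stated as a table of dimensions (preset names are illustrative):

```python
# Spatial embedding dimensions per model size, per the description above.
SPATIAL_EMBEDDING_DIMS = {
    "layoutlmv3_base": {"x": 128, "y": 128, "h": 128, "w": 128},
    "layoutlmv3_large": {"x": 171, "y": 171, "h": 170, "w": 170},
}
```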