Add Esm #2244
base: master
Conversation
ruff.....................................................................Passed
ruff-format..............................................................Passed
Error: Process completed with exit code 1.
Please help me figure out how to solve this problem.
Probably an issue with generating the API symbols. Looks like you need to sync with the latest changes on master, then you could try running
You can rebase it onto the latest master code.
keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_dtype_argument_tie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_dtype_argument_untie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_int8_tie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_int8_untie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/albert/albert_backbone_test.py::AlbertBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/bart/bart_backbone_test.py::BartBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/bert/bert_backbone_test.py::BertBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/bloom/bloom_backbone_test.py::BloomBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/clip/clip_backbone_test.py::CLIPBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/deberta_v3/deberta_v3_backbone_test.py::DebertaV3BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/distil_bert/distil_bert_backbone_test.py::DistilBertBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/electra/electra_backbone_test.py::ElectraBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/f_net/f_net_backbone_test.py::FNetBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/falcon/falcon_backbone_test.py::FalconBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gemma/gemma_backbone_test.py::GemmaBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gemma/gemma_backbone_test.py::Gemma2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gpt2/gpt2_backbone_test.py::GPT2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gpt_neo_x/gpt_neo_x_backbone_test.py::GPTNeoXBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/llama/llama_backbone_test.py::LlamaTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/mistral/mistral_backbone_test.py::MistralBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/opt/opt_backbone_test.py::OPTBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/pali_gemma/pali_gemma_backbone_test.py::PaliGemmaBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/pali_gemma/pali_gemma_backbone_test.py::PaliGemma2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/phi3/phi3_backbone_test.py::Phi3Test::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/phi3/phi3_backbone_test.py::Phi3Test::test_backbone_basics_with_su_rotary - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/roberta/roberta_backbone_test.py::RobertaBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/siglip/siglip_backbone_test.py::SigLIPBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/siglip/siglip_backbone_test.py::SigLIP2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/t5/t5_backbone_test.py::T5BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/whisper/whisper_backbone_test.py::WhisperBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/xlm_roberta/xlm_roberta_backbone_test.py
@mattdangerw @sachinprasadhs
It's not related to your code; it looks like some issue with the JAX backend. We will look into it.
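For context on the error itself: a TypeError like this usually means an overridden method's signature no longer matches what the caller passes, for example after a base-class hook gains or loses an argument. A purely illustrative reproduction follows; the class and argument names are hypothetical, not keras-hub code:

```python
# Hypothetical sketch of how "_int8_build() takes 2 positional arguments
# but 3 were given" can arise; all names here are made up.
class BaseEmbedding:
    def quantize(self):
        # The caller passes two arguments after `self`...
        self._int8_build((4, 8), "int8")


class ReversibleEmbedding(BaseEmbedding):
    # ...but the override accepts only one, so Python counts `self`
    # plus two arguments against a two-slot signature.
    def _int8_build(self, shape):
        print("building int8 weights of shape", shape)


ReversibleEmbedding().quantize()
# TypeError: _int8_build() takes 2 positional arguments but 3 were given
```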
Thanks for the PR. I have added my comments; also add a checkpoint conversion script under keras-hub/tools/checkpoint_conversion.
intermediate_dim: int. The output dimension of the first Dense layer in
    a two-layer feedforward network for each transformer.
dropout: float. Dropout probability for the Transformer encoder.
layer_norm_eps:bool.Should we use ln after embedding?
Didn't get the point here. Are you asking for our input, or is this the arg description? If it is the arg description, it needs to be rephrased: avoid question marks, and the argument name is emb_layer_norm_before.
The layer_norm_eps description also needs to be updated.
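For illustration, here is a sketch of how these descriptions could be rephrased; the wording is only a suggestion, and the defaults are taken from this PR's diff, not final keras-hub text:

```python
# Illustrative docstring fragment only, not the final keras-hub wording.
ESM_ARGS_DOCSTRING = """
Args:
    intermediate_dim: int. The output dimension of the first Dense layer
        in the two-layer feedforward network of each transformer layer.
    dropout: float. Dropout probability for the Transformer encoder.
    layer_norm_eps: float. Epsilon value added to the variance in each
        layer normalization layer. Defaults to `1e-12`.
    emb_layer_norm_before: bool. Whether to apply layer normalization
        to the embeddings before the encoder layers. Defaults to `False`.
"""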
@sachinprasadhs @mattdangerw
@mattdangerw @sachinprasadhs
Added a few more comments; a few of the previous review comments still need to be addressed.
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind.
Args:
The activation and max_wavelength descriptions are still missing!
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind.
Args:
Add an arg description for pad_token_id as well.
position_embedding_type:esm1 use abs position embeding,esm2 use rope.
    so this parameter is only except for absolute and rotary.
This still needs to be changed to:
position_embedding_type: str. The position embedding type to use. One of "absolute" and
"rotary". Use "absolute" for ESM1. Use "rotary" for ESM2. Defaults to "rotary".
@keras_hub_export("keras_hub.models.ESMProteinClassifierPreprocessor")
class ESMProteinClassifierPreprocessor(BertTextClassifierPreprocessor):
Pending change here: this should be subclassed from TextClassifierPreprocessor instead of BertTextClassifierPreprocessor.
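A minimal sketch of the requested change; the module path and class attributes follow keras-hub conventions but are assumptions here, not verified against this PR:

```python
# Sketch: subclass the generic TextClassifierPreprocessor instead of the
# BERT-specific one. Paths and attributes are assumed.
from keras_hub.src.api_export import keras_hub_export
from keras_hub.src.models.text_classifier_preprocessor import (
    TextClassifierPreprocessor,
)


@keras_hub_export("keras_hub.models.ESMProteinClassifierPreprocessor")
class ESMProteinClassifierPreprocessor(TextClassifierPreprocessor):
    # The ESM-specific backbone and tokenizer classes would be attached
    # here (e.g. `backbone_cls`/`tokenizer_cls`), as sibling keras-hub
    # preprocessors do.
    pass
```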
max_sequence_length=1024,
max_wavelength=10000,
layer_norm_eps=1e-12,
emb_layer_norm_before=False,
Pending change: rename emb_layer_norm_before to use_pre_layer_norm.
@keras_hub_export("keras_hub.models.ESMProteinClassifier")
class ESMProteinClassifier(RobertaTextClassifier):
Pending change: you can subclass TextClassifier and make the same changes as RobertaTextClassifier, instead of subclassing from another model.
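A matching sketch for the task class; the structure is assumed, and the real body would replicate RobertaTextClassifier's pooling and head setup:

```python
# Sketch: build ESMProteinClassifier on the generic TextClassifier base
# rather than inheriting another model's task class. Names assumed.
from keras_hub.src.api_export import keras_hub_export
from keras_hub.src.models.text_classifier import TextClassifier


@keras_hub_export("keras_hub.models.ESMProteinClassifier")
class ESMProteinClassifier(TextClassifier):
    def __init__(self, backbone, num_classes, preprocessor=None, **kwargs):
        # Functional-model setup would go here: run the backbone on the
        # inputs, pool the sequence output, and attach a Dense
        # classification head, mirroring RobertaTextClassifier.
        ...
```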
Once you address all the comments, add an end-to-end working Colab along with the checkpoint conversion script under keras-hub/tools/checkpoint_conversion.
How do I add a Colab notebook? Can you give me a demo?
Here is one from a recent PR that got merged; you can do something like this:
Hello, I've already added the Colab demo of tools/checkpoint_conversion/convert_esm_checkpoints.py in the PR description. I think this is enough, and we can refer to BERT for the rest.
We don't have access to view the notebook; can you make it public? Thanks.
OK, sharing has been enabled.
Hi, the intention of the notebook is to verify the correctness of the model (including the backbone and tasks) with usage details and the expected outcome, and to verify the numerical stability after the weights are transferred to the Keras architecture, with either forward pass.
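For concreteness, a numerics check of this kind can be as small as the following sketch; the stand-in arrays would be replaced by forward-pass outputs of the HF model and the converted Keras model on the same tokenized input:

```python
# Minimal numerics-verification sketch; placeholder arrays stand in for
# the two models' real forward-pass outputs.
import numpy as np


def assert_outputs_match(hf_out, keras_out, atol=1e-4):
    """Fail loudly if the two forward passes disagree beyond `atol`."""
    np.testing.assert_allclose(hf_out, keras_out, atol=atol)


hf_out = np.random.rand(2, 12, 320).astype("float32")  # placeholder
keras_out = hf_out + 1e-6  # placeholder for the converted model's output
assert_outputs_match(hf_out, keras_out)
```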
Okay, I've added another notebook, which is a demo for predicting the suitable pH of enzymes using ESM.
You can remove the … The notebook which you have provided doesn't have a predict method. Also, in your conversion script you have mentioned …
I have provided the reference notebooks; please refer to those. Keep only the ESM changes in this PR; you can create a new PR for RoFormer, which also needs a checkpoint conversion script, so that we can maintain the latest weights in Kaggle by regenerating them with the script after any future changes.
OK, I have modified the notebook, please check. In addition, RoFormerV2 does not need a conversion script; it is a native Keras model. I just modified the Keras 2 API.
@sachinprasadhs please check my notebook.
Hi, your notebook still does not demonstrate an actual use case, like https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForSequenceClassification.forward.example or https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForProteinFolding.forward.example, or …
We've included a training demo for ESM. As for ESMFold, that would be a brand-new PR. So could you point out exactly which demo to add? Sorry for the trouble.
Any demo with the implementation you have that predicts on actual data or sample input data and displays the output in the existing Colab. Also, remove the folder/directory named esm2_t6_8M from your code; the rest all looks good.
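For illustration, a demo along these lines would likely satisfy the request, assuming the task class added in this PR; the preset id, num_classes, and sequences below are placeholders, not published assets:

```python
# Hypothetical prediction demo; the preset name and protein sequences
# are placeholders, not real published assets.
import keras_hub

classifier = keras_hub.models.ESMProteinClassifier.from_preset(
    "esm2_t6_8M",  # placeholder preset id
    num_classes=2,
)
proteins = [
    "MKTVRQERLKSIVRILERSKEPVSGAQ",
    "MSILVTRPSPAGEELVSRLRTLGQVA",
]
scores = classifier.predict(proteins)
print(scores)  # per-class scores for each input sequence
```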
/gemini review
Code Review
This pull request introduces support for ESM models, including the backbone, classifier, and masked protein language modeling tasks, along with their corresponding preprocessors, tokenizers, and tests. I've identified several areas for improvement, including fixing a critical bug in an exception raise, correcting several documentation examples and descriptions that could mislead users, and addressing inconsistencies in model configuration and weight conversion. Addressing the feedback will improve the quality and robustness of the new ESM model support.
if self.use_rotary:
    qw, kw = self.rotary_embedding_layer(qw, kw)
if version.parse(keras.__version__) < version.parse("3.6"):
    raise ("Please make sure your Keras version is >=3.6.")
The review summary above flags a critical bug here: raise ("...") raises a plain string rather than an exception instance, which itself fails with TypeError: exceptions must derive from BaseException.
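A corrected sketch follows; the message text comes from the quoted code, and the choice of ImportError is an assumption, since any Exception subclass would work:

```python
# Fix sketch: raise an actual exception instance instead of a bare
# string, which Python rejects with its own TypeError.
import keras
from packaging import version

if version.parse(keras.__version__) < version.parse("3.6"):
    raise ImportError("Please make sure your Keras version is >=3.6.")
```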
"token_ids": np.ones(shape=(2, 12), dtype="int32"), | ||
"segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2), | ||
"padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2), | ||
} |
The example for preprocessed integer data includes "segment_ids" and "padding_mask" in the input features. However, the ESMBackbone and ESMProteinClassifierPreprocessor only expect "token_ids". This example is misleading and will not work as written. Please update it to only include "token_ids".
"token_ids": np.ones(shape=(2, 12), dtype="int32"), | |
"segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2), | |
"padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2), | |
} | |
features = { | |
"token_ids": np.ones(shape=(2, 12), dtype="int32"), | |
} |
hf_weight_key=f"{hf_prefix}{i}.LayerNorm.weight",
)
loader.port_weight(
    keras_variable=block.feedforward_norm.beta,
    hf_weight_key=f"{hf_prefix}{i}.LayerNorm.bias",
The Hugging Face weight key for the feedforward layer normalization seems to be incorrect. Based on the transformers implementation of EsmLayer, the LayerNorm is part of the output block. The key should likely be encoder.layer.{i}.output.LayerNorm.weight instead of encoder.layer.{i}.LayerNorm.weight.
Suggested change:
hf_weight_key=f"{hf_prefix}{i}.output.LayerNorm.weight",
)
loader.port_weight(
    keras_variable=block.feedforward_norm.beta,
    hf_weight_key=f"{hf_prefix}{i}.output.LayerNorm.bias",
This is a hallucination; I recommend ignoring it.
num_heads=4,
hidden_dim=256,
intermediate_dim=512,
head_size = 64,
keras_variable=backbone.get_layer("embeddings_layer_norm").gamma,
hf_weight_key="embeddings.layer_norm.weight",
)
loader.port_weight(
    keras_variable=backbone.get_layer("embeddings_layer_norm").beta,
The LayerNormalization layer for embeddings in ESMBackbone is named "emb_layer_norm", but the conversion script is trying to access it with the name "embeddings_layer_norm". This will cause a ValueError when trying to get the layer.
Suggested change:
keras_variable=backbone.get_layer("emb_layer_norm").gamma,
hf_weight_key="embeddings.layer_norm.weight",
)
loader.port_weight(
    keras_variable=backbone.get_layer("emb_layer_norm").beta,
self.dropout = dropout
self.max_wavelength = max_wavelength
self.head_size = head_size
self.dropout = dropout
self.dropout is assigned twice in this block; the duplicate assignment should be removed.
# Unbatched input.
tokenizer = keras_hub.models.ESMTokenizer.from_preset(
    "roformer_v2_base_zh",
The docstring example uses the RoFormer preset "roformer_v2_base_zh" with ESMTokenizer; this looks copied from RoFormer and should reference an ESM preset instead.
raise ValueError(
    '`position_embedding_type` must be either `"rotary"`, or '
    '`"absolute"`. Received '
    "position_embedding_type={position_embedding_type}."
The string in the ValueError is intended to be an f-string to include the value of position_embedding_type, but it's missing the f prefix. This will result in the literal string {position_embedding_type} being part of the error message.
"position_embedding_type={position_embedding_type}." | |
f"position_embedding_type={position_embedding_type}." |
"""A ESM2 and ESM encoder network. | ||
|
||
This class implements a bi-directional Transformer-based encoder as | ||
described in ["Roformer"](https://github.com/facebookresearch/esm). |
The docstring mentions "Roformer" but links to the ESM repository. To avoid confusion, the link text should be updated to "ESM" to match the model being implemented.
Suggested change:
described in ["ESM"](https://github.com/facebookresearch/esm).
from keras_hub.src.models.esm.esm_masked_plm import (
    ESMMaskedPLM as ESM2MaskedPLM,
)
from keras_hub.src.models.esm.esm_masked_plm import ESMMaskedPLM as ESMMaskedPLM
I'm not sure what you mean by "delete the esm2_t6_8M directory." Looking at the demo notebook, all it does is install the environment, change the working directory, and then run: python tools/checkpoint_conversion/convert_deit_checkpoints.py --preset deit-base-distilled-patch16-384. In my notebook I did exactly the same thing: installed the environment, changed the working directory, and then ran python tools/checkpoint_conversion/convert_esm_checkpoints.py --preset esm2_t6_8M. Could you give a more precise and detailed description of which notebook has the problem and what it is missing compared to the reference notebook? Further, in another notebook I explicitly provide demonstrations of the model's outputs (screenshots attached). A clear description would be greatly appreciated; thank you for your help!
Thanks, I fixed some errors with reference to Gemini's review.
from #2177
Achieved a smaller numerical error relative to the HF implementation.
ESM Checkpoint Conversion and Numerics Verification Demo (across multiple backends): Notebook Link
Train Demo: Notebook Link