PaliGemma #1636


Merged: 35 commits merged into master on May 21, 2024

Conversation

mattdangerw
Copy link
Member

Initial implementation of PaliGemma for KerasNLP.

divyashreepathihalli and others added 30 commits May 21, 2024 16:43
Co-authored-by: divyashreepathihalli <[email protected]>
* update image size arg throughout paligemma

* update tests
During generate, and in Gemma itself, we scale all text embeddings
by the sqrt of the hidden dim.

We should update our PaliGemmaBackbone to do the same.
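The scaling described above can be sketched as follows. This is a minimal NumPy illustration, not the actual KerasNLP code; `scale_embeddings` is a hypothetical helper name:

```python
import numpy as np

def scale_embeddings(token_embeddings, hidden_dim):
    # Gemma multiplies token embeddings by sqrt(hidden_dim) before the
    # first decoder layer; PaliGemma's text embeddings should match.
    scale = np.asarray(hidden_dim, dtype=token_embeddings.dtype) ** 0.5
    return token_embeddings * scale
```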
* Add cli arguments to speed up conversion

* Formatting

* Update checkpoint conversion to image_classifier
* Add preset

* move location

* address review comments
…ength

* Update pali_gemma_causal_lm_preprocesor.py

* Update pali_gemma_causal_lm_preprocesor.py

* Update pali_gemma_causal_lm_preprocesor.py

* code reformat
* replicated gemma tokenizer

* Different fix

* Similar fix for causal lm

* code reformat

---------

Co-authored-by: Varun Singh <[email protected]>
I am not sure if we want to force the users to pass the image
embeddings themselves, it might be more friendly to allow the raw
image input.

Anyway, since this is unused and untested, and we should check with
the LIT team anyway on how to set this up, let's just remove it for
now.
* Added docstrings for paligemma decoder and backbone

* Add causal lm docstring

* add vit docstring

* add dtype arg

* update docstring

* Line wrapping and small nits

* added vit docstrings

* cleaned up with Matt's comments

---------

Co-authored-by: divyashreepathihalli <[email protected]>
Spell out "zero" not "0" for API consistency.

Remove include_rescaling; the code looked very broken (wrong scale,
output unused).

Fix formatting.
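For reference, a correctly scaled rescaling step (mapping uint8 pixels into [-1, 1]) might look like the sketch below. This is a hypothetical illustration, not the removed code:

```python
import numpy as np

def rescale_images(images):
    # Map uint8 pixel values in [0, 255] into [-1, 1], a range many
    # vision backbones expect. Illustrative only; the PR removed
    # include_rescaling because the in-tree version used the wrong
    # scale and never used its output.
    return images.astype("float32") / 127.5 - 1.0
```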
mattdangerw and others added 5 commits May 21, 2024 16:44
* Add a response_mask input

* Improved attention logic to account for response mask (#87)

* Improved attention logic to account for response mask

* Addressed several comments

* remove vit_num_classes arg from pali_gemma_backbone

* fix backbone test

* Try simplifying the masking code

* Updated tests for thoroughness

* Comments and one fix

* update preset version

* deleted test using unused code path

* Added cast to solve bool issues

* update presets path

* code reformat (modified: keras_nlp/src/models/pali_gemma/pali_gemma_decoder_block_test.py)

* remove changes to backbone args

---------

Co-authored-by: divyashreepathihalli <[email protected]>
Co-authored-by: Matt Watson <[email protected]>

---------

Co-authored-by: Varun Singh <[email protected]>
Co-authored-by: divyashreepathihalli <[email protected]>
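The response-mask attention logic above might be sketched roughly as below, using NumPy. All names are illustrative assumptions, not the KerasNLP implementation: prompt (non-response) tokens attend bidirectionally to each other, response tokens attend causally, and masks are cast to bool to avoid mixed-dtype issues.

```python
import numpy as np

def combined_attention_mask(padding_mask, response_mask):
    # padding_mask, response_mask: (batch, seq) arrays of 0/1.
    batch, seq_len = padding_mask.shape
    # Causal mask: each position attends to itself and earlier positions.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Prefix (non-response) tokens attend to all other prefix tokens,
    # prefix-LM style.
    prefix = ~response_mask.astype(bool)
    bidirectional = prefix[:, :, None] & prefix[:, None, :]
    mask = causal[None, :, :] | bidirectional
    # Cast padding to bool before combining to avoid int/bool mixing.
    return mask & padding_mask.astype(bool)[:, None, :]
```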
Otherwise the symbol would have mismatched code examples
* More consistent defaults for PaliGemma

In general, we do not copy the hyperparameters of a specific
pre-trained model into the init args. Do the same here for
consistency.

Also, use test models that are as small as possible, so our unit
tests stay reasonably fast.

* add basic and saved model tests

---------

Co-authored-by: divyashreepathihalli <[email protected]>
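In that spirit, a unit-test backbone config might use tiny dimensions. The kwargs below are illustrative guesses, not the exact PaliGemmaBackbone signature:

```python
# Hypothetical small test configuration: tiny dims keep unit tests
# fast, instead of copying the pretrained model's real hyperparameters.
TEST_BACKBONE_KWARGS = {
    "vocabulary_size": 256,
    "image_size": 16,
    "num_layers": 2,
    "num_query_heads": 2,
    "num_key_value_heads": 1,
    "hidden_dim": 8,
    "intermediate_dim": 16,
}
```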
@github-actions github-actions bot added the Gemma Gemma model specific issues label May 21, 2024
@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label May 21, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label May 21, 2024
@mattdangerw mattdangerw merged commit b2ec380 into master May 21, 2024
19 checks passed
@mattdangerw mattdangerw deleted the paligemma branch August 22, 2024 00:14
6 participants