v0.21.0
Summary
- New Models:
  - Xception: Added the Xception architecture for image classification tasks.
  - Qwen: Added Qwen2.5 large language models, with base and instruction-tuned presets ranging from 0.5 to 72 billion parameters.
  - Qwen MoE: Added a transformer-based Mixture of Experts (MoE) decoder-only language model, with a base variant that has 2.7B activated parameters at runtime.
  - Mixtral: Added Mixtral LLM, a generative Sparse Mixture of Experts model, with pretrained and instruction-tuned models having 7 billion activated parameters.
  - Moonshine: Added Moonshine, a speech recognition model.
  - CSPNet: Added a Cross Stage Partial Network (CSPNet) model for classification tasks.
  - Llama3: Added support for Llama 3.1 and 3.2.
- Added sharded weight support to KerasPresetSaver and KerasPresetLoader, defaulting to a 10GB maximum shard size.
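
To illustrate what a maximum shard size means in practice, here is a minimal sketch of greedy shard planning. It is not KerasHub's implementation: the function name `plan_shards`, the `weight_sizes` mapping, and the reading of "10GB" as 10 * 1024**3 bytes are all assumptions made for illustration.

```python
def plan_shards(weight_sizes, max_shard_size_gb=10.0):
    """Greedily group weights into shards that stay under a size limit.

    Hypothetical helper, not part of KerasHub.
    weight_sizes: dict mapping a weight name to its size in bytes.
    Returns a list of shards, each a list of weight names in input order.
    """
    # Assumption: one "GB" is 1024**3 bytes.
    limit = int(max_shard_size_gb * 1024**3)
    shards, current, current_bytes = [], [], 0
    for name, size in weight_sizes.items():
        # Close the current shard if adding this weight would exceed the limit.
        if current and current_bytes + size > limit:
            shards.append(current)
            current, current_bytes = [], 0
        # A single weight larger than the limit still gets a shard of its own.
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


GB = 1024**3
sizes = {"embedding": 4 * GB, "block_0": 5 * GB, "block_1": 5 * GB, "head": 1 * GB}
shards = plan_shards(sizes)
print(shards)
```

With the default 10GB limit, the 14GB of example weights above are split into two shards rather than written as a single file.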
What's Changed
- Fix Roformer export symbol by @abheesht17 in #2199
- Bump up master version to 0.21 by @abheesht17 in #2204
- reenable test by @mattdangerw in #2188
- Add xception model by @mattdangerw in #2179
- Make image converter built by @mattdangerw in #2206
- Qwen - Fix Preset Loader + Add Causal LM Test by @kanpuriyanawab in #2193
- Update Qwen conversion script by @laxmareddyp in #2207
- Revert "Do not export Qwen for release" by @sachinprasadhs in #2208
- Fixes compute_output_shape for PaliGemmaVitEncoder and Gemma3VisionEncoderBlock by @JyotinderSingh in #2210
- Python 3.12 fix by @mattdangerw in #2211
- Small Gemma3 doc-string edits by @abheesht17 in #2214
- Llama3.1 by @pctablet505 in #2132
- Update gemma3_causal_lm_preprocessor.py by @pctablet505 in #2217
- fix: apply weights_only = True by @b8zhong in #2215
- Fix the keras_hub package for typecheckers and IDEs by @mattdangerw in #2222
- Add utility to map COCO IDs to class names by @mattdangerw in #2219
- Set GPU timeouts to 2 hours by @mattdangerw in #2226
- Fix nightly by @mattdangerw in #2227
- Another fix for nightly builds by @mattdangerw in #2229
- Cast a few more input to tensors in SD3 by @mattdangerw in #2234
- Fix up package build scripts again by @mattdangerw in #2230
- Add qwen presets by @laxmareddyp in #2241
- script for converting retinanet weights from torchvision by @sineeli in #2233
- Sharded weights support by @james77777778 in #2218
- Add Qwen Moe by @kanpuriyanawab in #2163
- Add Mixtral by @kanpuriyanawab in #2196
- Made label data optional for inference and adopted other required changes by @laxmareddyp in #2183
- Fix the layer names by @kanpuriyanawab in #2247
- Add new CSPNet preset and add manual padding. by @sachinprasadhs in #2212
- Update the int8 quant logic in ReversibleEmbedding by @james77777778 in #2250
- Add Moonshine to KerasHub by @harshaljanjani in #2093
- Add Kaggle handle for moonshine presets by @laxmareddyp in #2253
- Update requirements-jax-cuda.txt by @pctablet505 in #2252
- Add Mixtral, Qwen-MoE presets and Update conversion script. by @laxmareddyp in #2248
- fix flash attention test by @divyashreepathihalli in #2263
- Fix JAX bugs for qwen moe & mixtral by @kanpuriyanawab in #2258
- Create pull_request_template.md by @sachinprasadhs in #2262
- Update preset versions for sharded models by @laxmareddyp in #2264
- Add AudioToText and AudioToTextPreprocessor class stubs to enable auto class functionality by @harshaljanjani in #2265
- register moonshine presets by @sachinprasadhs in #2267
- Version bump 0.21.0.dev1 by @laxmareddyp in #2273
- Version bump to 0.21.0 by @laxmareddyp in #2275
New Contributors
- @JyotinderSingh made their first contribution in #2210
- @pctablet505 made their first contribution in #2132
- @b8zhong made their first contribution in #2215
Full Changelog: v0.20.0...v0.21.0