If you are willing to contribute the model yourself, let us know so we can help you add it to 🤗 Transformers!
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/modular_transformers).
### Vision-Language Model Contribution Checklist
If you're contributing a **vision-language model** (or any multimodal model that processes images/videos), please follow this checklist. Maintainers will use this to review your PR, and completing these steps will significantly increase the likelihood of your PR being merged quickly.
**Required checklist for all vision-language model contributions:**
☐ **1. Implement a modular file**
All new models should use the modular architecture pattern. Create a `modular_<model_name>.py` file using the modular model converter:
- Use the [`transformers add-new-model-like`](https://github.com/huggingface/transformers/blob/main/src/transformers/cli/add_new_model_like.py) CLI command to generate a modular skeleton and get started
- Keep as much code as possible in the modular file: the modeling code must live there, and ideally the configuration as well
- Reuse existing patterns from similar models as much as possible

Running the modular model converter generates the separate files (`modeling_*.py`, `configuration_*.py`, etc.) from your modular file. The CI will enforce that these generated files match your modular file.
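
Concretely, the workflow for this item is roughly the following (`<model_name>` is a placeholder; the modular file path shown is the usual location for models in the repository):

```bash
# 1. Scaffold a new model from an existing, similar one (interactive prompts)
transformers add-new-model-like

# 2. Edit the modular file the CLI creates, typically at:
#    src/transformers/models/<model_name>/modular_<model_name>.py

# 3. Regenerate modeling_*.py / configuration_*.py from the modular file
python utils/modular_model_converter.py <model_name>
```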
☐ **2. Add a fast image processor (for image models)**
If your model processes images, implement a fast image processor that uses `torch` and `torchvision` instead of PIL/numpy for better inference performance:
- See the detailed guide in [#36978](https://github.com/huggingface/transformers/issues/36978)
- Fast processors inherit from `BaseImageProcessorFast`

☐ **3. Add a conversion script**

Add a `convert_<model_name>_to_hf.py` script with usage examples for converting the original checkpoint weights to the Hugging Face format.

☐ **4. Add integration tests**

Write end-to-end integration tests with exact output matching (generated text or logits). See `tests/models/llava_onevision/test_modeling_llava_onevision.py` for complete examples.
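
The exact-match comparison in such integration tests usually hard-codes a short reference output and compares it element-wise. Below is a minimal, standalone sketch of just that comparison step (the function name, values, and tolerance are illustrative; real tests run the model and use `torch.testing.assert_close` on its outputs):

```python
import math

def assert_logits_match(actual, expected, atol=1e-4):
    """Compare a small slice of logits element-wise within an absolute
    tolerance, mirroring what torch.testing.assert_close does in real tests."""
    assert len(actual) == len(expected), "length mismatch"
    for a, e in zip(actual, expected):
        assert math.isclose(a, e, abs_tol=atol), f"{a} != {e} (atol={atol})"

# Expected values are hard-coded from a verified reference run of the model.
EXPECTED_SLICE = [-12.2891, -14.0781, -12.7031]
assert_logits_match([-12.28905, -14.07812, -12.70309], EXPECTED_SLICE)
```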
☐ **5. Update documentation**
Add or update model documentation:
- If the CLI hasn't already created it, create `docs/source/en/model_doc/<model_name>.md` with usage examples
- Include model description, paper link, and basic usage with `Pipeline` and `AutoModel`
- Add the model to the appropriate TOC files
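
If you are writing the doc page by hand, its skeleton typically looks like the following (the model name, section layout, and `[[autodoc]]` targets are illustrative; copy the structure of an existing page in `docs/source/en/model_doc/`):

```markdown
# MyModel

## Overview

The MyModel model was proposed in *Paper Title* (link to the paper). A short
description of the architecture and what it adds over prior vision-language models.

## Usage example

Short snippets showing inference with `Pipeline` and `AutoModel`.

## MyModelConfig

[[autodoc]] MyModelConfig

## MyModelForConditionalGeneration

[[autodoc]] MyModelForConditionalGeneration
```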
☐ **6. Look for reusable patterns**
The library has 400+ models with many established patterns:
- Search for similar models (e.g., other vision-language models)
- Reuse attention mechanisms, layer implementations, and processing patterns
- Check models like LLaVA, Idefics2, Fuyu for vision-language patterns
- Use the provided decorators (`auto_docstring`, `can_return_tuple`, `check_model_inputs`, and `_can_record_outputs`) where relevant
- Don't reinvent the wheel
☐ **7. Run quality checks and read the output**
Before submitting your PR, install quality dependencies and run the full check suite:
```bash
pip install -e ".[quality]"
make fixup
```
**Important**: Take time to read the output of `make fixup`. It will:
- Lint and format your code automatically
- Run consistency checks (imports, docstrings, etc.)
- Show any remaining issues that need manual fixes
All checks must pass before your PR can be merged.
**If this checklist is complete, your PR has a very high likelihood of being merged!** Following these steps makes the maintainers' work much easier and will reduce the number of review iterations, getting your important work out there faster.
#### Copy-pastable checklist for maintainers
Here's a condensed version maintainers can copy into PRs:
```markdown
## Multimodal Model Addition Checklist
Please ensure your PR completes all following items. See the [full checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#vision-language-model-contribution-checklist) for details.
- [ ] **Modular file**: `modular_<model_name>.py` implemented and verified with `python utils/modular_model_converter.py <model_name>`
- [ ] **Fast image processor**: Implemented using `BaseImageProcessorFast` (see [#36978](https://github.com/huggingface/transformers/issues/36978))
- [ ] **Conversion script**: `convert_<model_name>_to_hf.py` added with usage examples
- [ ] **Integration tests**: End-to-end tests with exact output matching (text or logits)
- [ ] **Documentation**: Model docs added/updated in `docs/source/en/model_doc/`
- [ ] **Pattern reuse**: Verified against similar models (LLaVA, Idefics2, etc.)
- [ ] **Quality checks**: `make fixup` passes with no errors
```
0 commit comments