Release OpenCompass v0.1.3 · open-compass/opencompass

OpenCompass keeps getting better! v0.1.3 brings a variety of enhancements, new features, and crucial fixes. Here’s a summary of what we've packed into this release:

🆕 Highlights:

Extended Dataset Support: OpenCompass now integrates a broader range of public datasets, including but not limited to adv_glue, codegeex2, Humanevalx, SEED-Bench, LongBench, and LEval. We aim to provide extensive coverage to cater to a variety of research needs.
Utility Additions: From multi-modal evaluation on the MME benchmark to the Tree-of-Thought method, this release comes packed with functionality enhancements.
Bug Extermination: Your feedback helps us grow. We’ve squashed a series of bugs to improve your experience.
More Evaluation Benchmarks for Multimodal Models: We now support 10 additional evaluation benchmarks for multimodal models, including COCO Caption and ScienceQA, and provide the corresponding evaluation code.

Let's delve deeper into what's new:

🌟 New Features:

📦 Extended Dataset Support:

Introduction of other public datasets (#206, #214).
Support for adv_glue dataset focused on adversarial robustness (#205).
Added codegeex2, Humanevalx (#210).
Integration of SEED-Bench (#203).
LongBench support (#236).
Reconstructed LEval dataset (#266).
Support for another 10 public evaluation benchmarks for multimodal models (#214).

🛠 Utilities and Functionality:

Launch script added for easier operation (#222).
Multi-modal evaluation on MME benchmark (#197).
Support for visualglm and llava on MMBench evaluation (#211).
Tree-of-Thought method introduced (#173).
Introduction of llama2 native implementations (#235).
Flamingo and Claude support added (#258, #253).

📝 Documentation:

Navigation bar language type updated for better clarity (#212).
News updates to keep users informed (#241, #243).
Summarizer documentation added (#231).

🛠️ Bug Fixes:

Addressed an issue with multiple rounds of inference using mm_eval (#201).
Miscellaneous fixes such as name adjustments, requirements, and bin_trim corrections (#223, #229, #237).
Local runner debug issue fixed (#238).
Resolved bugs in PeftModel generate (#252).

⚙ Enhancements and Refactors:

Refactored instructblip for better performance and readability (#227).
Improved crowspairs postprocess (#251).
Optimized to use sympy only when necessary (#255).

🎉 New Contributors:

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

@yyk-wew (First PR)
@fangyixiao18 (First PR)
@philipwangOvO (First PR)
@cdpath (First PR)

Thank you to our dedicated contributors for making OpenCompass even more comprehensive and user-friendly! 🙌 🎉

Remember to star 🌟 our GitHub repository if you find OpenCompass helpful! Your feedback and contributions are invaluable.

Changelog

[Fix] Fix bugs of multiple rounds of inference when using mm_eval by @yyk-wew in #201
[Feature]: Add other public datasets by @YuanLiuuuuuu in #206
[Doc] Update Navigation bar language type by @Ezra-Yu in #212
[Feat] support adv_glue dataset for adversarial robustness by @yingfhu in #205
[Feat] Add codegeex2 and Humanevalx by @Ezra-Yu in #210
[Feature]: Add other public datasets config by @YuanLiuuuuuu in #214
[Feature] Support SEED-Bench by @fangyixiao18 in #203
[Feature]: Add launch script by @YuanLiuuuuuu in #222
[Fix]: Fix name by @YuanLiuuuuuu in #223
[Fix] requirements by @gaotongxiao in #229
[Dataset] LongBench by @philipwangOvO in #236
[Fix] bin_trim by @philipwangOvO in #237
[Feat] Support multi-modal evaluation on MME benchmark. by @yyk-wew in #197
[Feat] Support visualglm and llava for MMBench evaluation. by @yyk-wew in #211
[Fix] fix local runner debug by @Leymore in #238
Update News by @tonysy in #241
[Doc] Update news by @tonysy in #243
Update run.py by @liushz in #247
[Doc] Add summarizer doc by @Leymore in #231
[Feature] Add llama2 native implements by @Leymore in #235
[Feature] Add Tree-of-Thought method by @liushz in #173
[Refactor] Refactor instructblip by @fangyixiao18 in #227
[Enhancement] Update crowspairs postprocess by @gaotongxiao in #251
[Fix] use sympy only when necessary by @gaotongxiao in #255
Update .owners.yml by @tonysy in #261
[Fix] Fix bugs for PeftModel generate by @LZHgrla in #252
[Feature]: Add Flamingo by @YuanLiuuuuuu in #258
[Feature] Add Claude support by @gaotongxiao in #253
[Dataset] Reconstruct LEval by @philipwangOvO in #266
[Feature]: Verify the acc of these public datasets by @YuanLiuuuuuu in #269
[Feat] Support public dataset of visualglm and llava. by @yyk-wew in #265
[Fix] wrong path in dataset collections by @gaotongxiao in #272
[Fix] update descriptions of tools by @cdpath in #270
[Feature] Support model-bound prediction postprocessor, use it in Claude by @gaotongxiao in #268
[Feature] Simplify entry script by @gaotongxiao in #204
Update README.md by @tonysy in #262

For a complete list of changes, please refer to our Full Changelog.
