Release OpenCompass v0.1.5 · open-compass/opencompass

Dive into our newly improved features, bug fixes, and most notably our enhanced dataset support, coming together to refine your experience.

🆕 Highlights:

Boosted Dataset Integrations: This release adds support for datasets such as ds1000, promptbench, Anthropic's evals, kaoshi, and more, making OpenCompass more versatile than ever.
More Evaluation Types: We have started integrating subjective and agent-aided LLM evaluation into OpenCompass. Stay tuned!

Explore the detailed changes:

🌟 New Features:

📦 New Datasets and Features:
- ds1000 dataset support (#395)
- promptbench dataset implementation (#239)
- Anthropic's evals dataset support (#422)
- kaoshi dataset introduction (#392)
- Initial support for subjective evaluation (#421)
- Support for GSM8k evaluation with tools via Lagent and LangChain (#277)
- scibench evaluation added (#393)

📖 Documentation:

- News updates and introduction figure in README (#375, #413)
- Updated get_started.md and fixed naming issues (#377, #380)
- New FAQ section added (#384)
- README addition in longeval (#389)
- Multimodal documentation introduced (#334)

🛠️ Bug Fixes:

- Addressed a potential OOM issue (#387)
- Added has_image fix to scienceqa (#391)
- Resolved performance issues of visualglm (#424)
- Debug logger fix for summarizer (#417)
- Addressed errors in keep keys (#431)

⚙ Enhancements and Refactors:

- Refined docs and code for better user guidance (#409)
- Custom summarizer argument added in CLI mode (#411)
- mPLUG-Owl and LLaMA-Adapter introduced (#405)
- Enhanced support for multimodal (mm) models on public datasets (#412)
- Customized config path support (#423)

🎉 New Contributors:

A heartfelt welcome to our first-time contributors:

- @wangxidong06 (First PR)
- @so2liu (First PR)
- @HoBeedzc (First PR)
- @CuteyThyme (First PR)
- @chenbohua3 (First PR)

To all contributors, old and new, thank you for continually enhancing OpenCompass! Your efforts are deeply valued. 🙌 🎉

If you love OpenCompass, don't forget to star 🌟 our GitHub repository! Your feedback, reviews, and contributions immensely help in shaping the product.

Changelog

[Doc] Update News by @tonysy in #375
Update get_started.md by @liushz in #377
[CI] Publish to Pypi by @gaotongxiao in #366
[Docs] Fix incorrect name in get_started by @gaotongxiao in #380
fix potential OOM issue by @cdpath in #387
[Docs] Add FAQ by @gaotongxiao in #384
Add CMB by @wangxidong06 in #376
[Fix]: Add has_image to scienceqa by @YuanLiuuuuuu in #391
[Feat] support ds1000 dataset by @yingfhu in #395
[Feat] implementation for support promptbench by @yingfhu in #239
[Feat] refine docs and codes for more user guides by @yingfhu in #409
[Docs] Readme in longeval by @philipwangOvO in #389
feat: add custom summarizer argument in CLI run mode by @so2liu in #411
Yhzhang/add mlugowl llamaadapter by @ZhangYuanhan-AI in #405
[Feat] Support mm models on public dataset and fix several issues. by @yyk-wew in #412
[Docs] Add intro figure to README by @gaotongxiao in #413
[fix] summarizer debug logger by @HoBeedzc in #417
[Doc] Update news by @Leymore in #420
[Feature] Use local accuracy from hf implements by @Leymore in #416
[Feat] support antropics evals dataset by @yingfhu in #422
[Fix] Fix performance issue of visualglm. by @yyk-wew in #424
[Feature] Log gold answer in prediction output by @gaotongxiao in #419
Support GSM8k evaluation with tools by Lagent and LangChain by @mzr1996 in #277
[Sync] Initial support of subjective evaluation by @gaotongxiao in #421
[Fix] P0: errors in keep keys by @gaotongxiao in #431
add evaluation of scibench by @CuteyThyme in #393
[Feature] Add kaoshi dataset by @liushz in #392
[Docs] Add multimodal docs by @fangyixiao18 in #334
support customize config path by @chenbohua3 in #423

Full Changelog: 0.1.4...0.1.5
