OpenCompass v0.2.0
🌟 Highlights
- 🛠 Data Contamination Analysis: A new feature for analyzing whether benchmark data is contaminated, helping safeguard the integrity of evaluation results.
- 🧠 Enhanced Subjective Evaluation: Implementation of a new subjective judgement system, providing more nuanced and accurate evaluations.
- 🚀 Chat Style Inferencer Support: Introduction of a new chat style inferencer, enhancing interactive capabilities.
- 🌐 Multilingual Features: Expansion to support Chinese versions of commonsenseqa, crowspairs, and nq datasets.
- 📊 New Datasets Integration: Addition of the wikibench and rolebench datasets, plus updated gsm8k and math agent configs and an updated MathBench CodeInterpreter.
- 🛠 Enhancements and Bug Fixes: Numerous improvements, including fixes for hellaswag_ppl_47bff9, the standard deviation summarizer, and MathBench.
- 📝 Documentation and API Updates: Comprehensive updates to README and API interfaces for better user guidance and experience.
🚀 New Features & Enhancements
- Support for chat style inferencer, offering a more dynamic interaction model (#643).
- Addition of Chinese versions for key datasets: commonsenseqa, crowspairs, and nq (#144).
- Introduction of the wikibench dataset, providing a new benchmark for knowledge-based tasks (#655).
- Update of the gsm8k and math agent configs (#652) and of the MathBench CodeInterpreter (#657).
- Addition of rolebench dataset, expanding the range of evaluative scenarios (#633).
- Implementation of new subjective judgement criteria for improved assessment accuracy (#660).
- Addition of configs for the qwen-1.8b/72b and deepseek-7b/67b models (#672).
- Launch of Data Contamination Analysis as a new feature, enhancing data integrity checks (#639).
🛠 Improvements & Fixes
- Removal of the colossalai dependency (#645).
- Fixes for hellaswag_ppl_47bff9 (#648) and the standard deviation summarizer (#675).
- Update of the MathBench CodeInterpreter and fix for a MathBench bug (#657).
- Enhancement of API interface for improved functionality and user experience (#681).
📚 Documentation Updates
- Updated README for clearer guidance and information (#682).
- Docstring fixes for accuracy (#684).
🎊 New Contributors
- A warm welcome to new contributors @rolellm, @liyucheng09, and @xmshi-trio. Your contributions have significantly enriched OpenCompass!
🔗 Full Changelog
- [Fix] remove colossalai dependency by @yingfhu in #645
- [Fix] Fix hellaswag_ppl_47bff9 by @Leymore in #648
- [Feature] Support chat style inferencer. by @mzr1996 in #643
- [Feature] Add Chinese version: commonsenseqa, crowspairs and nq by @liushz in #144
- [Feature] Add wikibench dataset by @liushz in #655
- [Feat] update gsm8k and math agent config by @yingfhu in #652
- [Feature] Update MathBench CodeInterpreter & fix MathBench Bug by @liushz in #657
- added rolebench dataset. by @rolellm in #633
- New subjective judgement by @bittersweet1999 in #660
- [Feature] Add qwen-1.8b/72b and deepseek-7b/67b configs by @Leymore in #672
- Add Data Contamination Analysis [New Feature] by @liyucheng09 in #639
- [Fix] fix bug on standart_deviation summarizer by @jingmingzhuo in #675
- update medbench by @xmshi-trio in #678
- [Enhancement] Update API Interface by @tonysy in #681
- [Doc] Update README by @kennymckormick in #682
- [Feat] support pr merge test ci by @yingfhu in #669
- [Feature] enhance the ability of humaneval_postprocess by @jingmingzhuo in #676
- [Sync] Update codes by @yingfhu in #683
- [Docs] fix docstring by @yingfhu in #684
- new version of subject by @bittersweet1999 in #680
- fixed small problem of new version subject evaluation by @bittersweet1999 in #686
- [Sync] bump version to 0.2.0 by @yingfhu in #690
- Explore the detailed changes and contributions in the full OpenCompass changelog.
Thank you to all contributors for your hard work and dedication. OpenCompass v0.2.0 marks another step forward in our journey, bringing enhanced features and capabilities to the community. Let's continue to innovate and expand the horizons of OpenCompass! 🎉🌐💡