OpenCompass v0.2.0

@yingfhu yingfhu released this 12 Dec 06:42
· 510 commits to main since this release
4780b39

🌟 Highlights

  • 🛠 Data Contamination Analysis: A new feature for checking whether benchmark data may have leaked into a model's training data.
  • 🧠 Enhanced Subjective Evaluation: Implementation of a new subjective judgement system, providing more nuanced and accurate evaluations.
  • 🚀 Chat Style Inferencer Support: Introduction of a new chat style inferencer, enhancing interactive capabilities.
  • 🌐 Multilingual Features: Chinese versions of the commonsenseqa, crowspairs, and nq datasets.
  • 📊 New Datasets Integration: Addition of wikibench, rolebench, and updated versions of gsm8k and MathBench datasets for broader research applications.
  • 🛠 Enhancements and Bug Fixes: Removal of the colossalai dependency, plus fixes for hellaswag_ppl_47bff9, the standard deviation summarizer, and the MathBench CodeInterpreter.
  • 📝 Documentation and API Updates: Updates to the README, docstrings, and API interface for clearer user guidance.

🚀 New Features & Enhancements

  • Support for chat style inferencer, offering a more dynamic interaction model (#643).
  • Addition of Chinese versions for key datasets: commonsenseqa, crowspairs, and nq (#144).
  • Introduction of the wikibench dataset, providing a new benchmark for knowledge-based tasks (#655).
  • Updated gsm8k and MathBench configurations for enhanced performance and accuracy (#652, #657).
  • Addition of rolebench dataset, expanding the range of evaluative scenarios (#633).
  • Implementation of new subjective judgement criteria for improved assessment accuracy (#660).
  • Addition of configurations for the qwen-1.8b/72b and deepseek-7b/67b models (#672).
  • Launch of Data Contamination Analysis as a new feature, enhancing data integrity checks (#639).

🛠 Improvements & Fixes

  • Removal of the colossalai dependency (#645).
  • Fixes for bugs in hellaswag_ppl_47bff9 and the standard deviation summarizer (#648, #675).
  • Updates to the MathBench CodeInterpreter, with fixes for related bugs (#657).
  • Enhancement of the API interface for improved functionality and user experience (#681).

📚 Documentation Updates

  • Updated README for clearer guidance and information (#682).
  • Documentation and docstring updates for accuracy and comprehensiveness (#684).

Thank you to all contributors for your hard work and dedication. OpenCompass v0.2.0 marks another step forward in our journey, bringing enhanced features and capabilities to the community. Let's continue to innovate and expand the horizons of OpenCompass! 🎉🌐💡