OpenCompass v0.2.1

@bittersweet1999 released this 08 Jan 14:57 · commit a74e4c1

We're thrilled to announce OpenCompass v0.2.1, loaded with new datasets, features, and vital fixes. This release is a testament to our ongoing commitment to enhancing user experience and broadening research capabilities.

🌟 Highlights:

  • Add Agent and Code datasets: Diverse new datasets like GPQA, mastermath2024v1, and more, significantly expanding the scope of OpenCompass.
  • Support Different JudgeLLM Subjective Evaluation: More choice when selecting the judge LLM for subjective evaluation.
  • Support Needle in Haystack: Adds the Needle in a Haystack test for long-text evaluation.
  • Add VLLM Evaluation: Inference and evaluation with vLLM are now supported.
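
For readers new to the Needle in Haystack idea, here is a minimal, library-free sketch of the general technique: hide one fact (the "needle") at a chosen depth inside long filler text, ask for it back, and check the answer. It is not the OpenCompass implementation. The names `build_haystack`, `needle_recalled`, and `fake_model` are hypothetical, and `fake_model` is a stand-in for a real LLM call.

```python
def build_haystack(filler: str, needle: str, depth: float, total_sentences: int) -> str:
    """Repeat a filler sentence and insert the needle at a relative depth.

    depth=0.0 places the needle at the start, depth=1.0 at the end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * total_sentences
    position = round(depth * total_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)


def needle_recalled(model_answer: str, expected: str) -> bool:
    """Score a response with a case-insensitive substring match."""
    return expected.lower() in model_answer.lower()


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: returns the needle sentence if present."""
    marker = "The secret code is "
    start = prompt.find(marker)
    if start == -1:
        return "I don't know."
    end = prompt.find(".", start)
    return prompt[start:end + 1]


needle = "The secret code is 7421."
context = build_haystack(
    "The sky was grey that morning.", needle, depth=0.5, total_sentences=10
)
answer = fake_model(context + " What is the secret code?")
print(needle_recalled(answer, "7421"))
```

A real long-text evaluation would sweep `depth` and the context length and replace `fake_model` with the model under test.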

Here's what's new:

🚀 New Features:

  • 📦 Dataset Expansion:

    • Added rwkv-5-3b model (#666)
    • Integration of diverse datasets including GPQA, Creationbench, and more.
    • Support for new datasets like mastermath2024v1, mbpp_plus, and sanitized_mbpp (#744, #770, #745)
  • 🛠 Functional Enhancements:

    • Subjective evaluation improvements (#692, #724)
    • Updated Python action, Slurm, and Docker docs (#694, #718)
    • Turbomind API support and Qwen API integration (#693, #735)
  • 📖 Documentation Updates:

    • Updated contamination, alignmentbench, and other docs for better clarity (#698, #707)
    • Fixed dead links and typos in various documents (#455, #773, #774)

🐛 Bug Fixes:

  • Addressed various issues including those in alignmentbench, configs, and postprocess scripts.
  • Fixed bugs concerning subjective evaluation and EOS string detection.
  • Quick fixes for improved performance and reliability.

🎉 Welcome New Contributors:

🔗 Full Changelog

For a full list of updates, visit our Full Changelog.

Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉


Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.