Commit 72be960
Add hipho physics dataset (#1318)
* Initial commit for HiPhO
* update
* update
* refactor: simplify HiPhO dataset logging system
- Remove custom LogBuffer class and thread-safe logging
- Replace safe_print with standard print statements
- Remove threading and datetime imports
- Simplify build_prompt function by removing verbose debug output
- Update dataset URL from haiyuanwan/HiPhO to HY-Wan/HiPhO
- Reduce code from 899 to 803 lines (10.7% reduction)
- Maintain all core functionality: evaluation logic, prompt building, hipho_verifier integration
* refactor: remove parallel evaluation framework from HiPhO dataset
- Remove complex parallel evaluation using track_progress_rich
- Simplify to sequential evaluation for better stability and debugging
- Remove multiprocessing and parallel task management dependencies
- Rename functions to remove '_with_buffer' suffix and log_buffer parameters
- Remove nproc parameter handling and temporary file management
- Reduce code from 803 to 774 lines (additional 3.6% reduction)
- Maintain all core evaluation logic: fine/coarse-grained scoring, hipho_verifier integration
- Sequential evaluation is sufficient for physics olympiad problem counts
* refactor: major simplification of HiPhO dataset implementation
Major improvements:
- Remove 6 unnecessary try-except blocks that were hiding errors
- Standardize judge model initialization to follow VLMEvalKit conventions
- Move all prompt templates to utils/prompt_inference.py for better organization
- Remove redundant count statistics (fine_grained_count, coarse_grained_count, total_count)
- Remove unused fallback functions (_simple_answer_matching, _extract_prediction_for_display)
- Fix multi-image base64 processing bug
- Correct dataset name display in summary output
- Remove verbose debugging output and unnecessary comments
Code reduction: 899 → 604 lines (32.8% reduction)
Eliminated potential bugs and improved maintainability while preserving all core functionality
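The line-count figures quoted in the three refactor entries above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch; the helper name `reduction_percent` and the step labels are illustrative and do not come from the commit.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of lines removed, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


# (before, after) line counts as quoted in the commit message.
# The labels are hypothetical names for the three refactor entries.
steps = {
    "logging simplification": (899, 803),
    "parallel framework removal": (803, 774),
    "overall simplification": (899, 604),
}

reductions = {name: reduction_percent(b, a) for name, (b, a) in steps.items()}

for name, pct in reductions.items():
    print(f"{name}: {pct}% fewer lines")
```

Note that the second figure is relative to the 803-line intermediate state, while the first and third are relative to the original 899 lines.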
* Improve HiPhO dataset: translate comments to English and enhance configuration
- Translate all Chinese comments to English in hipho.py, hipho_verifier.py, and prompt_inference.py
- Simplify comments while maintaining technical accuracy
- Replace hardcoded verifier model configuration with environment variables
- Use VLMEvalKit standard environment variable approach for better flexibility
- Add support for HIPHO_VERIFIER_* environment variables for model configuration
- Improve code maintainability and international accessibility
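As an illustration of the environment-variable approach described in this entry, the sketch below collects every variable that starts with `HIPHO_VERIFIER_` into a plain dictionary. The commit message names only the prefix, so the function `load_verifier_config`, the example suffix `MODEL`, and the example value are assumptions and not the actual code in hipho_verifier.py.

```python
import os

PREFIX = "HIPHO_VERIFIER_"


def load_verifier_config(environ=None, defaults=None):
    """Collect HIPHO_VERIFIER_* variables into a lower-cased config dict.

    Hypothetical helper: the real variable suffixes and how they are
    consumed are not stated in the commit message.
    """
    environ = os.environ if environ is None else environ
    config = dict(defaults or {})
    for key, value in environ.items():
        # Skip unrelated variables and variables set to an empty string.
        if key.startswith(PREFIX) and value != "":
            config[key[len(PREFIX):].lower()] = value
    return config


# Usage with an explicit mapping, so the example does not depend on the shell.
# "MODEL" and "example-judge" are made-up placeholders.
example_env = {"HIPHO_VERIFIER_MODEL": "example-judge", "PATH": "/usr/bin"}
print(load_verifier_config(example_env))
```

Passing the environment as an argument keeps the helper easy to test; calling it with no arguments reads from `os.environ`.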
* Add new dependencies for HiPhO dataset functionality
- Add datasets: for HuggingFace dataset loading
- Add scikit-learn: for machine learning utilities
- Add pylatexenc==2.10: for LaTeX text processing
- Add math-verify: for mathematical answer verification
These dependencies are required for the HiPhO physics olympiad dataset
evaluation and verification functionality.
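A quick way to confirm that the four packages listed in this entry are importable is sketched below. The mapping from pip package names to import names reflects how those packages are normally imported; the helper `missing_packages` is illustrative and not part of this commit.

```python
from importlib.util import find_spec

# pip package name -> top-level import name
REQUIRED = {
    "datasets": "datasets",
    "scikit-learn": "sklearn",
    "pylatexenc": "pylatexenc",
    "math-verify": "math_verify",
}


def missing_packages(required=None):
    """Return the pip names of packages whose module cannot be found."""
    required = REQUIRED if required is None else required
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All HiPhO dependencies are importable.")
```

This checks importability only. It does not verify versions, so the `pylatexenc==2.10` pin still has to be enforced by the installer.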
* Add hipho_prompt_inference.py utility file
* Update import statement for prompt inference module
---------
Co-authored-by: Haodong Duan <[email protected]>
Co-authored-by: Ma Zerun <[email protected]>
1 parent 8dc65e0 commit 72be960
File tree
5 files changed, +2078 -1 lines changed
- vlmeval/dataset
- utils