Merged
2 changes: 1 addition & 1 deletion .gitattributes
@@ -1,4 +1,4 @@
*.csv filter=lfs diff=lfs merge=lfs -text
mlebench/competitions/*/top_solutions/** filter=lfs diff=lfs merge=lfs -text
runs/**/*.json filter=lfs diff=lfs merge=lfs -text
-runs/**/*.jsonl filter=lfs diff=lfs merge=lfs -text
+runs/**/*.jsonl filter=lfs diff=lfs merge=lfs -text
1 change: 1 addition & 0 deletions README.md
@@ -6,6 +6,7 @@ Code for the paper ["MLE-Bench: Evaluating Machine Learning Agents on Machine Le

| Agent | LLM(s) used | Low == Lite (%) | Medium (%) | High (%) | All (%) | Running Time (hours) | Date | Grading Reports Available | Source Code Available |
|-------|-------------|-----------------|------------|----------|---------|----------------------|------|---------------------------|----------------------|
+| FM Agent | Gemini-2.5-Pro | 62.12 ± 3.03 | 36.84 ± 2.63 | 33.33 ± 0 | 43.56 ± 1.78 | 24 | 2025-10-10 | ✓ | X |
| [Operand](https://operand.com) ensemble | gpt-5 (low verbosity/effort)[^1] | 63.64 ± 5.92 | 33.33 ± 4.42 | 20.00 ± 5.96 | 39.56 ± 3.26 | 24 | 2025-10-06 | ✓ | X |
| [InternAgent](https://github.com/Alpha-Innovator/InternAgent/) | deepseek-r1 | 62.12 ± 3.03 | 26.32 ± 2.63 | 24.44 ± 2.22| 36.44 ± 1.18 | 12 | 2025-09-12 | ✓ | X |
| [R&D-Agent](https://github.com/microsoft/RD-Agent) | gpt-5 | 68.18 ± 2.62 | 21.05 ± 1.52 | 22.22 ± 2.22 | 35.11 ± 0.44 | 12 | 2025-09-26 | ✓ | ✓ |
3 changes: 2 additions & 1 deletion runs/README.md
@@ -43,4 +43,5 @@ table below.
| o3-gpt-4.1-R&D-Agent | O3 as researcher and gpt-4.1 as developer on R&D-Agent scaffolding, 12 vCPUs, 220GB of RAM, and 1 V100 GPU |
| deepseek-r1-InternAgent | Deepseek-R1 on InternAgent scaffolding, 12 hours, 32 vCPUs, 230GB of RAM, and 1 A800 GPU|
| gpt-5-R&D-Agent | gpt-5 on R&D-Agent scaffolding, 12 hours, 12 vCPUs, 220GB of RAM, and 1 V100 GPU |
-| operand-ensemble | Operand Ensemble, 24 hours, 36 vCPUs, 440GB of RAM, and 1 A10 GPU |
+| operand-ensemble | Operand Ensemble, 24 hours, 36 vCPUs, 440GB of RAM, and 1 A10 GPU |
+| FM Agent | FM Agent, 24 hours, 64 vCPUs, 500GB of RAM, and 1 A800 GPU |
3 changes: 3 additions & 0 deletions runs/fmagent_group1/grading_report_group_1.json
Collaborator

these need to be LFS files

Contributor Author

Thank you very much for your reminder. It has now been corrected!

Git LFS file not shown
3 changes: 3 additions & 0 deletions runs/fmagent_group2/grading_report_group_2.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions runs/fmagent_group3/grading_report_group_3.json
Git LFS file not shown
4 changes: 2 additions & 2 deletions runs/run_group_experiments.csv
Collaborator

let's keep this as an LFS file as before

Contributor Author

Thank you very much for your reminder. It has now been corrected!

Git LFS file not shown
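
Both review comments ask for files to be stored through Git LFS. One rough way to check this before pushing is to confirm that the committed blob is a small LFS pointer rather than the raw JSON or CSV. The sketch below is not part of this PR or of the repository: the helper names `is_lfs_pointer` and `is_lfs_pointer_file` are hypothetical. It relies only on the published Git LFS pointer format: a `version https://git-lfs.github.com/spec/v1` first line, plus `oid sha256:…` and `size …` lines.

```python
import re
from pathlib import Path

# First line of a Git LFS v1 pointer file, as defined by the Git LFS pointer spec.
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"
OID_RE = re.compile(r"^oid sha256:[0-9a-f]{64}$")
SIZE_RE = re.compile(r"^size \d+$")


def is_lfs_pointer(text: str) -> bool:
    """Return True if `text` looks like a Git LFS pointer (hypothetical helper)."""
    lines = text.splitlines()
    if not lines or lines[0] != LFS_VERSION_LINE:
        return False
    rest = lines[1:]
    has_oid = any(OID_RE.match(line) for line in rest)
    has_size = any(SIZE_RE.match(line) for line in rest)
    return has_oid and has_size


def is_lfs_pointer_file(path: str) -> bool:
    """Check the first 1024 bytes of a file; pointer files are smaller than that."""
    head = Path(path).read_bytes()[:1024]
    try:
        return is_lfs_pointer(head.decode("utf-8"))
    except UnicodeDecodeError:
        # Binary content cannot be a text pointer.
        return False
```

Note that in a working tree with Git LFS installed, checked-out files contain the real content, so a check like this is only meaningful against the blob as stored in the commit (for example, the output of `git show HEAD:<path>`).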