openai · thesofakillers · Oct 17, 2025 · Oct 10, 2025 · Oct 14, 2025 · Oct 17, 2025
diff --git a/.gitattributes b/.gitattributes
@@ -1,4 +1,4 @@
 *.csv filter=lfs diff=lfs merge=lfs -text
 mlebench/competitions/*/top_solutions/** filter=lfs diff=lfs merge=lfs -text
 runs/**/*.json filter=lfs diff=lfs merge=lfs -text
-runs/**/*.jsonl filter=lfs diff=lfs merge=lfs -text
+runs/**/*.jsonl filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
@@ -6,6 +6,7 @@ Code for the paper ["MLE-Bench: Evaluating Machine Learning Agents on Machine Le
 
 | Agent | LLM(s) used | Low == Lite (%) | Medium (%) | High (%) | All (%) | Running Time (hours) | Date | Grading Reports Available | Source Code Available |
 |-------|-------------|-----------------|------------|----------|---------|----------------------|------|---------------------------|----------------------|
+| FM Agent | Gemini-2.5-Pro | 62.12 ± 3.03 | 36.84 ± 2.63 | 33.33 ± 0 | 43.56 ± 1.78 | 24 | 2025-10-10 | ✓ | X |
 | [Operand](https://operand.com) ensemble | gpt-5 (low verbosity/effort)[^1] | 63.64 ± 5.92 | 33.33 ± 4.42 | 20.00 ± 5.96 | 39.56 ± 3.26 | 24 | 2025-10-06 | ✓ | X |
 | [InternAgent](https://github.com/Alpha-Innovator/InternAgent/) | deepseek-r1 | 62.12 ± 3.03 | 26.32 ± 2.63 | 24.44 ± 2.22| 36.44 ± 1.18 | 12 | 2025-09-12 | ✓ | X |
 | [R&D-Agent](https://github.com/microsoft/RD-Agent) | gpt-5 | 68.18 ± 2.62 | 21.05 ± 1.52 | 22.22 ± 2.22 | 35.11 ± 0.44 | 12 | 2025-09-26 | ✓ | ✓ |

diff --git a/runs/README.md b/runs/README.md
@@ -43,4 +43,5 @@ table below.
 | o3-gpt-4.1-R&D-Agent                | O3 as researcher and gpt-4.1 as developer on R&D-Agent scaffolding, 12 vCPUs, 220GB of RAM, and 1 V100 GPU |
 | deepseek-r1-InternAgent             | Deepseek-R1 on InternAgent scaffolding, 12 hours, 32 vCPUs, 230GB of RAM, and 1 A800 GPU|
 | gpt-5-R&D-Agent                     | gpt-5 on R&D-Agent scaffolding, 12 hours, 12 vCPUs, 220GB of RAM, and 1 V100 GPU |
-| operand-ensemble                    | Operand Ensemble, 24 hours, 36 vCPUs, 440GB of RAM, and 1 A10 GPU          |
+| operand-ensemble                    | Operand Ensemble, 24 hours, 36 vCPUs, 440GB of RAM, and 1 A10 GPU          |
+| FM Agent                    | FM Agent, 24 hours, 64 vCPUs, 500GB of RAM, and 1 A800 GPU          |
diff --git a/runs/fmagent_group1/grading_report_group_1.json b/runs/fmagent_group1/grading_report_group_1.json
diff --git a/runs/fmagent_group2/grading_report_group_2.json b/runs/fmagent_group2/grading_report_group_2.json
diff --git a/runs/fmagent_group3/grading_report_group_3.json b/runs/fmagent_group3/grading_report_group_3.json
diff --git a/runs/run_group_experiments.csv b/runs/run_group_experiments.csv
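
Review note: the new `All (%)` value in the README row can be sanity-checked against the per-split columns. The sketch below assumes `All (%)` is the size-weighted mean of the Low, Medium, and High rates, and assumes split sizes of 22, 38, and 15 competitions; neither the weighting nor the sizes are stated in this diff, and `SPLIT_SIZES` and `overall_rate` are hypothetical names used only for illustration. The error bars are not checked here.

```python
# Assumed split sizes (not stated in this diff): 22 low, 38 medium, 15 high.
SPLIT_SIZES = {"low": 22, "medium": 38, "high": 15}


def overall_rate(low, medium, high, sizes=SPLIT_SIZES):
    """Return the size-weighted mean of per-split rates, in percent."""
    total = sum(sizes.values())
    weighted = (
        low * sizes["low"]
        + medium * sizes["medium"]
        + high * sizes["high"]
    )
    return weighted / total


# (low, medium, high, reported "All") copied from the README table rows above.
rows = {
    "FM Agent": (62.12, 36.84, 33.33, 43.56),
    "Operand ensemble": (63.64, 33.33, 20.00, 39.56),
    "InternAgent": (62.12, 26.32, 24.44, 36.44),
    "R&D-Agent": (68.18, 21.05, 22.22, 35.11),
}

for name, (low, medium, high, reported) in rows.items():
    computed = overall_rate(low, medium, high)
    # Inputs are rounded to two decimals, so small differences are expected.
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Under these assumptions, each computed value should land within rounding distance of the reported one; a larger gap would suggest a transcription error in the row or a different weighting.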