FM Agent MLE-Benchmark Results #80
Conversation
Hi FM Agent team (cc @GZL11), thank you for the submission. Could you clarify the +/- 0 on the high split? Seems surprisingly precise!
Hi authors of MLE-Bench (cc @thesofakillers),
Interestingly, we found that these competitions did not require extensive time or iterative tuning: many were solved within 12 hours, and none exceeded 24 hours to reach medal performance. In contrast, for the remaining competitions in the high split, we were unable to achieve medal-level results within the 24-hour time limit, though we did obtain reasonably competitive scores.
Thanks for clarifying about the +/- 0, that sounds reasonable to me.
I think this all looks correct, but we need the grading reports and run_group_experiments.csv to be tracked with git LFS please. Thanks!
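For reference, tracking the result files with Git LFS typically looks like the following. This is a minimal sketch, assuming `git-lfs` is installed; the `grading_reports/` path is a hypothetical placeholder, not a path confirmed in this PR.

```shell
# One-time setup per machine/clone (assumes git-lfs is installed)
git lfs install

# Track the files by pattern; this writes the patterns into .gitattributes
git lfs track "*.csv"
git lfs track "grading_reports/**"   # hypothetical path for the grading reports

# Commit the .gitattributes change together with the (re-)added files
git add .gitattributes
git add run_group_experiments.csv
git commit -m "Track result artifacts with Git LFS"

# Verify the files are now stored as LFS pointers
git lfs ls-files
```

Note that files already committed as regular blobs stay in history; re-adding them after `git lfs track` is enough for the new commits, which is what matters for this review.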
these need to be LFS files
Thank you very much for your reminder. It has now been corrected!
let's keep this as an LFS file as before
Thank you very much for your reminder. It has now been corrected!
Thank you very much for your reminder. I sincerely apologize for not noticing this point when uploading previously. It has now been corrected!
LGTM! congrats
no worries!
Hello authors of MLE-Bench,
We are the FM Agent team from Baidu, and we are pleased to share that our FM Agent has achieved SOTA performance on the MLE-Bench benchmark.
Over recent months, we have developed an advanced agent based on the FM Agent framework that can systematically analyze problems and iteratively refine solutions to address complex end-to-end tasks, including various machine learning workloads.
To validate FM Agent, we conducted extensive experiments on MLE-Bench and rigorously evaluated our agent's performance.
As part of this pull request, we are contributing:
Resources Used:
Final proposed new result:
Our technical report, which contains more detailed insights into our work, is coming soon! We are truly grateful for the chance to contribute to MLE-Bench, and we sincerely hope that our findings will bring value to the broader community. We eagerly await your feedback and will work to address it promptly so this pull request can be merged.
Best regards,
The FM Agent Team (Baidu)