microsoft/RD-Agent

每日信息看板 · 2026-02-14

返回当天 Daily Index

开源项目

AI 总结

微软开源RD-Agent并持续发布量化与数据科学扩展，其在MLE-bench上取得当前公开领先成绩，显示多智能体自动化研发在工业级机器学习与量化策略中的实用价值。

项目定位为数据驱动的多智能体R&D框架，以“R(研究想法)+D(工程实现)”协同自动化研发流程。
官方公布在75个Kaggle任务构成的MLE-bench上处于领先，All指标约30.22%，高于此前公开基线AIDE。
发布RD-Agent(Q)量化版本，宣称在真实股市实验中以低成本获得更高ARR并减少因子数量。
生态完善：提供Live Demo、视频、文档、技术报告、PyPI发行与完整CI质量体系。
支持LiteLLM作为默认后端并提供DeepSeek等配置，覆盖ChatCompletion、json_mode与embedding能力。

#GitHub #repo #开源项目 #RD-Agent #MLE-bench #LiteLLM #Agent

原链接

内容摘录

<h4 align="center">
 <img src="docs/_static/logo.png" alt="RA-Agent logo" style="width:70%; ">
 
 <a href="https://rdagent.azurewebsites.net" target="_blank">🖥️ Live Demo</a> |
 <a href="https://rdagent.azurewebsites.net/factor_loop" target="_blank">🎥 Demo Video</a> <a href="https://www.youtube.com/watch?v=JJ4JYO3HscM&list=PLALmKB0_N3_i52fhUmPQiL4jsO354uopR" target="_blank">▶️YouTube</a> |
 <a href="https://rdagent.readthedocs.io/en/latest/index.html" target="_blank">📖 Documentation</a> |
 <a href="https://aka.ms/RD-Agent-Tech-Report" target="_blank">📄 Tech Report</a> |
 <a href="#-paperwork-list"> 📃 Papers </a>
</h3>

CI
CodeQL
Dependabot Updates
Lint PR Title
Release.yml
Platform
PyPI
PyPI - Python Version
Release
GitHub
pre-commit
Checked with mypy
Ruff
Chat
Documentation Status
Readthedocs Preview <!-- this badge is too long, please place it in the last one to make it pretty --> 
arXiv
📰 News
| 🗞️ News | 📝 Description |
| -- | ------ |
| NeurIPS 2025 Acceptance | We are thrilled to announce that our paper R&D-Agent-Quant has been accepted to NeurIPS 2025 | 
| Technical Report Release | Overall framework description and results on MLE-bench | 
| R&D-Agent-Quant Release | Apply R&D-Agent to quant trading | 
| MLE-Bench Results Released | R&D-Agent currently leads as the top-performing machine learning engineering agent on MLE-bench |
| Support LiteLLM Backend | We now fully support **LiteLLM** as our default backend for integration with multiple LLM providers. |
| General Data Science Agent | Data Science Agent |
| Kaggle Scenario release | We release **Kaggle Agent**, try the new features! |
| Official WeChat group release | We created a WeChat group, welcome to join! (🗪QR Code) |
| Official Discord release | We launch our first chatting channel in Discord (🗪Chat) |
| First release | **R&D-Agent** is released on GitHub |
🏆 The Best Machine Learning Engineering Agent!

MLE-bench is a comprehensive benchmark evaluating the performance of AI agents on machine learning engineering tasks. Utilizing datasets from 75 Kaggle competitions, MLE-bench provides robust assessments of AI systems' capabilities in real-world ML engineering scenarios.

R&D-Agent currently leads as the top-performing machine learning engineering agent on MLE-bench:

| Agent | Low == Lite (%) | Medium (%) | High (%) | All (%) |
|---------|--------|-----------|---------|----------|
| R&D-Agent o3(R)+GPT-4.1(D) | 51.52 ± 6.9 | 19.3 ± 5.5 | 26.67 ± 0 | 30.22 ± 1.5 |
| R&D-Agent o1-preview | 48.18 ± 2.49 | 8.95 ± 2.36 | 18.67 ± 2.98 | 22.4 ± 1.1 |
| AIDE o1-preview | 34.3 ± 2.4 | 8.8 ± 1.1 | 10.0 ± 1.9 | 16.9 ± 1.1 |

**Notes:**
**O3(R)+GPT-4.1(D)**: This version is designed to both reduce average time per loop and leverage a cost-effective combination of backend LLMs by seamlessly integrating Research Agent (o3) with Development Agent (GPT-4.1).
**AIDE o1-preview**: Represents the previously best public result on MLE-bench as reported in the original MLE-bench paper.
Average and standard deviation results for R&D-Agent o1-preview is based on a independent of 5 seeds and for R&D-Agent o3(R)+GPT-4.1(D) is based on 6 seeds.
According to MLE-Bench, the 75 competitions are categorized into three levels of complexity: **Low==Lite** if we estimate that an experienced ML engineer can produce a sensible solution in under 2 hours, excluding the time taken to train any models; **Medium** if it takes between 2 and 10 hours; and **High** if it takes more than 10 hours.

You can inspect the detailed runs of the above results online.
R&D-Agent o1-preview detailed runs
R&D-Agent o3(R)+GPT-4.1(D) detailed runs

For running R&D-Agent on MLE-bench, refer to **MLE-bench Guide: Running ML Engineering via MLE-bench**
🥇 The First Data-Centric Quant Multi-Agent Framework!

R&D-Agent for Quantitative Finance, in short **RD-Agent(Q)**, is the first data-centric, multi-agent framework designed to automate the full-stack research and development of quantitative strategies via coordinated factor-model co-optimization.

!image

Extensive experiments in real stock markets show that, at a cost under $10, RD-Agent(Q) achieves approximately 2× higher ARR than benchmark factor libraries while using over 70% fewer factors. It also surpasses state-of-the-art deep time-series models under smaller resource budgets. Its alternating factor–model optimization further delivers excellent trade-off between predictive accuracy and strategy robustness.

You can learn more details about **RD-Agent(Q)** through the paper and reproduce it through the documentation.
Data Science Agent Preview
Check out our demo video showcasing the current progress of our Data Science Agent under development:

https://github.com/user-attachments/assets/3eccbecb-34a4-4c81-bce4-d3f8862f7305
🌟 Introduction
<div align="center">
 <img src="docs/_static/scen.png" alt="Our focused scenario" style="width:80%; ">
</div>

R&D-Agent aims to automate the most critical and valuable aspects of the industrial R&D process, and we begin with focusing on the data-driven scenarios to streamline the development of models and data. 
Methodologically, we have identified a framework with two key components: 'R' for proposing new ideas and 'D' for implementing them.
We believe that the automatic evolution of R&D will lead to solutions of significant industrial value.

<!-- Tag Cloud -->
R&D is a very general scenario. The advent of R&D-Agent can be your
💰 **Automatic Quant Factory** (🎥Demo Video|▶️YouTube)
🤖 **Data Mining Agent:** Iteratively proposing data & models (🎥Demo Video 1|▶️YouTube) (🎥Demo Video 2|▶️YouTube) and implementing them by gaining knowledge from data.
🦾 **Research Copilot:** Auto read research papers (🎥Demo Video|▶️YouTube) / financial reports (🎥Demo Video|▶️YouTube) and implement model structures or building datasets.
🤖 **Kaggle Agent:** Auto Model Tuning and Feature Engineering([🎥Demo Video Coming Soon...]()) and implementing them to achieve more in competitions.
...

You can click the links…