🤖 AI Summary
This work addresses the complexity, slow experimental pace, and specialized-expertise demands of machine learning (ML) research by proposing MLR-Copilot, an LLM-agent framework for autonomous ML research. Methodologically, it introduces a three-phase pipeline: research idea generation via IdeaAgent, experiment implementation via ExperimentAgent, and implementation execution, also managed by ExperimentAgent. The framework integrates retrieval-augmented generation (RAG) over existing research papers, retrieval of prototype code and candidate models and datasets, dynamic code synthesis, and experiment execution with iterative debugging. Its key contribution is a human-in-the-loop, end-to-end ML research pipeline driven by LLM agents, automating the path from hypothesis and experimental plan to executable code while incorporating human feedback during execution. Evaluated on five representative ML research tasks, the framework generated and executed valid experiments, demonstrating the potential of LLM agents to improve research productivity and facilitate substantive scientific innovation in ML.
📝 Abstract
Machine learning research, crucial for technological advancement and innovation, often faces significant challenges due to its inherent complexity, slow pace of experimentation, and the necessity for specialized expertise. Motivated by this, we present a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot), designed to enhance machine learning research productivity through the automatic generation and implementation of research ideas using Large Language Model (LLM) agents. The framework consists of three phases: research idea generation, experiment implementation, and implementation execution. First, existing research papers are used to generate hypotheses and experimental plans via IdeaAgent, powered by LLMs. Next, the implementation generation phase translates these plans into executables with ExperimentAgent. This phase leverages retrieved prototype code and optionally retrieves candidate models and data. Finally, the execution phase, also managed by ExperimentAgent, involves running experiments with mechanisms for human feedback and iterative debugging to enhance the likelihood of achieving executable research outcomes. We evaluate our framework on five machine learning research tasks, and the experimental results show the framework's potential to facilitate research progress and innovation.
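The three phases above can be sketched as a minimal orchestration loop. All class and method names below (`IdeaAgent.generate`, `ExperimentAgent.implement`, `execute`, `debug`) are illustrative assumptions, not the paper's actual API; the real framework would back each step with LLM calls, RAG, and human feedback rather than the placeholders shown here.

```python
# Illustrative sketch of the MLR-Copilot three-phase loop (hypothetical API).

class IdeaAgent:
    def generate(self, papers):
        # Phase 1: an LLM reads existing papers and proposes a
        # hypothesis plus an experimental plan (RAG-backed in practice).
        return {"hypothesis": "...", "plan": ["load data", "train model", "evaluate"]}


class ExperimentAgent:
    def implement(self, plan, prototype_code=None):
        # Phase 2: translate the plan into executable code, optionally
        # seeded with retrieved prototype code, models, and datasets.
        return "print('experiment running')"

    def execute(self, code, max_retries=3):
        # Phase 3: run the generated code; on failure, route the error
        # (and, in the real system, human feedback) into a debug step.
        for _ in range(max_retries):
            try:
                exec(code, {})
                return True
            except Exception as err:
                code = self.debug(code, err)
        return False

    def debug(self, code, err):
        # Placeholder for an LLM repair step conditioned on the traceback.
        return code


papers = ["Paper A", "Paper B"]
idea = IdeaAgent().generate(papers)
agent = ExperimentAgent()
generated_code = agent.implement(idea["plan"])
success = agent.execute(generated_code)
```

The closed loop in `execute` is the part that raises the likelihood of an executable outcome: each failed run produces an error signal that feeds the next repair attempt instead of terminating the pipeline.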