SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science

📅 2025-03-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing AutoML pipelines rely on rigid, single-path workflows, which limits strategy exploration and flexibility and in turn constrains predictive performance. To address this, we propose SPIO, a novel LLM-driven, multi-agent collaborative AutoML framework that dynamically orchestrates preprocessing, feature engineering, modeling, and hyperparameter optimization via modular strategy generation, sequential plan integration and refinement, k-best ensemble construction, and LLM-based meta-evaluation for component selection. SPIO introduces two complementary paradigms: SPIO-S (single-path selection), which leverages LLMs to identify the optimal unified pipeline, and SPIO-E (ensemble-based multi-path integration), which fuses diverse high-performing pipelines. Evaluated across multiple Kaggle and OpenML benchmarks, SPIO consistently outperforms state-of-the-art AutoML systems in both accuracy and robustness, while demonstrating strong scalability and generalization.

📝 Abstract
Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent refines these strategies by suggesting several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science tasks.
Problem

Research questions and friction points this paper is trying to address.

Overcoming rigid single-path workflows in automated data science
Enhancing diverse strategy exploration via multi-agent planning
Optimizing predictive performance through ensemble and selective strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven multi-agent planning for diverse strategies
Four-stage optimization: data, features, model, tuning
Two variants: single-path selection and ensemble top-k
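
The difference between the two variants can be illustrated with a minimal sketch. This is not the authors' implementation: the `CandidatePlan` class, the `val_score` field, and the plain probability averaging are all assumptions made for illustration. In particular, SPIO uses an LLM to choose plans, whereas this sketch substitutes a hypothetical validation score as the ranking signal.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CandidatePlan:
    """Hypothetical container for one end-to-end candidate plan."""
    name: str
    val_score: float      # assumed ranking signal (stand-in for the LLM's judgment)
    probs: List[float]    # predicted positive-class probabilities on held-out rows


def select_single(plans: Sequence[CandidatePlan]) -> CandidatePlan:
    """SPIO-S-style selection: keep only the single best-ranked plan."""
    return max(plans, key=lambda p: p.val_score)


def ensemble_top_k(plans: Sequence[CandidatePlan], k: int) -> List[float]:
    """SPIO-E-style selection: average the predictions of the top-k plans."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(plans, key=lambda p: p.val_score, reverse=True)[:k]
    n_rows = len(top[0].probs)
    return [sum(p.probs[i] for p in top) / len(top) for i in range(n_rows)]


# Toy candidates with made-up numbers, for illustration only.
plans = [
    CandidatePlan("plan_a", 0.81, [0.9, 0.2, 0.6]),
    CandidatePlan("plan_b", 0.84, [0.8, 0.1, 0.7]),
    CandidatePlan("plan_c", 0.70, [0.4, 0.5, 0.5]),
]

best = select_single(plans)            # single-path choice
blended = ensemble_top_k(plans, k=2)   # average of the two best-ranked plans
```

Simple averaging is only one way to combine plans; the source does not specify the ensembling rule, so a weighted or stacked combination would be equally consistent with the description above.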