🤖 AI Summary
This work addresses the challenges of automated feature engineering for tabular data, where reliance on domain expertise and an enormous search space hinder scalability. To overcome these limitations, the study introduces the ReAct agent framework into automated feature engineering for the first time, presenting an end-to-end system that integrates large language models, a contextual memory mechanism, and feature augmentation and selection algorithms. This approach enables autonomous generation, optimization, and evaluation of features for both classification and regression tasks. Extensive benchmark evaluations demonstrate that the method achieves state-of-the-art performance, yielding an average improvement of 0.23% in ROC-AUC on larger classification tasks and a 2.0% average reduction in RMSE for regression, while also exhibiting enhanced robustness across diverse datasets.
📝 Abstract
Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, where identifying optimal features in an exponentially large search space traditionally demands substantial domain expertise. To address this challenge, we introduce FAMOSE (Feature AugMentation and Optimal Selection agEnt), a novel framework that leverages the ReAct paradigm to autonomously explore, generate, and refine features while integrating feature selection and evaluation tools within an agent architecture. To our knowledge, FAMOSE is the first application of an agentic ReAct framework to automated feature engineering that covers both regression and classification tasks. Extensive experiments demonstrate that FAMOSE is at or near the state of the art on classification tasks (especially tasks with more than 10K instances, where ROC-AUC increases by 0.23% on average), and achieves state-of-the-art results on regression tasks, reducing RMSE by 2.0% on average, while remaining more robust to errors than competing algorithms. We hypothesize that FAMOSE's strong performance stems from the fact that ReAct lets the LLM's context window record, through iterative feature discovery and evaluation steps, which features did or did not work. This record acts like a few-shot prompt, guiding the LLM to invent better, more innovative features. Our work offers evidence that AI agents are remarkably effective at problems that require highly inventive solutions, such as feature engineering.
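The hypothesized loop — propose a candidate feature, evaluate it, and record the outcome in a memory that conditions later proposals — can be sketched as below. This is a minimal illustration, not the paper's implementation: the candidate set stands in for an LLM's proposals, and all names (`rmse_with_feature`, `memory`, `CANDIDATES`) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: the target depends on the interaction x0 * x1,
# which neither raw column captures on its own.
X = rng.normal(size=(200, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)

def rmse_with_feature(feat):
    """Score a candidate feature by the RMSE of a linear fit that includes it."""
    A = np.column_stack([X, feat, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

# Candidate transformations standing in for LLM-proposed features.
CANDIDATES = {
    "x0 + x1": X[:, 0] + X[:, 1],
    "x0 * x1": X[:, 0] * X[:, 1],
    "x0 ** 2": X[:, 0] ** 2,
}

memory = []  # contextual memory: (thought, action, observation) triples

for name, feat in CANDIDATES.items():
    # Reason: in a real agent, past (action, score) pairs would be fed back
    # into the prompt, few-shot style, to steer the next proposal.
    thought = f"Try candidate {name!r}; {len(memory)} prior attempts in memory"
    score = rmse_with_feature(feat)        # Act: evaluate the candidate
    memory.append((thought, name, score))  # Observe: record what worked

best_name, best_score = min(((n, s) for _, n, s in memory), key=lambda t: t[1])
print(best_name, round(best_score, 3))
```

Because the memory retains every scored attempt, a real LLM-driven loop sees both its successes and its dead ends, which is the few-shot-like effect the abstract credits for FAMOSE's performance.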