🤖 AI Summary
This work addresses the usability bottleneck of automated feature engineering (AutoFE) for tabular data. We conduct the first empirical evaluation of 53 AutoFE methods along practical dimensions: ease of deployment, documentation completeness, community activity, and support for user-defined resource constraints (e.g., time and memory budgets). The results show that these tools are, in general, difficult to install and use, have incomplete or outdated documentation, and lack active maintenance. Critically, none of them lets users configure time and memory limits, which severely hinders industrial adoption. To bridge this gap, we propose a usability-oriented evaluation framework for AutoFE that prioritizes resource controllability and end-user friendliness in engineering design. Our findings expose a substantial practicality gap in current AutoFE tooling and provide empirical evidence and concrete guidance for developing next-generation AutoFE systems that are trustworthy, configurable, and sustainably maintainable.
📝 Abstract
Tabular data, consisting of rows and columns, is omnipresent across various machine learning applications. Each column represents a feature, and features can be combined or transformed to create new, more informative features. Such feature engineering is essential to achieve peak performance in machine learning. Since manual feature engineering is expensive and time-consuming, a substantial effort has been put into automating it. Yet, existing automated feature engineering (AutoFE) methods have never been investigated regarding their usability for practitioners. Thus, we investigated 53 AutoFE methods. We found that these methods are, in general, hard to use, lack documentation, and have no active communities. Furthermore, no method allows users to set time and memory constraints, which we see as a necessity for usable automation. Our survey highlights the need for future work on usable, well-engineered AutoFE methods.