Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation

📅 2025-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing approaches for tabular data processing in finance, healthcare, and advertising struggle to identify key features reliably and to generate high-fidelity synthetic features. Method: This paper presents a systematic survey of data-centric AI paradigms for tabular data transformation, unifying, for the first time, diverse methodologies including classical statistical filtering, embedding-based learning, reinforcement learning–driven feature search, and generative adversarial network (GAN) or large language model (LLM)–based feature synthesis. We propose the novel “data-space refinement” paradigm. Contribution/Results: We establish a structured taxonomy that clarifies the applicability boundaries and inherent limitations of each approach; identify core open challenges in jointly optimizing feature selection and generation; and deliver a methodology guide bridging theoretical rigor and industrial deployability for enterprise-scale tabular data governance.

📝 Abstract
Tabular data is one of the most widely used formats across industries, driving critical applications in areas such as finance, healthcare, and marketing. In the era of data-centric AI, improving data quality and representation has become essential for enhancing model performance, particularly in applications centered around tabular data. This survey examines the key aspects of tabular data-centric AI, emphasizing feature selection and feature generation as essential techniques for data space refinement. We provide a systematic review of feature selection methods, which identify and retain the most relevant data attributes, and feature generation approaches, which create new features to simplify the capture of complex data patterns. This survey offers a comprehensive overview of current methodologies through an analysis of recent advancements, practical applications, and the strengths and limitations of these techniques. Finally, we outline open challenges and suggest future perspectives to inspire continued innovation in this field.
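As a rough illustration of the two operations the abstract names, the sketch below pairs a simple filter-style feature selection (ranking columns by absolute Pearson correlation with the target) with a simple feature generation step (pairwise products of columns). It is not taken from the survey: the function names `pearson`, `select_features`, and `generate_products`, the toy table, and the choice of correlation as the relevance score are all assumptions made for this example.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(table, target, k):
    """Feature selection (hypothetical helper): keep the k columns whose
    absolute correlation with the target is highest."""
    scores = {name: abs(pearson(col, target)) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def generate_products(table):
    """Feature generation (hypothetical helper): add the element-wise
    product of every pair of original columns as a new column."""
    expanded = dict(table)
    for a, b in combinations(table, 2):
        expanded[f"{a}*{b}"] = [x * y for x, y in zip(table[a], table[b])]
    return expanded


# Toy table (made up for this example): the target is exactly x1 * x2,
# an interaction that no single original column captures.
table = {
    "x1": [1, 2, 3, 4, 1, 2, 3, 4],
    "x2": [1, 1, 1, 1, -1, -1, -1, -1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [a * b for a, b in zip(table["x1"], table["x2"])]

best_original = select_features(table, target, k=1)
best_expanded = select_features(generate_products(table), target, k=1)
print("best original feature: ", best_original)
print("best after generation:", best_expanded)
```

On this toy data the generated product column matches the target exactly, so selection over the expanded table picks it ahead of any original column. That is the sense in which generation and selection can work together to refine the data space, although real pipelines would use far richer scoring and generation strategies than this sketch.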
Problem

Research questions and friction points this paper is trying to address.

Tabular Data Optimization
AI Performance Enhancement
Data Selection and Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Generative Methods
Data Optimization