🤖 AI Summary
Deep learning models for tabular data suffer from slow convergence, high hyperparameter sensitivity, and unstable early-stage training. To address these challenges, this paper proposes a Random Fourier Features (RFF)-based preprocessing method guided by Neural Tangent Kernel (NTK) theory. The method non-linearly maps raw features into a fixed-frequency domain, yielding a parameter- and architecture-agnostic plug-and-play module. We theoretically establish that it effectively constrains the initial NTK spectrum and introduces a beneficial gradient-flow bias, thereby accelerating optimization dynamics. Empirical evaluation across multiple standard tabular benchmarks demonstrates substantial improvements: training iterations are reduced by 30–50% for comparable performance, sensitivity to hyperparameter tuning drops significantly, and generalization is enhanced. The approach thus offers a principled, lightweight, and broadly applicable solution for improving the training efficiency and robustness of deep tabular models.
📝 Abstract
While random Fourier features are a classic tool in kernel methods, their utility as a preprocessing step for deep learning on tabular data has been largely overlooked. Motivated by shortcomings in tabular deep learning pipelines, revealed through Neural Tangent Kernel (NTK) analysis, we revisit and repurpose random Fourier mappings as a parameter-free, architecture-agnostic transformation. By projecting each input into a fixed feature space via sine and cosine projections with frequencies drawn once at initialization, this approach circumvents the need for ad hoc normalization or additional learnable embeddings. We show within the NTK framework that this mapping (i) bounds and conditions the network's initial NTK spectrum, and (ii) introduces a bias that shortens the optimization trajectory, thereby accelerating gradient-based training. Together, these effects pre-condition the network with a stable kernel from the outset. Empirically, we demonstrate that deep networks trained on Fourier-transformed inputs converge more rapidly and consistently achieve strong final performance, often with fewer epochs and less hyperparameter tuning. Our findings establish random Fourier preprocessing as a theoretically motivated, plug-and-play enhancement for tabular deep learning.
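The fixed sine/cosine projection described above can be sketched in a few lines of NumPy. This is a minimal illustration of the classic random Fourier feature map, not the paper's exact implementation; the function name, the Gaussian frequency distribution, and the hyperparameters `num_features` and `sigma` are assumptions for the sketch.

```python
import numpy as np

def random_fourier_map(X, num_features=128, sigma=1.0, seed=0):
    """Map inputs X of shape (n_samples, d) to fixed sine/cosine features.

    Frequencies W are drawn once (here from N(0, 1/sigma^2)) and then held
    fixed, mirroring the parameter-free, draw-once-at-initialization idea.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Fixed random frequency matrix, never trained.
    W = rng.normal(0.0, 1.0 / sigma, size=(d, num_features))
    proj = X @ W
    # Concatenate cosine and sine projections; the scale keeps each
    # feature vector's norm bounded regardless of the input's scale.
    return np.sqrt(1.0 / num_features) * np.hstack([np.cos(proj), np.sin(proj)])

X = np.random.default_rng(1).normal(size=(4, 3))
Z = random_fourier_map(X, num_features=8)
print(Z.shape)  # → (4, 16)
```

Because the mapping is applied once to the raw table before training, it drops in ahead of any architecture without adding learnable parameters; downstream layers simply see the bounded transformed features.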