🤖 AI Summary
This paper addresses the empirical effectiveness yet theoretical obscurity of Hadamard product (HP) interactions in click-through rate (CTR) prediction. From a quadratic neural network (QNN) perspective, we reveal that HP implicitly performs high-dimensional feature mapping and smooth, activation-free nonlinear feature interaction. We propose QNN-α, a novel neuron architecture; introduce the multi-head Khatri-Rao product as a theoretically grounded replacement for HP to enhance the expressive power of feature interactions; design a Self-Ensemble Loss for dynamic model self-ensembling; and provide theory-driven justification that mid-activation interaction outperforms post-activation. Evaluated on six public CTR benchmark datasets, our method achieves state-of-the-art (SOTA) performance while maintaining low inference latency, strong scalability, and seamless compatibility with existing deep CTR models. The implementation, including source code and hyperparameters, is fully open-sourced.
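To make the contrast concrete, here is a minimal numpy sketch of the two interaction operators the summary names. The Hadamard product keeps the embedding dimension, while a multi-head Khatri-Rao-style interaction (outer products within each head) expands it. Function names, head splitting, and shapes are our illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hadamard_interaction(x, y):
    # Element-wise (Hadamard) product: two d-dim embeddings -> one d-dim vector.
    return x * y

def multihead_khatri_rao(x, y, num_heads):
    # Illustrative multi-head Khatri-Rao interaction (our reading, not the paper's
    # exact definition): split each embedding into `num_heads` chunks, then take
    # an outer product per head, expanding each (d/h)-dim pair into (d/h)^2 terms.
    x_heads = np.split(x, num_heads)
    y_heads = np.split(y, num_heads)
    return np.concatenate([np.outer(a, b).ravel()
                           for a, b in zip(x_heads, y_heads)])

x = np.arange(8, dtype=float)
y = np.ones(8)
hp = hadamard_interaction(x, y)      # shape (8,): dimension preserved
kr = multihead_khatri_rao(x, y, 2)   # shape (32,): 2 heads * 4*4 terms each
```

The shape expansion (8 → 32 here) is the sense in which such a product can enlarge the feature-interaction space relative to HP.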
📝 Abstract
Hadamard Product (HP) has long been a cornerstone of click-through rate (CTR) prediction due to its simplicity, effectiveness, and ability to capture feature interactions without additional parameters. However, the underlying reasons for its effectiveness remain unclear. In this paper, we revisit HP from the perspective of Quadratic Neural Networks (QNN), which leverage quadratic interaction terms to model complex feature relationships. We further reveal QNN's ability to expand the feature space and provide smooth nonlinear approximations without relying on activation functions. Meanwhile, we find that traditional post-activation does not further improve QNN performance; mid-activation is a more suitable alternative. Through theoretical analysis and empirical evaluation of 25 QNN neuron formats, we identify a strong-performing variant and further enhance it. Specifically, we propose the Multi-Head Khatri-Rao Product as a superior alternative to HP, and a Self-Ensemble Loss that enables dynamic ensembling within a single network to improve computational efficiency and performance. Ultimately, we propose a novel neuron format, QNN-alpha, tailored for CTR prediction tasks. Experimental results show that QNN-alpha achieves new state-of-the-art performance on six public datasets while maintaining low inference latency, good scalability, and excellent compatibility. The code, running logs, and detailed hyperparameter configurations are available at: https://github.com/salmon1802/QNN.
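The post-activation vs. mid-activation distinction in the abstract can be sketched as follows. This is one plausible reading of where the nonlinearity sits in a quadratic neuron, with hypothetical weights and shapes; the paper's actual QNN-alpha formulation may differ.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
d, k = 8, 4                          # hypothetical input and hidden sizes
W1 = rng.standard_normal((k, d))
W2 = rng.standard_normal((k, d))
x = rng.standard_normal(d)

# Post-activation: the nonlinearity is applied AFTER the quadratic
# (Hadamard-style) interaction of the two linear branches.
post = relu((W1 @ x) * (W2 @ x))

# Mid-activation: the nonlinearity is applied to one branch INSIDE the
# interaction, so the product itself stays un-activated and smooth.
mid = (W1 @ x) * relu(W2 @ x)
```

The abstract's claim is that placements like `mid` suit QNNs better than the traditional `post` pattern; the sketch only fixes the terminology, not the paper's evidence.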