Dual-Stream MLP is All You Need for CTR Prediction

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high complexity of feature interaction learning in click-through rate (CTR) prediction and the imbalance between outputs of explicit and implicit interaction modules. To this end, the authors propose a dual-stream MLP framework, wherein a backbone MLP absorbs knowledge from an explicit interaction module via knowledge distillation, while a parallel MLP captures implicit interactions. The two streams are jointly optimized through a dual-alignment strategy. Notably, this approach is the first to reduce a dual-stream architecture to a pure MLP structure, effectively curbing overfitting while lowering model complexity. The method achieves state-of-the-art performance on three mainstream CTR benchmark datasets, offering an efficient and scalable solution for recommender systems.

📝 Abstract

Click-through rate (CTR) prediction plays a pivotal role in online advertising and recommendation systems, where even small improvements can significantly boost revenue. Existing research primarily focuses on designing dual-stream architectures that capture complex feature interactions from both explicit and implicit perspectives. However, these approaches face two major challenges: 1) the high complexity of feature interaction learning, which increases computational demands and the risk of overfitting, and 2) the imbalance between explicit and implicit modules, where one module's output may dominate the final prediction. To address these issues, we propose Dual-Stream MLP (DS-MLP), a novel feature interaction framework for the CTR prediction task. Specifically, it leverages knowledge distillation to consolidate the capacity for learning explicit feature interactions into a main MLP network, while a parallel MLP simultaneously captures implicit feature interactions as a complement. To optimize the dual-stream MLP architecture effectively, we further design a learning approach with two alignment strategies that enhance the compatibility of the two MLP components. Experiments demonstrate that DS-MLP, although its final model is merely a vanilla MLP structure, achieves state-of-the-art performance across three widely used benchmarks, offering a scalable and efficient solution for large-scale recommendation systems. Our code is available at https://github.com/RUCAIBox/DS-MLP.
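The abstract does not spell out the loss functions or the two alignment strategies, so the following is only a minimal sketch of the general idea of distilling an explicit-interaction teacher into a main MLP stream while a parallel stream contributes to the final prediction. Everything here is an assumption for illustration: the function names, the additive fusion of the two stream logits, the squared-error distillation term, and the weight `alpha` are hypothetical and are not taken from the paper or its repository.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(label: float, logit: float) -> float:
    """Binary cross-entropy between a click label (0 or 1) and a logit."""
    eps = 1e-12  # guards against log(0)
    p = sigmoid(logit)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def distillation_loss(label: float,
                      main_logit: float,
                      parallel_logit: float,
                      teacher_logit: float,
                      alpha: float = 0.5) -> float:
    """Hypothetical training loss for one example.

    main_logit:     output of the main MLP stream (the distillation student)
    parallel_logit: output of the parallel MLP stream (implicit interactions)
    teacher_logit:  output of an explicit-interaction teacher module
    alpha:          assumed weight of the distillation term
    """
    # Assumption: the two streams are fused by summing their logits.
    fused_logit = main_logit + parallel_logit
    task_term = bce(label, fused_logit)
    # Assumption: distillation pulls the main stream toward the teacher
    # with a squared error on logits.
    distill_term = (main_logit - teacher_logit) ** 2
    return task_term + alpha * distill_term
```

In this sketch the distillation term vanishes when the main stream already matches the teacher, leaving only the click-prediction loss on the fused output. At inference time the teacher would no longer be needed, which is consistent with the abstract's description of the final model as a plain MLP structure.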

Problem

Research questions and friction points this paper is trying to address.

CTR prediction

feature interaction

dual-stream architecture

overfitting

module imbalance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Stream MLP

Knowledge Distillation

Feature Interaction