🤖 AI Summary
To address weak generalization caused by undifferentiated sample training, and the limited representation capacity that results when multiple encoders share a single supervision signal in CTR prediction, this paper proposes TF4CTR, a twin-focus framework. Methodologically, it introduces (1) a Sample Selection Embedding Module (SSEM) that enables difficulty-aware sample selection and dynamic encoder assignment; (2) a Twin Focus (TF) Loss providing sample-level differentiated supervision to the different encoders; and (3) a Dynamic Fusion Module (DFM) that adaptively fuses multi-granularity feature interaction information. The framework is plug-and-play and compatible with existing architectures. Extensive experiments on five real-world datasets demonstrate consistent and significant performance gains over mainstream models, including Wide&Deep, DeepFM, and AutoInt, validating its strong compatibility and generalization capability. Code and experimental logs are publicly available.
📝 Abstract
Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are undifferentiated, which may lead the model to fixate on the larger number of easy samples while neglecting the smaller number of hard samples, thus reducing its generalization ability; (2) differentiated feature interaction encoders are designed to capture different interaction information but receive identical supervision signals, thereby limiting their effectiveness. To bridge the identified gaps, this paper introduces a novel CTR prediction framework that integrates the plug-and-play Twin Focus (TF) Loss, Sample Selection Embedding Module (SSEM), and Dynamic Fusion Module (DFM), named the Twin Focus Framework for CTR (TF4CTR). Specifically, the framework employs the SSEM at the bottom of the model to differentiate between samples, thereby assigning a more suitable encoder to each sample. Meanwhile, the TF Loss provides tailored supervision signals to the simple and complex encoders. Moreover, the DFM dynamically fuses the feature interaction information captured by the encoders, resulting in more accurate predictions. Experiments on five real-world datasets confirm the effectiveness and compatibility of the framework, demonstrating its capacity to enhance various representative baselines in a model-agnostic manner. To facilitate reproducible research, our open-sourced code and detailed running logs are available at: https://github.com/salmon1802/TF4CTR.
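To make the idea of sample-level differentiated supervision concrete, the following is a minimal sketch of what a twin-focus objective could look like. It assumes focal-style difficulty weights: the simple encoder's loss is weighted toward easy samples and the complex encoder's toward hard ones. The function names, the exact weighting scheme, and the hyperparameter `gamma` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Element-wise binary cross-entropy for probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def twin_focus_loss(p_simple, p_complex, y, gamma=2.0):
    """Hypothetical twin-focus objective (illustrative, not the paper's form):
    the simple encoder is supervised mainly on easy samples, the complex
    encoder mainly on hard samples, via focal-style confidence weights."""
    pt_s = np.where(y == 1, p_simple, 1 - p_simple)   # true-class prob, simple encoder
    pt_c = np.where(y == 1, p_complex, 1 - p_complex) # true-class prob, complex encoder
    easy_w = pt_s ** gamma         # large when the sample is easy
    hard_w = (1 - pt_c) ** gamma   # large when the sample is hard
    return np.mean(easy_w * bce(p_simple, y) + hard_w * bce(p_complex, y))
```

In this toy form, a confidently classified sample contributes mostly through the simple encoder's term, while a misclassified one drives the complex encoder's gradient, which is the intuition behind giving each encoder its own supervision signal.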