🤖 AI Summary
Existing CTR models neglect inference-time optimization, leading to unreliable predictions, particularly for low-frequency feature interactions. Method: a model-agnostic test-time enhancement framework in which (1) hierarchical probabilistic hashing estimates confidence scores for multi-order feature combinations, and (2) those confidence scores guide the dynamic construction of multi-path inference architectures, combining confidence-weighted sampling with prediction aggregation, all without modifying the training procedure. Contribution/Results: This work pioneers test-time optimization for CTR prediction. It significantly improves offline metrics across mainstream models (e.g., DeepFM, DIN, DCN) and delivers statistically significant gains in large-scale online A/B tests. The approach is architecture-agnostic, requires no retraining, and is readily deployable in production systems.
📝 Abstract
Recently, a growing body of research has focused on either optimizing CTR model architectures to better capture feature interactions or refining training objectives to aid parameter learning, thereby improving predictive performance. However, these efforts concentrate on the training phase and largely neglect opportunities for optimization during inference. Infrequently occurring feature combinations, in particular, can degrade prediction quality, leading to unreliable or low-confidence outputs. To unlock the predictive potential of trained CTR models, we propose a Model-Agnostic Test-Time paradigm (MATT), which leverages the confidence scores of feature combinations to guide the generation of multiple inference paths, thereby mitigating the influence of low-confidence features on the final prediction. Specifically, to quantify the confidence of feature combinations, we introduce a hierarchical probabilistic hashing method that estimates the occurrence frequencies of feature combinations at various orders; these frequencies serve as the corresponding confidence scores. Using the confidence scores as sampling probabilities, we then generate multiple instance-specific inference paths through iterative sampling and aggregate the prediction scores across paths to produce robust predictions. Finally, extensive offline experiments and online A/B tests validate the compatibility and effectiveness of MATT across existing CTR models.
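The pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a count-min sketch stands in for the hierarchical probabilistic hashing (whose exact construction is not given above), the model is a placeholder callable, and all names, sizes, and smoothing choices are illustrative assumptions.

```python
import hashlib
import itertools
import random

class CountMinSketch:
    """Approximate frequency counter; a stand-in for the paper's
    hierarchical probabilistic hashing (exact scheme not specified here)."""
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key):
        for row in range(self.depth):
            self.tables[row][self._index(row, key)] += 1

    def estimate(self, key):
        # Count-min never underestimates the true count.
        return min(self.tables[row][self._index(row, key)]
                   for row in range(self.depth))

def combo_confidences(instance, sketch, max_order=2):
    """Turn estimated occurrence counts of feature combinations
    (up to `max_order`) into normalized sampling probabilities."""
    confs = {}
    for k in range(1, max_order + 1):
        for combo in itertools.combinations(sorted(instance), k):
            # Add-one smoothing so unseen combinations keep a small weight.
            confs[combo] = sketch.estimate("|".join(combo)) + 1
    total = sum(confs.values())
    return {c: v / total for c, v in confs.items()}

def multi_path_predict(instance, sketch, model, n_paths=8, path_size=3, seed=0):
    """Sample feature-combination subsets (inference paths) with probability
    proportional to confidence, score each path, and average the scores."""
    rng = random.Random(seed)
    confs = combo_confidences(instance, sketch)
    combos, weights = zip(*confs.items())
    scores = []
    for _ in range(n_paths):
        path = rng.choices(combos, weights=weights, k=path_size)
        scores.append(model(path))
    return sum(scores) / len(scores)
```

In this sketch the aggregation is a plain mean; confidence-weighted averaging of the path scores would be an equally plausible reading of the abstract.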