🤖 AI Summary
This paper addresses two challenges in vehicle purchase prediction: extreme class imbalance (positive sample rate < 0.5%) and heterogeneous user behavior. It proposes a multi-stage modeling framework that (1) constructs diverse base models to capture behavioral heterogeneity; (2) designs relative-performance meta-features, such as each model's deviation from the ensemble mean and its rank among peer models, to guide meta-learning-based ensemble fusion; and (3) applies business-objective-oriented supervised knowledge distillation to compress the ensemble into a lightweight, deployable single model. The approach combines meta-learning, ensemble learning, knowledge distillation, and relative feature engineering. Evaluated on a real-world dataset of approximately 800,000 car owners, the method reaches roughly 10% precision in its top-60,000 recommendations while covering about 50% of actual buyers, substantially outperforming baseline approaches. The distilled model retains over 98% of the ensemble’s predictive performance while significantly improving inference efficiency and operational scalability.
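The headline figures above are a precision and a coverage (recall) measured on the top-K ranked owners. The following is a minimal sketch of how such metrics could be computed, not the evaluation code used in the paper; the function name `precision_and_coverage_at_k` and the toy data are hypothetical.

```python
def precision_and_coverage_at_k(scores, labels, k):
    """Return (precision@k, coverage@k) for binary labels (1 = buyer).

    Hypothetical helper for illustration; ties in score keep input order.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of samples")
    # Rank sample indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])    # buyers found in the top k
    total_positives = sum(labels)                # all buyers in the dataset
    precision = hits / k
    coverage = hits / total_positives if total_positives else 0.0
    return precision, coverage


# Toy data: 5 owners, 3 of whom actually bought.
toy_scores = [0.9, 0.1, 0.8, 0.4, 0.7]
toy_labels = [1, 0, 0, 1, 1]
precision, coverage = precision_and_coverage_at_k(toy_scores, toy_labels, k=2)
print(precision, coverage)
```

In the setting described above, `k` would be 60,000 and the inputs would cover all owners in the evaluation set.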
📝 Abstract
Predicting future vehicle purchases among existing owners presents a critical challenge due to extreme class imbalance (<0.5% positive rate) and complex behavioral patterns. We propose REMEDI (Relative feature Enhanced Meta-learning with Distillation for Imbalanced prediction), a novel multi-stage framework addressing these challenges. REMEDI first trains diverse base models to capture complementary aspects of user behavior. Second, inspired by comparative optimization techniques, we introduce relative performance meta-features (deviation from ensemble mean, rank among peers) for effective model fusion through a hybrid-expert architecture. Third, we distill the ensemble's knowledge into a single efficient model via supervised fine-tuning with MSE loss, enabling practical deployment. Evaluated on approximately 800,000 vehicle owners, REMEDI significantly outperforms baseline approaches, achieving the business target of identifying ~50% of actual buyers within the top 60,000 recommendations at ~10% precision. The distilled model preserves the ensemble's predictive power while maintaining deployment efficiency, demonstrating REMEDI's effectiveness for imbalanced prediction in industry settings.
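To make the two relative meta-features named in the abstract concrete, here is a minimal sketch for a single owner. It assumes that "deviation from ensemble mean" and "rank among peers" are computed per sample across the base models' scores; the abstract does not specify the exact definitions, so the function name `relative_meta_features` and the rank scaling to [0, 1] are illustrative assumptions rather than REMEDI's implementation.

```python
from statistics import fmean


def relative_meta_features(scores):
    """Relative features for ONE sample from its base-model scores.

    Hypothetical helper: returns (deviations, ranks), one entry per model.
    """
    mean = fmean(scores)
    # Deviation of each base model's score from the ensemble mean.
    deviations = [s - mean for s in scores]
    # Rank among peers: 0.0 = lowest score, 1.0 = highest score.
    # sorted() is stable, so tied scores are ranked by model index.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    denom = max(len(scores) - 1, 1)  # avoid division by zero for one model
    ranks = [0.0] * len(scores)
    for position, model_idx in enumerate(order):
        ranks[model_idx] = position / denom
    return deviations, ranks


# Three base models scoring the same owner.
deviations, ranks = relative_meta_features([0.1, 0.3, 0.2])
print(deviations)  # approximately [-0.1, 0.1, 0.0]
print(ranks)       # [0.0, 1.0, 0.5]
```

These features would be concatenated with the raw base-model scores and passed to the fusion model; the fusion architecture and the MSE-based distillation step are outside the scope of this sketch.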