REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction

📅 2025-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the dual challenges of extreme class imbalance (positive sample rate < 0.5%) and user behavior heterogeneity in vehicle purchase prediction. It proposes a multi-stage modeling framework that (1) trains diverse base models to capture behavioral heterogeneity; (2) builds relative-performance meta-features, such as each model's deviation from the ensemble mean and its rank among peer models, to guide meta-learning-based ensemble fusion; and (3) applies supervised knowledge distillation, oriented to the business objective, to compress the ensemble into a lightweight, deployable single model. The approach combines meta-learning, ensemble learning, knowledge distillation, and relative feature engineering. Evaluated on a real-world dataset of approximately 800,000 car owners, the method reaches roughly 10% precision in its top-60,000 recommendations while covering about 50% of actual buyers, outperforming the baseline approaches. The distilled model retains over 98% of the ensemble's predictive performance while significantly improving inference efficiency and operational scalability.

📝 Abstract

Predicting future vehicle purchases among existing owners presents a critical challenge due to extreme class imbalance (<0.5% positive rate) and complex behavioral patterns. We propose REMEDI (Relative feature Enhanced Meta-learning with Distillation for Imbalanced prediction), a novel multi-stage framework addressing these challenges. REMEDI first trains diverse base models to capture complementary aspects of user behavior. Second, inspired by comparative optimization techniques, we introduce relative performance meta-features (deviation from ensemble mean, rank among peers) for effective model fusion through a hybrid-expert architecture. Third, we distill the ensemble's knowledge into a single efficient model via supervised fine-tuning with MSE loss, enabling practical deployment. Evaluated on approximately 800,000 vehicle owners, REMEDI significantly outperforms baseline approaches, achieving the business target of identifying ~50% of actual buyers within the top 60,000 recommendations at ~10% precision. The distilled model preserves the ensemble's predictive power while maintaining deployment efficiency, demonstrating REMEDI's effectiveness for imbalanced prediction in industry settings.
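
The business target in the abstract is stated as precision and recall within a fixed-size recommendation list. As a minimal sketch of how such top-k metrics can be computed, assuming a vector of model scores and binary purchase labels (the function name `precision_recall_at_k` and the toy data are illustrative, not taken from the paper):

```python
import numpy as np

def precision_recall_at_k(scores, labels, k):
    """Precision and recall among the k highest-scored examples."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    k = min(k, len(scores))
    # Indices of the k highest scores; a stable sort keeps ties deterministic.
    top_k = np.argsort(-scores, kind="stable")[:k]
    hits = int(labels[top_k].sum())      # actual buyers inside the list
    total_pos = int(labels.sum())        # all actual buyers
    precision = hits / k if k > 0 else 0.0
    recall = hits / total_pos if total_pos > 0 else 0.0
    return precision, recall

# Toy data (hypothetical): six owners, three of whom actually buy.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
labels = [1, 0, 0, 0, 1, 1]
p, r = precision_recall_at_k(scores, labels, k=3)
print(f"precision@3={p:.3f} recall@3={r:.3f}")
```

In the paper's setting, k would be the 60,000-owner recommendation list, precision the share of listed owners who buy, and recall the share of all buyers captured by the list.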

Problem

Research questions and friction points this paper is trying to address.

Addresses extreme class imbalance in vehicle purchase prediction

Enhances model fusion using relative performance meta-features

Distills ensemble knowledge into a single deployable model
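
The abstract names two relative performance meta-features: deviation from the ensemble mean and rank among peers. One plausible way to build them, sketched here under the assumption that each base model emits one score per owner (the function name, the rank normalization, and the feature layout are illustrative choices, not the paper's specification):

```python
import numpy as np

def relative_meta_features(base_preds):
    """Build relative meta-features from base-model scores.

    base_preds: array of shape (n_samples, n_models).
    Returns an (n_samples, 3 * n_models) array holding the raw scores,
    each score's deviation from the per-sample ensemble mean, and each
    score's normalized rank among its peer models.
    """
    base_preds = np.asarray(base_preds, dtype=float)
    n_models = base_preds.shape[1]
    # Deviation from the mean prediction of all base models for that sample.
    ensemble_mean = base_preds.mean(axis=1, keepdims=True)
    deviation = base_preds - ensemble_mean
    # Double argsort gives each model's rank per sample (0 = lowest score).
    order = np.argsort(base_preds, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")
    # Assumption: scale ranks to [0, 1] so they are comparable across ensembles.
    rank_norm = ranks / max(n_models - 1, 1)
    return np.hstack([base_preds, deviation, rank_norm])

# Toy data (hypothetical): two owners scored by three base models.
preds = np.array([[0.2, 0.5, 0.8],
                  [0.9, 0.1, 0.5]])
features = relative_meta_features(preds)
print(features.shape)
```

The resulting matrix could then be fed to a meta-learner that fuses the base models; the hybrid-expert architecture itself is not reproduced in this sketch.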

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diverse base models capture complementary user behavior

Relative performance meta-features enhance model fusion

Knowledge distillation into single efficient deployment model
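
The distillation step trains a single model to reproduce the ensemble's scores under an MSE loss. The sketch below illustrates that idea only: it swaps in a linear student fitted by least squares (which minimizes MSE in closed form) and a synthetic teacher, both of which are assumptions rather than the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 owners with 4 features each.
X = rng.normal(size=(200, 4))
# Hypothetical teacher: stands in for the fused ensemble's soft scores.
teacher_scores = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -0.5, 0.25, 0.0]))))

def add_bias(X):
    """Append a constant column so the student can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def distill_linear_student(X, teacher_scores):
    """Fit student weights minimizing MSE against the teacher's scores."""
    w, *_ = np.linalg.lstsq(add_bias(X), teacher_scores, rcond=None)
    return w

def student_predict(X, w):
    return add_bias(X) @ w

w = distill_linear_student(X, teacher_scores)
mse = float(np.mean((student_predict(X, w) - teacher_scores) ** 2))
# Reference point: a student that always predicts the teacher's mean score.
baseline_mse = float(np.mean((teacher_scores.mean() - teacher_scores) ** 2))
print(f"student MSE={mse:.5f} constant-baseline MSE={baseline_mse:.5f}")
```

A production student would more likely be a gradient-boosted or neural model trained with the same loss; the linear form is used here only to keep the example short and self-contained.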
