🤖 AI Summary
In data clean rooms, conversion rate (CVR) prediction faces two constraints: user privacy must be strictly protected, and advertisers’ data must remain within their own domain. To address this, the paper proposes a collaborative training framework that combines batch-level gradient aggregation, adapter-based parameter-efficient fine-tuning, and label differential privacy with de-biasing. The advertiser and the platform train a model jointly without sharing raw labels or model parameters; collaboration happens at the gradient level instead. Only batch-level aggregated gradients are exchanged, never sample-level ones, which limits what can be inferred about any individual sample. Lightweight adapters keep domain adaptation and communication cheap, and de-biasing corrects the estimation bias that label noise injection would otherwise introduce. On industrial datasets, the approach achieves competitive ROC-AUC while reducing communication overhead by 62%. It is designed to comply with GDPR and other privacy regulations and to meet the requirements of practical commercial deployment.
📝 Abstract
In online advertising, accurately predicting the conversion rate (CVR) is crucial for advertising efficiency and user satisfaction. This paper addresses CVR prediction under user privacy preferences and advertiser requirements. Traditional methods face two obstacles: advertisers are reluctant to share sensitive conversion data, and model training is constrained in secure environments such as data clean rooms. We propose a novel framework that enables collaborative model training without sharing sample-level gradients with the advertising platform. Our approach has three components: (1) using batch-level aggregated gradients instead of sample-level gradients to minimize privacy risks; (2) applying adapter-based parameter-efficient fine-tuning and gradient compression to reduce communication costs; and (3) employing de-biasing techniques to train the model under label differential privacy, which maintains accuracy despite the privacy-enhancing label perturbations. Experiments on industrial datasets show that our method achieves competitive ROC-AUC performance while significantly decreasing communication overhead and complying with both advertisers’ privacy requirements and users’ privacy choices. This framework establishes a new standard for privacy-preserving, high-performance CVR prediction in digital advertising.
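To make components (1) and (3) concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: it assumes binary conversion labels, a plain logistic model, and randomized response as the label differential privacy mechanism, with the standard unbiased label estimator as the de-biasing step. All function names, the synthetic data, and the value `epsilon=1.0` are hypothetical choices for illustration.

```python
import numpy as np

def randomized_response(labels, epsilon, rng):
    """Keep each binary label with probability e^eps / (1 + e^eps), flip it otherwise.

    Assumption: randomized response stands in for the paper's label DP mechanism.
    """
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(labels.shape) < keep_prob
    return np.where(keep, labels, 1.0 - labels), keep_prob

def debias_labels(noisy_labels, keep_prob):
    """Per-sample label estimates whose expectation equals the true label."""
    return (noisy_labels - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)

def batch_gradient(weights, features, targets):
    """Mean logistic-loss gradient over the batch.

    Only this aggregate, not any per-sample gradient, would be shared.
    """
    preds = 1.0 / (1.0 + np.exp(-(features @ weights)))
    return features.T @ (preds - targets) / len(targets)

# Synthetic data (hypothetical): 4 features, labels drawn from a logistic model.
rng = np.random.default_rng(0)
n, d = 200_000, 4
features = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
labels = (rng.random(n) < 1.0 / (1.0 + np.exp(-(features @ true_w)))).astype(float)

noisy, keep_prob = randomized_response(labels, epsilon=1.0, rng=rng)
weights = np.zeros(d)

grad_clean = batch_gradient(weights, features, labels)
grad_noisy = batch_gradient(weights, features, noisy)
grad_debiased = batch_gradient(weights, features, debias_labels(noisy, keep_prob))

err_noisy = np.linalg.norm(grad_noisy - grad_clean)
err_debiased = np.linalg.norm(grad_debiased - grad_clean)
print(f"gradient error with noisy labels:    {err_noisy:.4f}")
print(f"gradient error with debiased labels: {err_debiased:.4f}")
```

Because the logistic-loss gradient is linear in the label, replacing each noisy label with its unbiased estimate yields an unbiased batch gradient. The price is higher variance, which averaging over a large batch reduces. Adapter-based fine-tuning and gradient compression, component (2), are not covered by this sketch.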