🤖 AI Summary
Traditional machine translation in cross-lingual sponsored search often ignores query context, which introduces semantic ambiguity and degrades CTR and CVR. To address this, we propose a context-aware query translation and ad-matching framework. The method constructs a heterogeneous graph capturing user behavior and query-ad co-occurrence patterns, and applies a Graph Attention Network (GAT) to enrich contextual semantic representations. Multilingual embedding spaces are then aligned with a dual-encoder architecture based on mBERT and XLM-R, jointly optimized with contrastive learning. Grounding translations in both linguistic and behavioral context mitigates translation ambiguity. On English–Chinese, English–Spanish, and English–French datasets, the method achieves a BLEU score of 38.9 and a semantic similarity of 0.83; downstream CTR improves by 4.67% and CVR by 1.72%, indicating gains in both translation quality and advertising effectiveness.
📝 Abstract
Cross-lingual sponsored search is crucial for global advertising platforms, where users from different language backgrounds interact with multilingual ads. Traditional machine translation methods often fail to capture query-specific contextual cues, leading to semantic ambiguities that reduce click-through rates (CTR) and conversion rates (CVR). To address this challenge, we propose AdGraphTrans, a novel dual-encoder framework enhanced with graph neural networks (GNNs) for context-aware query translation in advertising. Specifically, user queries and ad content are independently encoded using multilingual Transformer-based encoders (mBERT/XLM-R), and contextual relations (such as co-clicked ads, user search sessions, and query-ad co-occurrence) are modeled as a heterogeneous graph. A graph attention network (GAT) is then applied to refine the embeddings using this semantic and behavioral context. The refined embeddings are aligned via contrastive learning to reduce translation ambiguity. Experiments on a cross-lingual sponsored search dataset collected from Google Ads and Amazon Ads (EN-ZH, EN-ES, EN-FR pairs) show that AdGraphTrans significantly improves query translation quality, achieving a BLEU score of 38.9 and a semantic similarity (cosine score) of 0.83 and outperforming strong baselines such as mBERT and M2M-100. Moreover, in downstream ad retrieval tasks, AdGraphTrans yields +4.67% CTR and +1.72% CVR improvements over baseline methods. These results indicate that combining graph-based contextual signals with dual-encoder translation is a robust approach to cross-lingual sponsored search on advertising platforms.
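The abstract does not specify the exact contrastive objective, so the following is only a minimal sketch of one common choice: an in-batch InfoNCE-style loss over cosine similarities between query and ad embeddings. The function names (`cosine`, `info_nce_loss`), the temperature value of 0.07, and the toy two-dimensional vectors are all assumptions made for illustration; they are not taken from AdGraphTrans. In the actual framework the inputs would be the GAT-refined outputs of the mBERT/XLM-R encoders rather than hand-written lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query_embs, ad_embs, temperature=0.07):
    """In-batch contrastive loss (illustrative, not the paper's exact objective).

    Query i is treated as matching ad i; every other ad in the batch
    acts as a negative. The temperature is an assumed hyperparameter.
    """
    assert len(query_embs) == len(ad_embs)
    losses = []
    for i, query in enumerate(query_embs):
        logits = [cosine(query, ad) / temperature for ad in ad_embs]
        # Numerically stable log-sum-exp over all candidate ads.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Negative log-probability of the matching ad.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical values, for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_ads = [[0.9, 0.1], [0.1, 0.9]]   # ad i lies close to query i
swapped_ads = [[0.1, 0.9], [0.9, 0.1]]   # ad i lies close to the other query

aligned_loss = info_nce_loss(queries, aligned_ads)
swapped_loss = info_nce_loss(queries, swapped_ads)
print(aligned_loss < swapped_loss)
```

Minimizing a loss of this form pulls each query toward its matching ad and pushes it away from the other ads in the batch, which is one way the alignment of multilingual embedding spaces described above could be realized.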