HALO: Hindsight-Augmented Learning for Online Auto-Bidding

📅 2025-08-05
🤖 AI Summary
Multi-constraint bidding (MCB) in digital advertising faces two key challenges: low sample efficiency and poor generalization to unseen budget–ROI configurations. To address these, we propose an end-to-end differentiable online learning framework. First, we design a theoretically grounded hindsight mechanism that repurposes historical exploration trajectories as training signals for arbitrary constraint combinations, improving sample reuse. Second, we model the bidding policy using B-spline functions, enabling a continuous, gradient-aware mapping over the constraint space and explicitly encoding the physical relationship between constraints and bid coefficients. Evaluated on large-scale industrial datasets, our method reduces constraint violation rates and increases gross merchandise value (GMV). It also adapts zero-shot to previously unseen budget and ROI configurations without retraining, which improves operational robustness and scalability in real-world auction systems.

📝 Abstract
Digital advertising platforms operate millisecond-level auctions through Real-Time Bidding (RTB) systems, where advertisers compete for ad impressions through algorithmic bids. This dynamic mechanism enables precise audience targeting but introduces profound operational complexity due to advertiser heterogeneity: budgets and ROI targets span orders of magnitude across advertisers, from individual merchants to multinational brands. This diversity creates a demanding adaptation landscape for Multi-Constraint Bidding (MCB). Traditional auto-bidding solutions fail in this environment due to two critical flaws: 1) severe sample inefficiency, where failed explorations under specific constraints yield no transferable knowledge for new budget-ROI combinations, and 2) limited generalization under constraint shifts, as they ignore physical relationships between constraints and bidding coefficients. To address these flaws, we propose HALO: Hindsight-Augmented Learning for Online Auto-Bidding. HALO introduces a theoretically grounded hindsight mechanism that repurposes all explorations into training data for arbitrary constraint configurations via trajectory reorientation. Further, it employs a B-spline functional representation, enabling continuous, derivative-aware bid mapping across constraint spaces. HALO ensures robust adaptation even when budget/ROI requirements differ drastically from training scenarios. Evaluations on industrial datasets demonstrate the superiority of HALO in handling multi-scale constraints, reducing constraint violations while improving gross merchandise value (GMV).
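The abstract does not spell out how trajectory reorientation works, so the following is only a minimal sketch of the general hindsight idea, in the spirit of hindsight experience replay: an exploration that missed its intended budget/ROI target is relabeled with the constraints it actually achieved, so that it becomes a valid training pair. The `Trajectory` fields, the `relabel` function, and the choice of (realized spend, realized ROI) as the hindsight constraint are all assumptions made for illustration, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """One exploration episode (hypothetical schema, for illustration only)."""
    bid_coeff: float      # bid coefficient used during the episode
    target_budget: float  # budget the episode was supposed to respect
    target_roi: float     # ROI target the episode was supposed to meet
    spend: float          # realized total spend
    value: float          # realized total value, e.g. GMV


def relabel(traj: Trajectory) -> Tuple[Tuple[float, float], float]:
    """Relabel a trajectory with the constraints it achieved in hindsight.

    Returns ((budget, roi), bid_coeff): under the assumed labeling, the bid
    coefficient is treated as a good answer for the realized spend and ROI,
    even if it failed the originally intended targets.
    """
    if traj.spend <= 0:
        raise ValueError("no spend recorded; realized ROI is undefined")
    achieved_roi = traj.value / traj.spend
    return (traj.spend, achieved_roi), traj.bid_coeff


def build_training_pairs(
    trajs: List[Trajectory],
) -> List[Tuple[Tuple[float, float], float]]:
    """Turn every usable exploration, failed or not, into a training pair."""
    return [relabel(t) for t in trajs if t.spend > 0]


# An episode that overspent (120 > 100) and missed its ROI target (2.5 < 3.0)
failed = Trajectory(bid_coeff=0.8, target_budget=100.0, target_roi=3.0,
                    spend=120.0, value=300.0)
pairs = build_training_pairs([failed])
```

In this sketch the failed episode is not discarded: it is stored as an example of what bid coefficient leads to a spend of 120 at an ROI of 2.5, which is the kind of sample reuse the abstract describes.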
Problem

Research questions and friction points this paper is trying to address.

Improves auto-bidding adaptation for diverse advertiser budgets and ROI targets
Addresses sample inefficiency and poor generalization in traditional auto-bidding
Enables robust bidding under drastic budget/ROI shifts via hindsight learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hindsight-augmented learning repurposes explorations
B-spline enables continuous bid mapping
Trajectory reorientation for arbitrary constraints
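To make the B-spline idea concrete, here is a minimal sketch of a smooth mapping from a (budget, ROI) pair to a bid coefficient, built as a tensor product of cubic B-spline bases evaluated with the standard Cox-de Boor recursion. The knot vector, the log-scale normalization, the value ranges, and the `weights` grid are assumptions chosen for illustration; the source does not state HALO's actual parameterization.

```python
import math
from typing import List, Sequence, Tuple

DEGREE = 3
# Clamped knot vector on [0, 1] giving 5 cubic basis functions (assumed layout)
KNOTS = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]


def basis(i: int, k: int, x: float, t: Sequence[float] = KNOTS) -> float:
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * basis(i, k - 1, x, t)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * basis(i + 1, k - 1, x, t))
    return left + right


def basis_vector(x: float) -> List[float]:
    """All basis values at x, with x clamped into the half-open domain [0, 1)."""
    x = min(max(x, 0.0), 1.0 - 1e-12)
    n = len(KNOTS) - DEGREE - 1
    return [basis(i, DEGREE, x) for i in range(n)]


def log_unit(value: float, lo: float, hi: float) -> float:
    """Map a positive value to [0, 1] on a log scale (assumed normalization)."""
    return (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))


def bid_coefficient(
    budget: float,
    roi: float,
    weights: Sequence[Sequence[float]],
    budget_range: Tuple[float, float] = (1.0, 1e6),
    roi_range: Tuple[float, float] = (0.1, 100.0),
) -> float:
    """Tensor-product spline: sum over i, j of weights[i][j] * N_i(b) * N_j(r)."""
    nb = basis_vector(log_unit(budget, *budget_range))
    nr = basis_vector(log_unit(roi, *roi_range))
    return sum(weights[i][j] * nb[i] * nr[j]
               for i in range(len(nb)) for j in range(len(nr)))


# A constant weight grid must reproduce that constant everywhere,
# because the basis functions sum to one at every point of the domain.
flat = [[2.0] * 5 for _ in range(5)]
coeff = bid_coefficient(500.0, 3.0, flat)
```

Because the output is a weighted sum of smooth basis functions, it varies continuously with budget and ROI and is differentiable with respect to both the inputs and the weights, so the weights could be fitted by gradient descent. The log-scale normalization is one plausible way to handle constraints that span orders of magnitude.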
Pusen Dong
Shopee Company, China
Chenglong Cao
Shopee Company, China
Xinyu Zhou
Shopee Company, China
Jirong You
Shopee Company, China
Linhe Xu
Shopee Company, China
Feifan Xu
Shopee Company, China
Shuo Yuan
Beijing University of Posts & Telecommunications
Satellite communication, Edge intelligence, Integrated satellite-terrestrial networks