🤖 AI Summary
To address the trade-off among efficiency, robustness, and realism in behavior modeling for large-scale multi-agent driving simulation, this paper proposes an instance-centric lightweight interaction modeling framework. Methodologically: (1) an instance-centric local coordinate system enables cross-timestep reuse of static scene elements; (2) a query-centric symmetric context encoder, augmented with relative positional encoding, efficiently models dynamic agent interactions; (3) adaptive reward transformation coupled with adversarial inverse reinforcement learning automatically balances training stability and trajectory fidelity. Experiments demonstrate that the proposed method significantly reduces both training and inference overhead, achieves superior scalability compared to mainstream baselines, and attains state-of-the-art performance in positional accuracy and perturbation robustness.
📝 Abstract
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.