🤖 AI Summary
To address the challenge of balancing real-time performance with geometric and semantic fidelity in multi-robot collaborative 3D scene reconstruction, this paper proposes a task-oriented communication framework—Context-Aware Bandit-PPO—integrating AoI (Age of Information) and semantic importance modeling. It introduces an ω-threshold selection policy and an ω-wait scheduling strategy to enable intelligent, dynamic sampling and transmission optimization of image streams under time-varying conditions. The method synergizes reinforcement learning, AoI-driven timeliness modeling, lightweight semantic assessment, and edge-coordinated TSDF reconstruction. Evaluated on ScanNet and 3RScan, it achieves, under end-to-end latency ≤120 ms, an 18.7% improvement in TSDF reconstruction accuracy and a 22.3% increase in point cloud completeness over baselines, while ensuring interpretable policy decisions. Its core contribution is the first integration of joint AoI–semantic optimization into a multi-agent 3D perception closed loop.
📝 Abstract
Real-time Three-dimensional (3D) scene representation is a foundational element that supports a broad spectrum of cutting-edge applications, including digital manufacturing, Virtual, Augmented, and Mixed Reality (VR/AR/MR), and the emerging metaverse. Despite advancements in real-time communication and computing, achieving a balance between timeliness and fidelity in 3D scene representation remains a challenge. This work investigates a wireless network where multiple homogeneous mobile robots, equipped with cameras, capture an environment and transmit images to an edge server over channels for 3D representation. We propose a contextual-bandit Proximal Policy Optimization (PPO) framework incorporating both Age of Information (AoI) and semantic information to optimize image selection for representation, balancing data freshness and representation quality. Two policies -- the $ω$-threshold and $ω$-wait policies -- together with two benchmark methods are evaluated, timeliness embedding and weighted sum, on standard datasets and baseline 3D scene representation models. Experimental results demonstrate improved representation fidelity while maintaining low latency, offering insight into the model's decision-making process. This work advances real-time 3D scene representation by optimizing the trade-off between timeliness and fidelity in dynamic environments.