🤖 AI Summary
To address the high communication overhead and the strong dependence on accurate depth estimation in collaborative perception for visual 3D semantic occupancy prediction, this paper proposes a novel method based on sparse 3D semantic Gaussian primitives. It is the first to introduce Gaussian splatting into this task, replacing dense voxel grids or depth maps with a lightweight, geometry-aware explicit representation. The authors design a cross-agent neighborhood fusion mechanism for deduplication, noise suppression, and structure preservation, and further propose joint geometric-semantic encoding with object-centric sparse message passing to significantly reduce reliance on precise depth estimation. The method supports rigid alignment and low-bandwidth transmission. On nuScenes, it achieves +8.42 and +3.28 mIoU gains over single-agent perception and strong collaborative baselines, respectively, along with +5.11 and +22.41 IoU improvements, and it retains a +1.9 mIoU gain while using only 34.6% of the communication volume.
📝 Abstract
Collaborative perception enables connected vehicles to share information, overcoming occlusions and extending the limited sensing range inherent in single-agent (non-collaborative) systems. Existing vision-only methods for 3D semantic occupancy prediction commonly rely on dense 3D voxels, which incur high communication costs, or on 2D planar features, which require accurate depth estimation or additional supervision, limiting their applicability to collaborative scenarios. To address these challenges, we propose the first approach leveraging sparse 3D semantic Gaussian splatting for collaborative 3D semantic occupancy prediction. By sharing and fusing intermediate Gaussian primitives, our method provides three benefits: a neighborhood-based cross-agent fusion that removes duplicates and suppresses noisy or inconsistent Gaussians; a joint encoding of geometry and semantics in each primitive, which reduces reliance on depth supervision and allows simple rigid alignment; and sparse, object-centric messages that preserve structural information while reducing communication volume. Extensive experiments demonstrate that our approach outperforms single-agent perception and baseline collaborative methods by +8.42 and +3.28 points in mIoU, and by +5.11 and +22.41 points in IoU, respectively. When the number of transmitted Gaussians is further reduced, our method still achieves a +1.9 mIoU improvement using only 34.6% of the communication volume, highlighting robust performance under limited communication budgets.
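Because the shared messages are explicit Gaussian primitives, the cross-agent steps the abstract names are mechanically simple: rigid alignment is a transform of each mean and covariance into the ego frame (μ' = Rμ + t, Σ' = RΣRᵀ), and duplicate removal is a neighborhood test between the two agents' primitives. The sketch below illustrates only these two generic steps under stated assumptions; the function names, the distance-based keep rule, and the `radius` threshold are illustrative, not the paper's actual fusion mechanism (which also handles semantics and noise suppression):

```python
import numpy as np

def align_gaussians(means, covs, R, t):
    """Rigidly transform Gaussian primitives into the ego frame.

    means: (N, 3) centers, covs: (N, 3, 3) covariances,
    R: (3, 3) rotation, t: (3,) translation.
    """
    means_ego = means @ R.T + t            # mu' = R mu + t
    covs_ego = R @ covs @ R.T              # Sigma' = R Sigma R^T (batched)
    return means_ego, covs_ego

def neighborhood_dedup(means_ego_a, means_ego_b, radius=0.5):
    """Drop B's Gaussians whose center lies within `radius` of any of A's.

    A toy stand-in for neighborhood-based cross-agent fusion: nearby
    primitives are treated as duplicates and only A's copy is kept.
    """
    # Pairwise center distances, shape (N_a, N_b).
    d = np.linalg.norm(means_ego_a[:, None, :] - means_ego_b[None, :, :],
                       axis=-1)
    keep_b = d.min(axis=0) > radius
    return np.concatenate([means_ego_a, means_ego_b[keep_b]], axis=0)
```

A sparse set of a few thousand such primitives (center, covariance, semantic logits) is far smaller than a dense voxel grid, which is the source of the communication savings the abstract reports.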