Beyond BEV: Optimizing Point-Level Tokens for Collaborative Perception

📅 2025-08-27

🤖 AI Summary
Existing collaborative perception methods predominantly use 2D bird's-eye-view (BEV) representations for intermediate features, which discards much of the 3D geometric and semantic structure and limits fine-grained object detection and localization accuracy. To address this, we propose CoPLOT, the first framework to use point-level tokens as the intermediate representation for collaborative perception, avoiding BEV projection so that native 3D geometry and semantics are preserved. CoPLOT introduces three key components: (i) semantic-aware token reordering, (ii) frequency-enhanced state space modeling, and (iii) a closed-loop neighbor-to-ego spatial alignment mechanism. Built on a point-native processing pipeline, CoPLOT outperforms state-of-the-art methods on both simulated and real-world datasets while lowering communication and computation overhead.

📝 Abstract
Collaborative perception allows agents to enhance their perceptual capabilities by exchanging intermediate features. Existing methods typically organize these intermediate features as 2D bird's-eye-view (BEV) representations, which discard critical fine-grained 3D structural cues essential for accurate object recognition and localization. To this end, we first introduce point-level tokens as intermediate representations for collaborative perception. However, point-cloud data are inherently unordered, massive, and position-sensitive, making it challenging to produce compact and aligned point-level token sequences that preserve detailed structural information. Therefore, we present CoPLOT, a novel Collaborative perception framework that utilizes Point-Level Optimized Tokens. It incorporates a point-native processing pipeline, including token reordering, sequence modeling, and multi-agent spatial alignment. A semantic-aware token reordering module generates adaptive 1D reorderings by leveraging scene-level and token-level semantic information. A frequency-enhanced state space model captures long-range sequence dependencies across both spatial and spectral domains, improving the differentiation between foreground tokens and background clutter. Lastly, a neighbor-to-ego alignment module applies a closed-loop process, combining global agent-level correction with local token-level refinement to mitigate localization noise. Extensive experiments on both simulated and real-world datasets show that CoPLOT outperforms state-of-the-art models, with even lower communication and computation overhead. Code will be available at https://github.com/CheeryLeeyy/CoPLOT.
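
The localization noise mentioned above can be made concrete with a small geometric sketch. It is not CoPLOT's alignment module, which the abstract describes as a closed-loop process with global and local correction. It only shows the plain rigid transform that maps a neighbor's point coordinates into the ego frame, and how a slightly wrong relative pose displaces every transformed point. The function name, the (tx, ty, yaw) pose format, and all numeric values are assumptions made for illustration.

```python
import math

def neighbor_to_ego(points, rel_pose):
    """Map (x, y, z) points from a neighbor's frame into the ego frame.

    rel_pose = (tx, ty, yaw) is a hypothetical pose format: the neighbor's
    position (metres) and heading (radians) expressed in the ego frame.
    """
    tx, ty, yaw = rel_pose
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate about the vertical axis, then translate; height is unchanged.
    return [(c * x - s * y + tx, s * x + c * y + ty, z) for x, y, z in points]

neighbor_points = [(1.0, 0.0, 0.5), (0.0, 2.0, 1.5)]
true_pose = (10.0, 5.0, math.pi / 2)
# Illustrative pose error: 0.3 m, -0.2 m, and 0.02 rad of heading.
noisy_pose = (10.3, 4.8, math.pi / 2 + 0.02)

aligned = neighbor_to_ego(neighbor_points, true_pose)
misaligned = neighbor_to_ego(neighbor_points, noisy_pose)

# Per-point displacement caused purely by the pose error.
errors = [math.dist(a, b) for a, b in zip(aligned, misaligned)]
print(errors)
```

With these made-up numbers each point lands a few tens of centimetres away from its true position, which is the kind of misalignment that a correction stage has to absorb before tokens from different agents can be fused.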

Problem

Research questions and friction points this paper is trying to address.

Optimizing point-level tokens for collaborative perception
Addressing unordered and massive point-cloud data challenges
Enhancing object recognition with fine-grained 3D structural cues
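
To illustrate the "unordered" challenge listed above, here is a minimal sketch of one common, purely geometric way to turn a point set into a 1D sequence: sorting by a Z-order (Morton) key. This is a generic baseline, not CoPLOT's semantic-aware reordering. The voxel size, bit width, and function names are assumptions, and the sketch assumes all coordinates are at or beyond the chosen origin.

```python
def interleave_bits(v, bits=10):
    """Spread the low `bits` bits of v so that bit i moves to bit 3*i."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out

def morton_key(point, origin=(0.0, 0.0, 0.0), voxel=0.5, bits=10):
    """Z-order key of the voxel containing `point` (coordinates >= origin)."""
    ix, iy, iz = (int((p - o) / voxel) for p, o in zip(point, origin))
    return (interleave_bits(ix, bits)
            | (interleave_bits(iy, bits) << 1)
            | (interleave_bits(iz, bits) << 2))

def serialize(points, **kwargs):
    """Return the points sorted along the Z-order curve."""
    return sorted(points, key=lambda p: morton_key(p, **kwargs))

points = [(7.9, 7.9, 0.2), (0.1, 0.2, 0.1), (0.6, 0.1, 0.3), (8.1, 7.6, 0.4)]
ordered = serialize(points)
print(ordered)
```

Because the key depends only on each point's voxel, any permutation of the input yields the same sequence here, and spatially nearby points tend to end up near each other in it. A fixed curve like this ignores scene content, which is the gap a semantic-aware reordering aims to close.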

Innovation

Methods, ideas, or system contributions that make the work stand out.

Point-level tokens preserve 3D structure
Semantic-aware reordering optimizes token sequences
Frequency-enhanced model captures spatial-spectral dependencies
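
The first item above can be seen in a toy example. The sketch below bins three made-up points into a simple occupancy-count BEV grid and shows that two points at different heights fall into the same cell, so the grid alone no longer distinguishes them, while the raw points keep their z values. The cell size and the data are assumptions, and real BEV encoders store learned features per cell rather than counts, but they still hold one feature vector per vertical column.

```python
from collections import defaultdict

def bev_cell(point, cell=0.4):
    """Index of the BEV grid cell containing the point; height is ignored."""
    x, y, _z = point
    return (int(x // cell), int(y // cell))

# Two points stacked at different heights plus one point elsewhere.
points = [(2.1, 3.0, 0.3), (2.2, 3.1, 1.6), (5.0, 1.0, 0.9)]

occupancy = defaultdict(int)   # what a count-based BEV grid keeps
heights = defaultdict(list)    # what point-level data additionally keeps
for p in points:
    occupancy[bev_cell(p)] += 1
    heights[bev_cell(p)].append(p[2])

print(dict(occupancy))
print(dict(heights))
```

The occupancy grid records only that one cell holds two points, whereas the per-point view retains that they sit 1.3 m apart vertically.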
Yang Li
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Quan Yuan
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Guiyang Luo
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Xiaoyuan Fu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Rui Pan
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Yujia Yang
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Congzhang Shao
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Yuewen Liu
Professor, Xi'an Jiaotong University
Social Network, E-commerce, Big Data Analysis
Jinglin Li
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China