🤖 AI Summary
Existing RWKV models struggle to capture the local geometric structures and spatial dependencies inherent in point clouds. To address this limitation, this work proposes the P-RWKV module, which adapts RWKV's efficient sequential modeling to irregular 3D point clouds through a Local Perception Expansion (LPE) mechanism and a Spatial Context Enhancement (SCE) strategy. Built upon this module, the authors introduce PointER, a self-supervised representation learning framework. Presented as the first successful adaptation of RWKV to 3D point cloud analysis, the approach achieves competitive performance across multiple point cloud tasks while maintaining linear computational complexity, strong spatial awareness, and plug-and-play compatibility across diverse architectures, all with reduced computational overhead and inference latency.
📝 Abstract
The recent Receptance Weighted Key Value (RWKV) model combines RNN-style recurrence with Transformer-style parallel training, offering a linear-complexity alternative to the quadratic self-attention of Transformers for modeling global dependencies. However, RWKV was originally developed for sequential text, and when applied directly to point clouds it struggles to capture local geometric structures and to model spatial dependencies effectively. To address this, we propose the P-RWKV block, which bridges the gap between sequence modeling and irregular 3D geometry while preserving the efficiency advantages of RWKV. It consists of a Local Perception Expansion (LPE) component, which expands contextual perception along the spatio-temporal sequence, and a Spatial Context Enhancement (SCE) component, which strengthens spatial awareness. To validate the effectiveness of P-RWKV for point cloud understanding, we construct PointER, a single-modality self-supervised representation learning framework whose encoder is composed of stacked P-RWKV blocks. Furthermore, we extend P-RWKV to a cross-modality setting and integrate the proposed core sub-modules into multiple architectures, demonstrating strong plug-and-play flexibility and architectural generality. Extensive experiments show that the P-RWKV block and its key sub-modules achieve competitive performance across various tasks with lower computational cost and inference latency. Code will be released upon acceptance.
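To make the linear-complexity claim concrete, the sketch below shows a weighted key-value (WKV) aggregation in the style of the RWKV-4 formulation, computed once as a single recurrent pass and once as an explicit sum over all earlier tokens. It is not the P-RWKV block and does not implement LPE or SCE. The function names, the per-channel decay `w`, the current-token bonus `u`, and the assumption that the point cloud has already been serialized into a length-`T` token sequence are all illustrative choices, and the numerical stabilization used in practical implementations is omitted.

```python
import numpy as np


def wkv_recurrent(k, v, w, u):
    """Linear-time WKV sketch: one pass over T tokens with O(C) state.

    k, v: (T, C) keys and values; w: (C,) non-negative decay; u: (C,) bonus
    applied to the current token. All names are illustrative assumptions.
    """
    T, C = k.shape
    a = np.zeros(C)  # decayed running sum of exp(k_i) * v_i over past tokens
    b = np.zeros(C)  # decayed running sum of exp(k_i) over past tokens
    out = np.empty((T, C))
    decay = np.exp(-w)
    for t in range(T):
        cur = np.exp(u + k[t])  # weight given to the current token
        out[t] = (a + cur * v[t]) / (b + cur)
        # Fold the current token into the state for later steps.
        a = decay * a + np.exp(k[t]) * v[t]
        b = decay * b + np.exp(k[t])
    return out


def wkv_quadratic(k, v, w, u):
    """Reference O(T^2) version: re-sums every earlier token at each step."""
    T, C = k.shape
    out = np.empty((T, C))
    for t in range(T):
        num = np.exp(u + k[t]) * v[t]
        den = np.exp(u + k[t])
        for i in range(t):
            weight = np.exp(-(t - 1 - i) * w + k[i])
            num = num + weight * v[i]
            den = den + weight
        out[t] = num / den
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, C = 8, 4  # hypothetical sequence length and channel count
    k = rng.normal(size=(T, C))
    v = rng.normal(size=(T, C))
    w = np.full(C, 0.5)
    u = np.zeros(C)
    same = np.allclose(wkv_recurrent(k, v, w, u), wkv_quadratic(k, v, w, u))
    print("recurrent and quadratic forms agree:", same)
```

Both functions compute the same quantity. The recurrent form keeps only two length-`C` state vectors, so its cost grows linearly with the number of tokens, whereas the reference form revisits every earlier token at each step.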