Efficient RWKV-based Representation Learning for 3D Point Clouds

📅 2026-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RWKV models struggle to capture the local geometric structures and spatial dependencies inherent in point clouds. To address this limitation, this work proposes the P-RWKV module, which adapts RWKV's efficient sequential modeling to irregular 3D point clouds through a Local Perception Expansion (LPE) mechanism and a Spatial Context Enhancement (SCE) strategy. Built upon this module, the authors introduce PointER, a self-supervised representation learning framework. The approach achieves competitive performance across multiple point cloud tasks with lower computational cost and inference latency, retains RWKV's linear complexity, and its sub-modules plug into multiple architectures.
📝 Abstract
The recent receptance weighted key value (RWKV) model uses RNN-style recurrence to offer a linear-complexity alternative to Transformers' quadratic self-attention for modeling global dependencies. However, when directly applied to point clouds, RWKV, originally developed for sequential text, struggles to capture local geometric structures and model spatial dependencies effectively. To address this, we propose the P-RWKV block, which bridges the gap between sequence modeling and irregular 3D geometry while preserving the efficiency advantages of RWKV. It consists of a Local Perception Expansion (LPE) component to expand contextual perception along the spatio-temporal sequence and a Spatial Context Enhancement (SCE) component to strengthen spatial awareness. To validate the effectiveness of P-RWKV for point cloud understanding, we construct PointER, a single-modality self-supervised representation learning framework whose encoder is composed of stacked P-RWKV blocks. Furthermore, we extend P-RWKV to a cross-modality setting and integrate the proposed core sub-modules into multiple architectures, demonstrating strong plug-and-play flexibility and architectural generality. Extensive experiments show that the P-RWKV block and its key sub-modules achieve competitive performance across various tasks with lower computational cost and inference latency. Code will be released upon acceptance.
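The linear-complexity claim can be illustrated with a minimal sketch of the WKV recurrence from the original RWKV-4 language model. This is not the P-RWKV block: the abstract does not specify LPE or SCE, so neither is modeled here. The function names, the single-channel scalar setup, and the treatment of `k` and `v` as a serialized token sequence are assumptions made for illustration, and the usual numerical stabilization of the exponentials is omitted.

```python
import math

def wkv_recurrent(k, v, w, u):
    """RWKV-4 style WKV for one channel in O(T) time.

    k, v: per-token key and value scalars (hypothetical serialized tokens).
    w: non-negative decay rate; u: bonus applied to the current token.
    """
    a = 0.0  # running decayed sum of exp(k_i) * v_i over past tokens
    b = 0.0  # running decayed sum of exp(k_i) over past tokens
    decay = math.exp(-w)
    out = []
    for k_t, v_t in zip(k, v):
        cur = math.exp(u + k_t)
        out.append((a + cur * v_t) / (b + cur))
        # Fold the current token into the state for the next step.
        a = decay * a + math.exp(k_t) * v_t
        b = decay * b + math.exp(k_t)
    return out

def wkv_naive(k, v, w, u):
    """Same quantity computed by revisiting all past tokens, O(T^2) time."""
    out = []
    for t in range(len(k)):
        cur = math.exp(u + k[t])
        num, den = cur * v[t], cur
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out
```

Both functions return the same values; the recurrent form carries only two scalars of state per channel, which is why cost grows linearly with the number of tokens instead of quadratically.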
Problem

Research questions and friction points this paper is trying to address.

RWKV
3D point clouds
local geometric structures
spatial dependencies
representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

RWKV
3D point clouds
representation learning
spatial context enhancement
self-supervised learning
Yun Liu
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; and also with the Shenzhen Institute of Research, Nanjing University of Aeronautics and Astronautics, Shenzhen, China
Xuefeng Yan
Molecular Imaging Branch/National Institute of Mental Health/National Institutes of Health
Molecular imaging
Liangliang Nan
Delft University of Technology
Computer Graphics, Computer Vision, Machine Learning, 3D Geoinformation
Xianzhi Li
Huazhong University of Science and Technology
3D vision, geometry processing
Peng Li
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; and also with the Shenzhen Institute of Research, Nanjing University of Aeronautics and Astronautics, Shenzhen, China
Zhe Zhu
Nanjing University of Aeronautics and Astronautics
3D Vision
Honghua Chen
Research Assistant Professor, Lingnan University, Hong Kong
3D Measurement/Vision, 3D Generation, Deep Geometry Learning
Mingqiang Wei
Professor at Nanjing University of Aeronautics and Astronautics
3D Vision, Multimodal Fusion, Computer Graphics, Deep Geometry Learning, CAD