Point-RTD: Replaced Token Denoising for Pretraining Transformer Models on Point Clouds

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited token robustness of Transformer models for 3D point clouds during pre-training, this paper proposes Point-RTD, a self-supervised pre-training method based on Replaced Token Denoising (RTD). Unlike prevailing masked-reconstruction paradigms, Point-RTD corrupts tokens by random replacement rather than masking, and employs a generator-discriminator adversarial framework for structure-aware denoising reconstruction, significantly strengthening the model's ability to learn local geometric and semantic priors. On ShapeNet, Point-RTD achieves a 14.3× lower Chamfer Distance for point cloud reconstruction, a 93.2% drop in reconstruction error. In classification, it consistently outperforms Point-MAE on ShapeNet, ModelNet10, and ModelNet40, while converging roughly 40% faster.
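The core idea — corrupting tokens by random replacement instead of masking — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the replacement distribution (here, donor tokens drawn from the same sequence) and the corruption ratio are assumptions.

```python
import numpy as np

def corrupt_tokens(tokens, replace_ratio=0.3, rng=None):
    """Corrupt point-cloud tokens by random replacement (hypothetical sketch).

    A fraction of token positions is overwritten with tokens drawn from
    elsewhere in the sequence; the boolean mask records which positions
    were replaced, serving as ground truth for a discriminator.
    """
    rng = np.random.default_rng(rng)
    tokens = tokens.copy()
    n = tokens.shape[0]
    k = int(round(n * replace_ratio))
    idx = rng.choice(n, size=k, replace=False)  # positions to corrupt
    src = rng.integers(0, n, size=k)            # donor token indices (may coincide with idx)
    tokens[idx] = tokens[src]
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return tokens, mask
```

Unlike masking, every position still carries a plausible token, so the model must decide *which* tokens are wrong rather than fill in obvious blanks.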

📝 Abstract
Pre-training strategies play a critical role in advancing the performance of transformer-based models for 3D point cloud tasks. In this paper, we introduce Point-RTD (Replaced Token Denoising), a novel pretraining strategy designed to improve token robustness through a corruption-reconstruction framework. Unlike traditional mask-based reconstruction tasks that hide data segments for later prediction, Point-RTD corrupts point cloud tokens and leverages a discriminator-generator architecture for denoising. This shift enables more effective learning of structural priors and significantly enhances model performance and efficiency. On the ShapeNet dataset, Point-RTD reduces reconstruction error by over 93% compared to Point-MAE, and achieves more than 14× lower Chamfer Distance on the test set. Our method also converges faster and yields higher classification accuracy on ShapeNet, ModelNet10, and ModelNet40 benchmarks, clearly outperforming the baseline Point-MAE framework in every case.
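The Chamfer Distance used as the reconstruction metric above can be computed as shown below. This is the standard symmetric, squared-distance variant; the paper's exact formulation (squared vs. unsquared) is not specified in this summary.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Sum of the mean squared nearest-neighbor distances in both directions:
    for each point in a, the distance to its closest point in b, and vice versa.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

A lower value means the reconstructed cloud sits closer to the ground-truth cloud in both directions, which is why it is a natural fit for judging denoising quality.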
Problem

Research questions and friction points this paper is trying to address.

Develops a pretraining strategy for Transformer models on 3D point clouds
Improves token robustness through a corruption-reconstruction denoising framework
Enhances model performance and efficiency on 3D point cloud tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaced Token Denoising pretraining strategy for point clouds
Corruption-reconstruction framework using discriminator-generator architecture
Improves token robustness by learning structural priors effectively
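The corruption-reconstruction framework described above can be wired together as one pretraining step. All interfaces here are assumptions for illustration: `corrupt_fn` returns corrupted tokens plus a replaced-token mask, the generator reconstructs clean tokens, and the discriminator scores each token as replaced or real; the combined loss below is a plain MSE-plus-cross-entropy stand-in, not the paper's exact objective.

```python
import numpy as np

def denoising_step(clean_tokens, corrupt_fn, generator, discriminator):
    """One sketched pretraining step of the corruption-reconstruction pipeline.

    corrupt_fn(tokens)   -> (noisy_tokens, replaced_mask)
    generator(noisy)     -> reconstructed tokens, same shape as input
    discriminator(recon) -> per-token logit that the token was replaced
    """
    noisy, mask = corrupt_fn(clean_tokens)
    recon = generator(noisy)
    recon_loss = np.mean((recon - clean_tokens) ** 2)  # denoising objective
    logits = discriminator(recon)
    p = 1.0 / (1.0 + np.exp(-logits))                  # replaced-token probability
    eps = 1e-9
    disc_loss = -np.mean(mask * np.log(p + eps) + (~mask) * np.log(1 - p + eps))
    return recon_loss + disc_loss
```

The discriminator's replaced-vs-real signal is what pushes the generator toward structure-aware reconstructions rather than blurry averages.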