AI Summary
Existing rotation-invariant point cloud masked autoencoders employ random masking, which neglects geometric structure and semantic coherence and therefore fails to model spatial relationships that remain consistent across orientations or semantic parts that are robust to rotation. To address this, we propose a dual-stream masking strategy: (1) 3D spatial grid masking based on coordinate sorting, which explicitly preserves local geometric structure; and (2) attention-driven, clustering-based semantic masking, which focuses on semantic regions whose identity is stable under arbitrary rotations. The two masking schemes are dynamically weighted via curriculum learning, enabling progressive, geometry-to-semantics cooperative training. Our method is plug-and-play, requiring no backbone modification. Extensive experiments on ModelNet40, ScanObjectNN, and OmniObject3D demonstrate significant improvements over baselines, achieving state-of-the-art performance under diverse rotation settings.
Abstract
Existing rotation-invariant point cloud masked autoencoders (MAEs) rely on random masking strategies that overlook geometric structure and semantic coherence. Random masking treats patches independently, failing to capture spatial relationships that remain consistent across orientations and overlooking semantic object parts that retain their identity regardless of rotation. We propose a dual-stream masking approach that combines 3D Spatial Grid Masking and Progressive Semantic Masking to address these limitations. Grid masking creates structured patterns through coordinate sorting to capture geometric relationships that persist across orientations, while semantic masking uses attention-driven clustering to discover semantically meaningful parts and preserve their coherence during masking. These complementary streams are orchestrated via curriculum learning with dynamic weighting, progressing from geometric understanding to semantic discovery. Designed as plug-and-play components, our strategies integrate into existing rotation-invariant frameworks without architectural changes, ensuring broad compatibility across approaches. Comprehensive experiments on ModelNet40, ScanObjectNN, and OmniObject3D demonstrate consistent improvements across diverse rotation scenarios, with substantial performance gains over baseline rotation-invariant methods.
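The two masking streams and their curriculum weighting can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the striping `period`, and the linear warm-up schedule are assumptions introduced here for clarity.

```python
import numpy as np

def grid_mask(centers, mask_ratio=0.6, period=5):
    """Sketch of 3D spatial grid masking: sort patch centers
    lexicographically by (x, y, z) and mask periodic runs in the
    sorted order, so masked patches form a structured spatial
    pattern rather than a random scatter. The period is a guess."""
    n = len(centers)
    # Primary sort key is x, then y, then z (lexsort uses the last key first).
    order = np.lexsort((centers[:, 2], centers[:, 1], centers[:, 0]))
    keep = int(period * (1.0 - mask_ratio))  # visible patches per period
    mask = np.zeros(n, dtype=bool)
    for rank, idx in enumerate(order):
        mask[idx] = (rank % period) >= keep  # True = masked
    return mask

def semantic_weight(epoch, total_epochs, warmup_frac=0.5):
    """Sketch of the curriculum schedule: the semantic stream's weight
    ramps linearly from 0 (pure geometric masking) toward 1, shifting
    training from geometric to semantic masking."""
    return min(epoch / (warmup_frac * total_epochs), 1.0)
```

In such a scheme, each sample would be masked by the semantic stream with probability `semantic_weight(epoch, total_epochs)` and by the grid stream otherwise. The semantic stream itself (attention-driven clustering) is omitted here, since it depends on the encoder's attention maps.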