🤖 AI Summary
Existing self-supervised contrastive learning methods for point clouds suffer from isolated multi-branch feature encoding: branches operate independently until the loss layer, limiting inter-branch semantic interaction. Method: We propose PoCCA, a Point Cloud Cross-branch Attention framework, which introduces cross-branch attention at early encoder stages to enable explicit feature interaction and fusion, departing from conventional independent per-branch encoding. PoCCA jointly optimizes point cloud augmentation strategies, a dual-branch encoder, and a sub-branch attention module, and requires no additional labels or training data. Contribution/Results: On benchmarks including ModelNet40, PoCCA achieves state-of-the-art performance on downstream classification and segmentation tasks under self-supervised settings, demonstrating that early cross-branch information exchange significantly improves the discriminability and generalizability of the learned representations.
📝 Abstract
Contrastive learning is an essential method in self-supervised learning. It primarily employs a multi-branch strategy: latent representations obtained from different branches are compared to train the encoder. With multi-modal input, different modalities of the same object are fed into distinct branches; with single-modal data, the same input undergoes different augmentations before being fed into different branches. However, all existing contrastive learning frameworks perform contrastive operations only on the learned features at the final loss end, with no information exchange between branches before that stage. In this paper, for unsupervised point cloud learning without extra training data, we propose a Contrastive Cross-branch Attention-based framework for Point cloud data (termed PoCCA) to learn rich 3D point cloud representations. By introducing sub-branches, PoCCA allows information exchange between branches before the loss end. Experimental results demonstrate that, without extra training data, the representations learned by our self-supervised model achieve state-of-the-art performance on downstream point cloud tasks.
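The core idea above, letting one branch attend over another branch's features before the contrastive loss, can be illustrated with a minimal sketch. The abstract does not specify the exact attention module, so the following assumes standard scaled dot-product cross-attention between the per-point features of two branches; all function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_branch_attention(feat_a, feat_b):
    """Queries come from branch A; keys/values come from branch B,
    so branch A's features are fused with branch B's information
    before reaching the contrastive loss."""
    d = feat_a.shape[-1]
    attn = softmax(feat_a @ feat_b.T / np.sqrt(d))  # (N_a, N_b) attention weights
    return attn @ feat_b                            # (N_a, d) fused features

rng = np.random.default_rng(0)
feat_a = rng.standard_normal((128, 64))  # branch-A point features (N_a=128, d=64)
feat_b = rng.standard_normal((128, 64))  # branch-B point features of the same shape
fused_a = cross_branch_attention(feat_a, feat_b)
```

In a full framework one would apply this symmetrically (branch B also attends over branch A) and feed the fused features onward to the projection head and contrastive loss.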