🤖 AI Summary
To address the insufficient robustness of 3D vision-language foundation models (VLFMs) against noisy, incomplete, or distribution-shifted point clouds, this paper proposes Uni-Adapter, a training-free online test-time adaptation method. The approach enables the first training-free test-time optimization for 3D VLFMs, comprising: (i) dynamic prototype learning to continuously refine class-wise centroid representations; (ii) graph-structure-guided label smoothing to enhance prediction consistency; and (iii) integration of a 3D cache mechanism, similarity-driven logit recalibration, and entropy-weighted prediction fusion. Evaluated on three major robustness benchmarks—ModelNet-40C, ScanObjectNN-C, and ShapeNet-C—the method improves classification accuracy by 10.55%, 8.26%, and 4.49%, respectively. These gains demonstrate substantially improved generalization to degraded real-world point cloud data, without any parameter updates or retraining.
📝 Abstract
3D Vision-Language Foundation Models (VLFMs) have shown strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, these models often underperform in practical scenarios where data are noisy, incomplete, or drawn from a different distribution than the training data. To address this, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. We define a 3D cache that stores class-specific cluster centers as prototypes, which are continuously updated to capture intra-class variability in heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation via similarity scoring. Simultaneously, a graph-based label smoothing module captures inter-prototype similarities to enforce label consistency among similar prototypes. Finally, we unify predictions from the original 3D VLFM and the refined 3D cache using entropy-weighted aggregation for reliable adaptation. Without retraining, Uni-Adapter effectively mitigates distribution shifts, achieving state-of-the-art performance on diverse 3D benchmarks across different 3D VLFMs, improving accuracy on ModelNet-40C by 10.55%, on ScanObjectNN-C by 8.26%, and on ShapeNet-C by 4.49% over the source 3D VLFMs.
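The pipeline described in the abstract can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's implementation: the cache layout, the EMA prototype update, the softmax-normalized similarity graph, and all function names are placeholders chosen to show how the four pieces (prototype cache, similarity-based logits, graph label smoothing, entropy-weighted fusion) could fit together.

```python
# Hedged sketch of a Uni-Adapter-style test-time adaptation loop.
# All update rules and hyperparameters here are assumptions for illustration.
import numpy as np

def normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

class PrototypeCache:
    """Class-wise prototype cache, updated online (assumed EMA update)."""
    def __init__(self, protos, momentum=0.9):
        self.protos = normalize(np.asarray(protos, dtype=float))  # (C, D)
        self.momentum = momentum

    def update(self, feat, cls):
        # Move the class prototype toward the new sample's feature.
        self.protos[cls] = normalize(
            self.momentum * self.protos[cls] + (1 - self.momentum) * feat)

    def cache_logits(self, feat, temperature=0.07):
        # Similarity-driven logits: cosine similarity to each prototype.
        return normalize(feat) @ self.protos.T / temperature

def graph_label_smoothing(logits, protos, alpha=0.3):
    # Smooth class scores over a prototype-similarity graph so that
    # similar prototypes receive consistent labels.
    sim = normalize(protos) @ normalize(protos).T            # (C, C)
    adj = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return (1 - alpha) * logits + alpha * logits @ adj.T

def entropy_weight(logits):
    # Confidence weight in [0, 1]: low prediction entropy -> high weight.
    p = np.exp(logits - logits.max()); p /= p.sum()
    h = -(p * np.log(p + 1e-8)).sum() / np.log(len(p))
    return 1.0 - h

def fuse(model_logits, cache_logits):
    # Entropy-weighted aggregation of the frozen VLFM and cache predictions.
    w_m, w_c = entropy_weight(model_logits), entropy_weight(cache_logits)
    return (w_m * model_logits + w_c * cache_logits) / (w_m + w_c + 1e-8)
```

At test time, each incoming point-cloud feature would score against the cache, the smoothed cache logits would be fused with the frozen model's logits, and the winning class's prototype would be refreshed, so no parameter of the 3D VLFM itself is ever updated.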