🤖 AI Summary
This work addresses the insufficient modeling of global scene context in 3D point cloud semantic segmentation by introducing a shared Wavelet Neural Operator (WNO) branch alongside the skip connections of Point Transformer. The WNO branch projects the point cloud onto a dense 3D voxel grid and explicitly captures multi-scale global context in the spectral domain through learnable wavelet decomposition and reconstruction. The extracted global features are then fused back into the main backbone via a lightweight adapter, effectively balancing global perception with fine-grained local details. As the first effort to integrate WNO with point cloud Transformers, the proposed method achieves state-of-the-art or competitive performance, attaining mIoU scores of 71.59%, 81.05%, and 76.19% on S3DIS (Area 5), DALES, and ScanNet v2, respectively.
📝 Abstract
Point cloud semantic segmentation requires architectures that capture both fine-grained local geometry and broad global scene structure. Transformer-based networks have demonstrated strong performance by focusing on detailed local feature aggregation; however, global context is conveyed primarily through skip connections across encoder-decoder stages, which we argue is insufficient for full scene understanding. We hypothesize that augmenting skip connections with a learnable global feature extraction module allows the network to acquire scene-level knowledge before descending into local detail, leading to richer and more contextually grounded representations. To this end, we propose Point Transformer with Wavelet Neural Operator (PT-WNO), which integrates a shared Wavelet Neural Operator (WNO) branch alongside the skip connections of a point cloud transformer backbone. At each encoder-decoder transition, point features are projected onto a dense 3D volumetric grid where the WNO captures multi-scale global spectral context through learnable wavelet decomposition and reconstruction. These global features are fused back into the network via lightweight adapters, complementing rather than replacing the existing skip connections. Experiments on four large-scale 3D point cloud benchmarks demonstrate the effectiveness of PT-WNO. On S3DIS (Area 5), PT-WNO achieves 71.59% mIoU, outperforming the Point Transformer v3 (PTv3) baseline by +1.03 points. On DALES it achieves 81.05% mIoU (+1.47 over the baseline). On ScanNet v2, PT-WNO obtains 76.19% mIoU, remaining competitive with the baseline (76.36%).
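The project, transform, and fuse pipeline described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the Haar wavelet (the abstract does not name a wavelet family), a single decomposition level, scalar rather than multi-channel features, one scalar weight per sub-band standing in for the learned spectral kernels, and an additive fusion with a hypothetical gain `alpha` standing in for the lightweight adapter. All function names are hypothetical.

```python
import math

def voxel_index(p, lo, hi, g):
    """Map a 3D point to integer coordinates in a g*g*g grid over the box [lo, hi]."""
    return tuple(min(g - 1, max(0, int((p[a] - lo[a]) / (hi[a] - lo[a]) * g)))
                 for a in range(3))

def voxelize_mean(points, feats, lo, hi, g):
    """Scatter-mean scalar point features onto a dense grid; empty voxels stay 0."""
    total = [[[0.0] * g for _ in range(g)] for _ in range(g)]
    count = [[[0] * g for _ in range(g)] for _ in range(g)]
    for p, f in zip(points, feats):
        i, j, k = voxel_index(p, lo, hi, g)
        total[i][j][k] += f
        count[i][j][k] += 1
    for i in range(g):
        for j in range(g):
            for k in range(g):
                if count[i][j][k]:
                    total[i][j][k] /= count[i][j][k]
    return total

def _cell(u, v, axis, t):
    """Build a 3D index with t placed on `axis` and (u, v) on the other two axes."""
    c = [u, v]
    c.insert(axis, t)
    return c

def haar_axis(vol, axis, inverse=False):
    """One-level orthonormal Haar transform along one axis (grid size must be even)."""
    g, s = len(vol), math.sqrt(2.0)
    h = g // 2
    out = [[[0.0] * g for _ in range(g)] for _ in range(g)]
    for u in range(g):
        for v in range(g):
            for m in range(h):
                # Forward: neighbours (2m, 2m+1) -> (low at m, high at h+m).
                # Inverse: the same sum/difference formula with the roles swapped.
                src = (m, h + m) if inverse else (2 * m, 2 * m + 1)
                dst = (2 * m, 2 * m + 1) if inverse else (m, h + m)
                a0, a1, a2 = _cell(u, v, axis, src[0])
                b0, b1, b2 = _cell(u, v, axis, src[1])
                x, y = vol[a0][a1][a2], vol[b0][b1][b2]
                a0, a1, a2 = _cell(u, v, axis, dst[0])
                b0, b1, b2 = _cell(u, v, axis, dst[1])
                out[a0][a1][a2] = (x + y) / s
                out[b0][b1][b2] = (x - y) / s
    return out

def wavelet_layer(vol, band_weights):
    """Decompose, scale each of the 8 sub-bands by its weight, then reconstruct."""
    g = len(vol)
    h = g // 2
    coef = vol
    for axis in range(3):
        coef = haar_axis(coef, axis)
    for i in range(g):
        for j in range(g):
            for k in range(g):
                # A coefficient is "high" along an axis if it lies in the upper half.
                coef[i][j][k] *= band_weights[(i >= h, j >= h, k >= h)]
    for axis in reversed(range(3)):
        coef = haar_axis(coef, axis, inverse=True)
    return coef

def fuse(points, feats, vol, lo, hi, alpha):
    """Gather each point's voxel value and add it to the point feature (assumed adapter)."""
    g = len(vol)
    fused = []
    for p, f in zip(points, feats):
        i, j, k = voxel_index(p, lo, hi, g)
        fused.append(f + alpha * vol[i][j][k])
    return fused

# Toy data: four points in the unit cube, one scalar feature each.
points = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (0.9, 0.8, 0.7), (0.4, 0.6, 0.3)]
feats = [1.0, 3.0, -2.0, 0.5]
lo, hi, g = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 4
bands = [(a, b, c) for a in (False, True) for b in (False, True) for c in (False, True)]

vol = voxelize_mean(points, feats, lo, hi, g)
identity = {band: 1.0 for band in bands}          # leaves the volume unchanged
low_pass = {band: 0.0 for band in bands}          # keeps only the coarse sub-band
low_pass[(False, False, False)] = 1.0

recon = wavelet_layer(vol, identity)
coarse = wavelet_layer(vol, low_pass)
fused = fuse(points, feats, coarse, lo, hi, alpha=0.5)
```

With identity weights the layer reproduces the input volume, which is a quick check that decomposition and reconstruction are consistent. Keeping only the low-low-low sub-band replaces each 2x2x2 block with its mean, so every point receives a smoothed, block-level summary of its neighbourhood; a trained model would instead learn how strongly to weight each sub-band and would operate on multi-channel features over several decomposition levels.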