🤖 AI Summary
This work addresses two challenges in post-disaster 3D point cloud semantic segmentation: fixed point orderings limit representational capacity, and exhaustive neighborhood search is computationally expensive. Both hinder accurate identification of critical infrastructure such as damaged buildings and roads. To overcome these limitations, the authors propose OPTNet, an architecture built upon Point Transformer that introduces a learnable Point Sorter module. This module optimizes the point ordering dynamically through a self-supervised ordering loss, which improves local attention modeling. OPTNet also uses a windowed attention mechanism to avoid the computational bottlenecks of k-nearest neighbors (k-NN) search and farthest point sampling (FPS). Evaluated on the 3DAeroRelief dataset, OPTNet significantly outperforms state-of-the-art methods, improving both accuracy and efficiency.
📝 Abstract
Post-disaster damage assessment requires rapid and accurate semantic segmentation of 3D point clouds to identify critical infrastructure such as damaged buildings and roads. Early Point Transformer variants (e.g., PTv1, PTv2) relied on computationally expensive k-nearest-neighbor (k-NN) search and Farthest Point Sampling (FPS). To improve efficiency, recent architectures such as Point Transformer V3 (PTv3) adopted static serialization methods, such as Hilbert curves or Z-order, to organize unstructured points for window-based attention. However, these fixed orderings are not optimal for capturing the complex geometry of disaster scenes. In this paper, we propose OPTNet (Ordering Point Transformer Network), which introduces a learnable Point Sorter module. OPTNet uses a self-supervised ordering loss to dynamically predict a permutation that maximizes the locality of the attention mechanism. Evaluated on the 3DAeroRelief dataset, OPTNet significantly outperforms state-of-the-art baselines.
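
As a rough illustration of the static serialization mentioned above (not OPTNet's learned Point Sorter, whose details the abstract does not give), the following NumPy sketch sorts points along a Z-order (Morton) curve, splits the sorted sequence into fixed-size windows, and compares how spatially compact those windows are against a random ordering. The function names, the 10-bit quantization, the window size of 64, and the spread metric are all assumptions chosen for illustration.

```python
import numpy as np


def morton_codes(points, bits=10):
    """Z-order key per point: interleave the bits of quantized x, y, z."""
    pts = np.asarray(points, dtype=np.float64)
    lo = pts.min(axis=0)
    span = np.maximum(pts.max(axis=0) - lo, 1e-12)  # avoid division by zero
    # Quantize each coordinate to an integer grid cell in [0, 2**bits - 1].
    grid = ((pts - lo) / span * (2**bits - 1)).astype(np.int64)
    codes = np.zeros(len(pts), dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            bit = (grid[:, axis] >> b) & 1
            codes |= bit << (3 * b + axis)
    return codes


def window_partition(order, window_size):
    """Split a serialized index order into consecutive fixed-size windows."""
    return [order[i:i + window_size] for i in range(0, len(order), window_size)]


def mean_window_spread(points, windows):
    """Mean distance of points to their window centroid (lower = more local)."""
    spreads = [
        np.linalg.norm(points[w] - points[w].mean(axis=0), axis=1).mean()
        for w in windows
    ]
    return float(np.mean(spreads))


# Hypothetical toy data: 1024 uniformly random points in the unit cube.
rng = np.random.default_rng(0)
points = rng.random((1024, 3))

z_order = np.argsort(morton_codes(points), kind="stable")
random_order = rng.permutation(len(points))

z_windows = window_partition(z_order, 64)
z_spread = mean_window_spread(points, z_windows)
rand_spread = mean_window_spread(points, window_partition(random_order, 64))

print(f"Z-order window spread: {z_spread:.3f}")
print(f"Random window spread:  {rand_spread:.3f}")
```

Attention restricted to each window only sees the points grouped together by the ordering, so a more compact grouping gives each point more spatially relevant context. A fixed curve such as Z-order applies the same rule to every scene, which is the limitation that motivates learning the permutation instead.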