🤖 AI Summary
This work proposes SLNet, an ultra-lightweight backbone network for 3D point cloud recognition that addresses the high computational cost and large parameter counts of existing models. Its core innovations are a non-parametric adaptive point embedding (NAPE) and a geometric modulation unit (GMU), which, combined with FPS+kNN grouping, non-parametric normalization, and shared residual MLPs, enable highly efficient geometry-aware feature extraction at minimal parameter cost. For scene segmentation, SLNet further integrates local Point Transformer attention. Experiments demonstrate that SLNet-S achieves 93.64% accuracy on ModelNet40 with only 0.14M parameters, outperforming PointMLP-elite while using 5x fewer parameters; SLNet-M attains 84.25% accuracy on ScanObjectNN with 28x fewer parameters than PointMLP; and SLNet-T reaches 58.2% mIoU on S3DIS with just 2.5M parameters, over 17x fewer than Point Transformer V3. The study also introduces NetScore+, a metric that extends NetScore with latency and peak-memory terms for deployment-oriented efficiency evaluation.
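For context, the NetScore that NetScore+ builds on scores a network $\mathcal{N}$ from its accuracy $a$, parameter count $p$, and MACs $m$. The summary states only that NetScore+ adds latency and peak memory; the second equation below is a plausible sketch of such an extension (the exponents $\delta, \epsilon$ and the exact form are our assumptions, not the paper's definition):

```latex
% Original NetScore: accuracy a, parameters p, MACs m, with tunable exponents
\Omega(\mathcal{N}) = 20 \log_{10}\!\left(
    \frac{a(\mathcal{N})^{\alpha}}{p(\mathcal{N})^{\beta}\, m(\mathcal{N})^{\gamma}}
\right)

% Hypothetical NetScore+-style extension (assumption): penalize
% latency \ell and peak memory \mu alongside parameters and MACs
\Omega^{+}(\mathcal{N}) = 20 \log_{10}\!\left(
    \frac{a(\mathcal{N})^{\alpha}}
         {p(\mathcal{N})^{\beta}\, m(\mathcal{N})^{\gamma}\,
          \ell(\mathcal{N})^{\delta}\, \mu(\mathcal{N})^{\epsilon}}
\right)
```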
📝 Abstract
We present SLNet, a lightweight backbone for 3D point cloud recognition designed to achieve strong performance without the computational cost of many recent attention-, graph-, and deep-MLP-based models. The model is built on two simple ideas: NAPE (Nonparametric Adaptive Point Embedding), which captures spatial structure using a combination of Gaussian RBF and cosine bases with input-adaptive bandwidth and blending, and GMU (Geometric Modulation Unit), a per-channel affine modulator that adds only 2D learnable parameters (a scale and a shift per channel). These components are used within a four-stage hierarchical encoder with FPS+kNN grouping, non-parametric normalization, and shared residual MLPs. Our experiments show that a very small model can remain highly competitive across several 3D recognition tasks. On ModelNet40, SLNet-S with 0.14M parameters and 0.31 GFLOPs achieves 93.64% overall accuracy, outperforming PointMLP-elite while using 5x fewer parameters, and SLNet-M with 0.55M parameters and 1.22 GFLOPs reaches 93.92%, exceeding PointMLP while using 24x fewer parameters. On ScanObjectNN, SLNet-M achieves 84.25% overall accuracy, within 1.2 percentage points of PointMLP, while using 28x fewer parameters. For large-scale scene segmentation, SLNet-T extends the backbone with local Point Transformer attention and reaches 58.2% mIoU on S3DIS Area 5 with only 2.5M parameters, more than 17x fewer than Point Transformer V3. We also introduce NetScore+, which extends NetScore by incorporating latency and peak memory, so that efficiency can be evaluated in a more deployment-oriented way. Across multiple benchmarks and hardware settings, SLNet delivers a strong overall balance between accuracy and efficiency. Code is available at: https://github.com/m-saeid/SLNet.
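The two building blocks named in the abstract can be sketched roughly as follows. This is a minimal illustration of the stated ideas (parameter-free RBF + cosine embedding with input-adaptive bandwidth and blending; a per-channel affine modulator with exactly 2D learnable values); the function names, tensor shapes, and the specific blending rule are our assumptions, not the authors' implementation:

```python
import numpy as np

def nape(rel_xyz: np.ndarray, num_bases: int = 8) -> np.ndarray:
    """Embed relative neighbor coordinates (B, N, K, 3) with no learned weights.

    Blends Gaussian RBF and cosine bases of the neighbor distance; both the
    RBF bandwidth and the blend weight are derived from the input itself
    (hence "adaptive" yet non-parametric). Details here are illustrative.
    """
    d = np.linalg.norm(rel_xyz, axis=-1, keepdims=True)    # (B, N, K, 1) distances
    sigma = d.mean(axis=2, keepdims=True) + 1e-6           # bandwidth per kNN neighborhood
    freqs = np.arange(1, num_bases + 1, dtype=np.float64)  # (num_bases,) basis frequencies
    rbf = np.exp(-(d * freqs) ** 2 / (2 * sigma ** 2))     # (B, N, K, num_bases)
    cos = np.cos(d * freqs)                                # (B, N, K, num_bases)
    alpha = 1.0 / (1.0 + np.exp(d / sigma))                # input-adaptive blend in (0, 1)
    return alpha * rbf + (1.0 - alpha) * cos

class GMU:
    """Per-channel affine modulation: exactly 2*D values (scale and shift).

    In a real model these would be learnable parameters (e.g. nn.Parameter);
    plain arrays are used here to keep the sketch dependency-light.
    """
    def __init__(self, dim: int):
        self.scale = np.ones(dim)   # D values
        self.shift = np.zeros(dim)  # D values

    def __call__(self, x: np.ndarray) -> np.ndarray:       # x: (..., D)
        return x * self.scale + self.shift
```

At initialization the GMU is the identity (scale 1, shift 0), so it can be dropped into a residual MLP stage without disturbing the initial features while adding only 2D parameters per stage.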