🤖 AI Summary
To address matching distortion and edge-skipping in textureless regions for multi-view stereo (MVS), this paper proposes MSP-MVS, an edge-confined patch deformation method guided by multi-granularity segmentation priors. The method makes two main contributions: (1) multi-granularity depth-edge priors derived from Semantic-SAM segmentation confine patch deformation to depth-continuous regions, stabilizing deformation in textureless areas; and (2) adaptive anchor equidistribution with disassemble-clustering, coupled with disparity-sampling synergistic 3D optimization, mitigates the attention imbalance introduced by edge confinement and the local minima caused by a fixed sampling pattern. The framework combines Semantic-SAM-based segmentation, multi-granularity edge aggregation and refinement, and anchor redistribution with disassemble-clustering. Evaluated on the ETH3D and Tanks & Temples benchmarks, the approach reports state-of-the-art performance and strong generalization across scenes.
📝 Abstract
Recently, patch deformation-based methods have demonstrated significant strength in multi-view stereo by adaptively expanding the receptive field of patches to help reconstruct textureless areas. However, such methods mainly concentrate on searching for pixels without matching ambiguity (i.e., reliable pixels) when constructing deformed patches, while neglecting the deformation instability caused by unexpected edge-skipping, which can lead to matching distortions. To address this, we propose MSP-MVS, a method that introduces a multi-granularity segmentation prior for edge-confined patch deformation. Specifically, to avoid unexpected edge-skipping, we first aggregate and further refine multi-granularity depth edges obtained from Semantic-SAM as a prior to guide patch deformation within depth-continuous (i.e., homogeneous) areas. Moreover, to address the attention imbalance caused by edge-confined patch deformation, we implement adaptive equidistribution and disassemble-clustering of correlative reliable pixels (i.e., anchors), thereby promoting attention-consistent patch deformation. Finally, to prevent deformed patches from falling into local-minimum matching costs caused by the fixed sampling pattern, we introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs. Evaluations on the ETH3D and Tanks & Temples benchmarks show that our method achieves state-of-the-art performance with strong generalization.
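The idea of edge-confined anchor search can be illustrated with a small sketch. This is not the authors' implementation: the function names (`aggregate_edges`, `edge_confined_anchors`), the eight-ray search pattern, and the `max_steps` parameter are all assumptions made for illustration, and the edge refinement, anchor equidistribution, and 3D optimization stages are omitted. It only shows how a union of edge maps from several segmentation granularities could stop a search ray before it skips over a depth edge.

```python
import numpy as np

# Eight ray directions (dy, dx) around the central pixel (assumed search pattern).
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def aggregate_edges(edge_maps):
    """Union of boolean edge maps taken at several segmentation granularities."""
    return np.logical_or.reduce(edge_maps)


def edge_confined_anchors(center, reliable, edges, max_steps=50):
    """Return the first reliable pixel found along each ray from `center`.

    A ray is abandoned as soon as it reaches an edge pixel or leaves the
    image, so anchors never lie on the far side of an edge.
    """
    h, w = reliable.shape
    anchors = []
    for dy, dx in DIRECTIONS:
        y, x = center
        for _ in range(max_steps):
            y, x = y + dy, x + dx
            if not (0 <= y < h and 0 <= x < w) or edges[y, x]:
                break  # stop: image border or edge reached
            if reliable[y, x]:
                anchors.append((y, x))
                break
    return anchors


# Toy example: a vertical edge at column 5 hides the reliable pixel at (3, 6).
reliable = np.zeros((7, 7), dtype=bool)
reliable[3, 6] = reliable[3, 0] = reliable[0, 3] = True
coarse = np.zeros((7, 7), dtype=bool)
fine = np.zeros((7, 7), dtype=bool)
fine[:, 5] = True
edges = aggregate_edges([coarse, fine])
anchors = edge_confined_anchors((3, 3), reliable, edges)
```

In this toy setup the unconstrained search would also pick `(3, 6)`, which lies beyond the edge. Note that a one-pixel-wide diagonal edge could still be crossed by a diagonal ray in this simplified version, so a practical variant would need a thicker edge mask or a connectivity check.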