🤖 AI Summary
Gas leak detection and segmentation remain challenging because leaking plumes are well concealed, deform non-rigidly, and have low contrast. To address these issues, this paper proposes an end-to-end Fine-grained Spatial-Temporal Perception (FGSTP) framework. Methodologically, it introduces (i) an inter-frame motion modeling mechanism based on a correlation volume; (ii) a temporal feature refinement strategy driven by feedback from previous outputs; and (iii) a multi-scale decoder that sharpens boundary delineation. Contributions include: (i) GasVid, the first high-quality, manually annotated video dataset for gas leakage; and (ii) state-of-the-art performance on GasVid, with a 12.6% improvement in mask IoU and 91.4% boundary accuracy, and particular robustness on low-contrast, highly deformable leakage targets.
📝 Abstract
Gas leaks pose significant risks to human health and the environment. Despite long-standing concerns, few methods can detect and segment leaks efficiently and accurately, because leaks are visually concealed and irregular in shape. In this paper, we propose a Fine-grained Spatial-Temporal Perception (FGSTP) algorithm for gas leak segmentation. FGSTP captures critical motion cues across frames and integrates them with refined object features in an end-to-end network. Specifically, we first construct a correlation volume to capture motion information between consecutive frames. Then, fine-grained perception progressively refines the object-level features using previous outputs. Finally, a decoder is employed to optimize boundary segmentation. Because no precisely labeled dataset exists for gas leak segmentation, we manually label a gas leak video dataset, GasVid. Experimental results on GasVid demonstrate that our model excels at segmenting non-rigid objects such as gas leaks, producing more accurate masks than other state-of-the-art (SOTA) models.
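To make the correlation-volume step concrete, the sketch below shows one common construction: an all-pairs scaled dot product between the feature maps of two consecutive frames. The abstract does not specify how FGSTP builds its volume, so the function name `correlation_volume`, the `(C, H, W)` feature layout, and the `1/sqrt(C)` scaling are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def correlation_volume(feat_prev, feat_curr):
    """All-pairs correlation between two feature maps of shape (C, H, W).

    Hypothetical helper: returns an array of shape (H, W, H, W) where entry
    [i, j, k, l] is the scaled dot product between the feature vector at
    (i, j) in the previous frame and the one at (k, l) in the current frame.
    """
    c, h, w = feat_prev.shape
    a = feat_prev.reshape(c, h * w)   # (C, H*W) features of the previous frame
    b = feat_curr.reshape(c, h * w)   # (C, H*W) features of the current frame
    corr = a.T @ b / np.sqrt(c)       # (H*W, H*W) pairwise similarities
    return corr.reshape(h, w, h, w)


# Toy input: the "current" frame is the previous one shifted one pixel right.
rng = np.random.default_rng(0)
feat_prev = rng.standard_normal((8, 4, 5))
feat_curr = np.roll(feat_prev, shift=1, axis=2)

vol = correlation_volume(feat_prev, feat_curr)
print(vol.shape)  # (4, 5, 4, 5)
```

In this toy setup, the slice `vol[i, j]` is a similarity map over the current frame for the pixel `(i, j)` of the previous frame; a strong response away from `(i, j)` suggests that the content at that location has moved, which is the kind of motion cue a downstream network could consume.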