🤖 AI Summary
Existing in-situ camera-based monitoring of anthropogenic floating debris in urban rivers suffers from low accuracy and poor robustness. To address this, we propose an automated monitoring method that tightly integrates geometric modeling with deep learning. First, a projective geometric model is established via joint calibration of the intrinsic and extrinsic camera parameters to enable reliable 2D-to-3D size mapping. Second, we systematically analyze the impact of data leakage and negative sample selection on detection performance and introduce a regression-based correction module to improve the accuracy of size estimation. Building on a comparative evaluation of multiple object detection architectures, we adopt a lightweight, efficient model that achieves high-precision detection (mAP@0.5 = 89.3%) and real-time inference (>25 FPS) under challenging conditions, including complex illumination, occlusion, and water surface disturbance. This work integrates geometric constraints into an end-to-end floating debris quantification pipeline, demonstrating the feasibility of a low-cost, reproducible, and robust intelligent monitoring system for urban water bodies.
📝 Abstract
The proliferation of floating anthropogenic debris in rivers has become a pressing environmental concern, harming biodiversity, water quality, and human activities such as navigation and recreation. This study proposes a novel methodological framework for monitoring such debris using fixed, in-situ cameras. It makes two key contributions: (i) continuous quantification and monitoring of floating debris using deep learning, and (ii) identification of the most suitable deep learning model in terms of accuracy and inference speed under complex environmental conditions. The models are tested across a range of environmental conditions and learning configurations, including experiments on biases related to data leakage. Furthermore, a geometric model that exploits both the intrinsic and extrinsic parameters of the camera is implemented to estimate the actual size of detected objects from a 2D image. The findings underscore the importance of the dataset constitution protocol, particularly the inclusion of negative images and the handling of temporal leakage. Finally, the feasibility of metric object size estimation using projective geometry coupled with regression corrections is demonstrated. This approach paves the way for robust, low-cost, automated monitoring systems for urban aquatic environments.
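To make the size-estimation idea concrete, the following is a minimal sketch of how a pixel can be back-projected onto the water surface with a pinhole camera model, followed by a simple linear correction. It is an illustration under stated assumptions, not the authors' implementation: the water surface is taken as the world plane Z = 0, lens distortion is ignored, the extrinsics follow the convention x_cam = R X_world + t, and the function names (`pixel_to_water_plane`, `metric_distance`, `fit_linear_correction`) as well as the example camera values are hypothetical. The abstract does not state which regression form is used, so a first-degree least-squares fit stands in for it here.

```python
import numpy as np


def pixel_to_water_plane(uv, K, R, t):
    """Back-project pixel (u, v) onto the world plane Z = 0 (assumed water surface)."""
    u, v = uv
    # Viewing ray in camera coordinates: K^{-1} [u, v, 1]^T.
    ray_cam = np.linalg.solve(K, np.array([u, v, 1.0]))
    # Rotate the ray into world coordinates and locate the camera centre.
    ray_world = R.T @ ray_cam
    cam_center = -(R.T @ t)
    if abs(ray_world[2]) < 1e-12:
        raise ValueError("ray is parallel to the water plane")
    # Solve cam_center_z + s * ray_world_z = 0 for the ray parameter s.
    s = -cam_center[2] / ray_world[2]
    if s <= 0:
        raise ValueError("intersection lies behind the camera")
    return cam_center + s * ray_world


def metric_distance(uv_a, uv_b, K, R, t):
    """Metric distance on the water plane between two image points."""
    pa = pixel_to_water_plane(uv_a, K, R, t)
    pb = pixel_to_water_plane(uv_b, K, R, t)
    return float(np.linalg.norm(pa - pb))


def fit_linear_correction(estimated, measured):
    """Least-squares fit of measured ~ slope * estimated + intercept."""
    slope, intercept = np.polyfit(estimated, measured, deg=1)
    return float(slope), float(intercept)


# Hypothetical camera: 1000 px focal length, 5 m above the water, looking straight down.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
t = np.array([0.0, 0.0, 5.0])

# Two bounding-box corners 200 px apart map to about 1 m on the water surface.
width_m = metric_distance((640, 360), (840, 360), K, R, t)
```

In practice the two image points would come from a detector's bounding box, and the correction coefficients would be fitted on objects of known size. Objects that rise above the water surface violate the planar assumption, which is one plausible reason a regression correction is useful.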