Scrapping The Web For Early Wildfire Detection: A New Annotated Dataset of Images and Videos of Smoke Plumes In-the-wild

📅 2024-02-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Early wildfire detection (EWD) critically depends on high-quality smoke data, yet existing datasets suffer from limited scale, narrow scene coverage, and a lack of video sequences. To address this, we introduce PyroNear-2024, a multi-source smoke dataset that combines web-scraped videos from public wildfire-detection camera networks, videos from our own camera network, and a small portion of synthetic and real images. It covers 400 wildfires with around 150,000 manual annotations on 50,000 images from France, Spain, and the United States, and it includes both images and videos. We evaluate a lightweight YOLO-based detector, augmented with temporal modeling via ConvLSTM for video sequences. In cross-dataset experiments, the detector reaches an F1 score of around 60% on PyroNear-2024, which indicates that the dataset is challenging, although results on it are more stable than on existing datasets. Video-sequence modeling improves global recall while maintaining precision, and joint training with other public datasets yields higher results overall.

📝 Abstract

Early wildfire detection is of the utmost importance to enable rapid response efforts, and thus minimize the negative impacts of wildfire spread. To this end, we present PyroNear-2024, a new dataset composed of both images and videos, allowing for the training and evaluation of smoke plume detection models, including sequential models. The data is sourced from: (i) web-scraped videos of wildfires from public networks of cameras for wildfire detection in-the-wild, (ii) videos from our in-house network of cameras, and (iii) a small portion of synthetic and real images. The dataset includes around 150,000 manual annotations on 50,000 images, covering 400 wildfires, so PyroNear-2024 surpasses existing datasets in size and diversity. It includes data from France, Spain, and the United States. We ran cross-dataset experiments using a lightweight state-of-the-art object detection model and found that the proposed dataset is particularly challenging, with an F1 score of around 60%, but more stable than existing datasets. The video part of the dataset can be used to train a lightweight sequential model, improving global recall while maintaining precision. Finally, its use in combination with other public datasets helps to reach higher results overall. We will make both our code and data available.
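The abstract reports an F1 score and describes a gain in recall at unchanged precision. As a minimal sketch of how these metrics relate, the snippet below computes precision, recall, and F1 from detection counts. The function name and all counts are hypothetical illustrations, not values or code from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a per-frame detector (assumed, not from the paper).
frame_p, frame_r, frame_f1 = precision_recall_f1(tp=60, fp=40, fn=40)

# Hypothetical counts for a detector that recovers more smoke plumes
# while keeping the same share of correct detections.
seq_p, seq_r, seq_f1 = precision_recall_f1(tp=75, fp=50, fn=25)

print(f"per-frame:  P={frame_p:.2f} R={frame_r:.2f} F1={frame_f1:.2f}")
print(f"sequential: P={seq_p:.2f} R={seq_r:.2f} F1={seq_f1:.2f}")
```

With these illustrative counts, precision stays the same in both cases while recall rises, so F1 rises as well. This is the kind of effect the abstract attributes to the sequential model.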

Problem

Research questions and friction points this paper is trying to address.

Developing a real-world benchmark for early wildfire detection

Creating a diverse dataset with images and videos from multiple countries

Training sequential models to improve smoke plume detection recall

Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed PyroNear-2024, a multi-source wildfire smoke dataset

Used a lightweight object detection model for cross-dataset experiments

Trained sequential models on video data to improve recall
