🤖 AI Summary
This work addresses onboard real-time wildfire detection from mid-wave infrared (MWIR) satellite imagery, where fires appear as sub-pixel or single-pixel thermal anomalies, classes are extremely imbalanced, and the single-band imagery is uncalibrated. It studies lightweight dense self-supervised representation learning based on DenseMAE, a dense masked autoencoder pretraining scheme, and compares it with a hybrid variant that adds exponential moving average (EMA) distillation. Models are deployed as TensorRT FP16 engines under stringent onboard constraints: model size under 1 MB, per-batch inference latency below 150 ms on an NVIDIA Jetson Xavier NX, and a target of under 10 minutes from satellite overpass to alert. The fastest self-supervised model reaches a pixel-level average precision (AP) of 0.640 and an event-level Fire-F1 of 0.69 with a 0.52 MB engine and 65.34 ms per-batch latency; the best configuration reaches 0.699 AP and 0.744 Fire-F1 while staying below 1 MB, outperforming a supervised baseline (0.650 AP) under comparable constraints.
📝 Abstract
We present a deployed system for on-orbit wildfire detection aboard a nine-satellite commercial thermal infrared constellation, operating under demanding joint constraints: sub-megabyte model footprint, sub-150 ms per-batch TensorRT FP16 inference on an NVIDIA Jetson Xavier NX, and an end-to-end alert pipeline targeting under 10 minutes from satellite overpass to fire event communication. The system operates on uncalibrated mid-wave infrared (MWIR) single-band imagery at 200 m ground sampling distance, where fires frequently appear as sub-pixel or single-pixel thermal anomalies under extreme class imbalance -- challenges not addressed by the contextual thermal-thresholding pipelines (MODIS, VIIRS) that currently dominate operational fire monitoring. We present an empirical study of lightweight dense representation learning for this regime using a proprietary nine-satellite MWIR dataset. We compare dense masked autoencoding (DenseMAE) and a hybrid DenseMAE+EMA (exponential moving average) distillation variant, and evaluate representations via linear probing and full-distribution pixel-level average precision (AP) under extreme class imbalance. DenseMAE pretraining enables compact downstream models on the latency-accuracy Pareto frontier: our fastest SSL-pretrained model achieves 0.640 test AP and 0.69 event-level Fire-F1 with 65.34 ms latency per batch and a 0.52 MB engine, without pruning or compression. The best configuration reaches 0.699 AP and 0.744 Fire-F1 below 1 MB, outperforming a supervised baseline (0.650 AP) under comparable constraints.
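To illustrate why pixel-level AP is the reported metric under extreme class imbalance, the sketch below computes AP as the mean precision at the rank of each positive pixel. It is a minimal illustration and not the authors' evaluation code: the function name `average_precision`, the tie-breaking rule, and the toy scores are assumptions made for this example.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@rank taken at the rank of every positive pixel.

    Assumption: ties in score are broken by input order, a simplification
    compared with threshold-based implementations.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive pixels")

    # Rank pixels from most to least fire-like.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / n_pos


# Hypothetical scene: 1000 pixels, only 2 of them fire.
# One fire pixel is scored confidently, the other is ranked last.
toy_scores = [0.95] + [0.1] * 998 + [0.05]
toy_labels = [1] + [0] * 998 + [1]

toy_ap = average_precision(toy_scores, toy_labels)
# Accuracy of a detector that never fires on the same scene.
all_background_accuracy = toy_labels.count(0) / len(toy_labels)
```

In this toy scene a detector that predicts background everywhere is 99.8% accurate, while AP is roughly 0.5 because one of the two fire pixels is ranked last. That sensitivity to missed positives is what makes AP informative when fire pixels are rare.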