🤖 AI Summary
This study addresses the lack of systematic evaluation of detection limits in satellite-based flood mapping, particularly its reliability on out-of-distribution events. Using the geospatial foundation model Prithvi-EO-2.0, the authors apply a dual-reference validation framework and an iterative mapping pipeline to 19 previously unseen flood events spanning six continents, eight climate zones, and six flood mechanisms. They find that detection limits are jointly governed by land cover and flood type: cropland shows the highest agreement (IoU = 52%), riverine floods achieve the strongest detection (F1 = 0.69), and tree cover and built-up areas are nearly undetectable (IoU = 4%) regardless of flood mechanism. The analysis further identifies 23 failure modes and shows that apparent model error partly reflects definitional inconsistency between the two reference products, and that pipeline engineering, rather than model capacity, dominates the initial error.
📝 Abstract
Floods are among the most destructive natural hazards, and their increasing frequency under climate change makes satellite-based inundation mapping essential for disaster response. Geospatial foundation models pretrained on satellite archives offer geographic transferability, but their operational reliability across diverse, unseen events remains uncharacterized. Here we deploy Prithvi-EO-2.0 across 19 out-of-distribution flood events (2017–2025) spanning six continents, eight climate zones, and six flood mechanisms, validating against two independent reference products. Detection accuracy depended jointly on land cover and flood type, with cropland yielding the highest agreement (IoU = 52%) and riverine events the strongest detection (F1 = 0.69), while tree cover and built-up areas showed near-zero detection (IoU = 4%) regardless of flood mechanism. Dual-reference validation revealed that apparent model error partly reflects definitional inconsistency between reference products rather than detection failure. Iterative pipeline testing identified 23 failure modes, with pipeline engineering, rather than model capacity, dominating initial error. These findings establish environment-dependent detection boundaries for operational satellite flood mapping.
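For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how IoU and F1 are conventionally computed from a predicted flood mask and a reference mask, and of how IoU can be stratified by land-cover class. It is not the authors' pipeline: the function names, the flattened-list mask representation, the class labels, and the toy pixel values are all assumptions made for illustration.

```python
from collections import defaultdict


def confusion_counts(pred, ref):
    """Count true positives, false positives, false negatives over paired pixels."""
    tp = fp = fn = 0
    for p, r in zip(pred, ref):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif r and not p:
            fn += 1
    return tp, fp, fn


def iou(tp, fp, fn):
    """Intersection over union of the flood class: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")


def f1(tp, fp, fn):
    """F1 score of the flood class: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


def iou_by_land_cover(pred, ref, land_cover):
    """Group pixels by land-cover label and compute IoU within each group."""
    groups = defaultdict(lambda: ([], []))
    for p, r, c in zip(pred, ref, land_cover):
        groups[c][0].append(p)
        groups[c][1].append(r)
    return {c: iou(*confusion_counts(ps, rs)) for c, (ps, rs) in groups.items()}


# Toy, made-up pixels (1 = flooded, 0 = dry); labels are hypothetical.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
ref = [1, 1, 1, 0, 0, 0, 1, 1]
land = ["cropland"] * 4 + ["tree_cover"] * 4

counts = confusion_counts(pred, ref)
overall_iou = iou(*counts)
overall_f1 = f1(*counts)
per_class = iou_by_land_cover(pred, ref, land)
```

Applying the same `iou` function to the two reference masks against each other, instead of to prediction versus reference, would give a rough indication of how much disagreement originates in the reference products themselves, which is the kind of effect the dual-reference validation is described as exposing.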