CAFOSat: A Strongly Annotated Dataset for Infrastructure-Aware CAFO Mapping Using High-Resolution Imagery

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses large-scale remote sensing mapping of Concentrated Animal Feeding Operations (CAFOs), a task hindered by heterogeneous facility layouts, noisy labels, and incomplete inventories. The authors construct the first infrastructure-level CAFO remote sensing dataset for the United States, comprising more than 45,000 image chips across 20 states, by integrating multi-source inventories with high-resolution NAIP imagery. They propose a human-in-the-loop annotation pipeline that combines AI-assisted labeling, GradCAM-based localization, and geometric clustering, and they introduce two further strategies: land-cover-guided negative sampling and infrastructure-aware synthetic augmentation. The authors report that refined annotations and curated negative samples improve classification and generalization, and that synthetic augmentation improves robustness under distribution shift. The dataset is positioned as a benchmark for remote sensing–based monitoring of agricultural infrastructure.

📝 Abstract

Concentrated Animal Feeding Operations (CAFOs) play an important role in agricultural production but are also associated with environmental, public health, and disease surveillance concerns. Large-scale mapping of CAFOs from remote sensing imagery remains challenging due to heterogeneous infrastructure layouts, noisy location records, inconsistent annotations, and incomplete inventories. We introduce CAFOSat, a strongly annotated, infrastructure-aware dataset for CAFO mapping across the United States. CAFOSat integrates high-resolution National Agriculture Imagery Program (NAIP) imagery with multi-source CAFO inventories collected across multiple states and transforms weak geolocation records into refined annotations through a human-in-the-loop pipeline combining AI-assisted annotation, GradCAM-based localization, and geometric clustering. To improve dataset quality, we curate challenging negative samples using land-cover-guided sampling with spatial exclusion constraints and provide infrastructure-level annotations, including barns, manure ponds, and grazing-related features, through manual verification. The resulting dataset contains more than 45,000 image patches spanning 20 states and four major CAFO categories. We benchmark a diverse set of convolutional, transformer-based, and vision-language models, demonstrating the value of refined annotations and curated negative samples for CAFO classification and generalization. In addition, we introduce a synthetic augmentation pipeline that generates infrastructure-aware variations to increase training diversity and improve robustness under distribution shifts. CAFOSat provides a large-scale benchmark for advancing infrastructure-aware agricultural monitoring and CAFO mapping from high-resolution remote sensing imagery.
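
The abstract does not give the details of land-cover-guided negative sampling with spatial exclusion constraints, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes candidate locations arrive as (latitude, longitude, land-cover class) tuples, that the set of "allowed" land-cover classes and the exclusion radius are chosen by the user, and that a candidate qualifies as a negative only if it lies at least that radius away from every known CAFO record. The names `haversine_km` and `sample_negatives`, the class labels, and the radius value are all hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_negatives(candidates, cafo_points, allowed_classes, min_km, k, seed=0):
    """Pick up to k negative locations (hypothetical helper, not from the paper).

    candidates      : iterable of (lat, lon, land_cover_class)
    cafo_points     : iterable of (lat, lon) for known CAFO records
    allowed_classes : land-cover classes considered useful as hard negatives
    min_km          : spatial exclusion radius around every known CAFO
    """
    cafo_points = list(cafo_points)
    eligible = [
        (lat, lon, cls)
        for lat, lon, cls in candidates
        if cls in allowed_classes
        and all(haversine_km(lat, lon, clat, clon) >= min_km for clat, clon in cafo_points)
    ]
    rng = random.Random(seed)  # seeded for reproducible sampling
    return rng.sample(eligible, min(k, len(eligible)))


# Illustrative inputs only; coordinates and class names are made up.
known_cafos = [(41.0, -93.0)]
candidate_sites = [
    (41.001, -93.0, "cropland"),  # inside the exclusion radius
    (41.5, -93.0, "cropland"),    # far enough, allowed class
    (41.5, -93.5, "water"),       # far enough, but class not allowed
    (42.0, -94.0, "pasture"),     # far enough, allowed class
]
negatives = sample_negatives(
    candidate_sites, known_cafos, {"cropland", "pasture"}, min_km=1.0, k=5
)
```

The exclusion radius guards against unlisted facilities or mislocated records near known CAFOs leaking into the negative set, while the land-cover filter steers sampling toward visually plausible agricultural scenes. A brute-force distance check is used here for clarity; a spatial index would be a natural substitute for large inventories.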

Problem

Research questions and friction points this paper is trying to address.

CAFO mapping

remote sensing imagery

infrastructure heterogeneity

inconsistent annotations

noisy geolocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

infrastructure-aware annotation

human-in-the-loop labeling

GradCAM-based localization