🤖 AI Summary
Most existing vision datasets consist of low-resolution JPEGs of uncertain provenance, and they lack the high-fidelity image content and spatial context of real-world environments. To address this gap, the authors introduce Places in the Wild, a large-scale scene dataset of 67,574 high-resolution images captured at 810 physical locations spanning 260 basic-level scene categories across indoor, urban, and natural environments. At each location, a Canon EOS R5 on a panoramic tripod captured images at 5-degree horizontal intervals and at several elevation angles. Every image was recorded simultaneously as a 14-bit CR3 RAW file and a compressed JPEG, and the dataset ships with complete EXIF metadata and image-quality metrics. The result is an ecologically valid, quality-controlled resource for research on viewpoint-dependent recognition in humans and models, real-world scene understanding, natural image statistics, and experiments requiring near-full-field visual displays.
📝 Abstract
Large image datasets have accelerated progress in cognitive neuroscience and computer vision. However, most datasets consist of low-resolution, internet-sourced JPEGs with unknown capture conditions and limited spatial context. Places in the Wild is a dataset of 67,574 high-resolution photographs collected in situ across 810 physical locations spanning 260 basic-level scene categories, including indoor, urban, and natural environments. At each location, a 45-megapixel Canon EOS R5 mounted on a panoramic tripod captured 72 images at 5-degree horizontal intervals plus 12 images at varying elevations, yielding dense 360-degree viewpoint sampling. All images were recorded simultaneously as 14-bit RAW (CR3) files and compressed JPEGs, preserving sensor-level detail for analyses of luminance, contrast, color, and other image statistics. The dataset is accompanied by complete EXIF metadata and a suite of image-quality metrics. Places in the Wild supports research on viewpoint-dependent recognition in humans and models, training and evaluation of scene-understanding systems under realistic conditions, characterization of natural scene statistics, and experiments requiring near-full-field visual displays.
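The sampling arithmetic in the abstract can be checked with a short sketch. It assumes azimuths start at 0 degrees and that every location contributes the full 72 + 12 images; the function names are illustrative and not part of any released tooling. Under that assumption the nominal total is 68,040, which is 466 more than the reported 67,574. The abstract does not explain the difference, so some locations presumably contribute fewer than 84 images.

```python
# Minimal sketch of the per-location sampling arithmetic described in the abstract.
# Assumptions (not stated in the source): azimuths start at 0 degrees, and every
# location contributes a full set of horizontal and elevation views.

STEP_DEG = 5            # horizontal interval reported in the abstract
ELEVATION_VIEWS = 12    # additional images at varying elevations
LOCATIONS = 810         # physical locations
REPORTED_TOTAL = 67_574 # image count reported in the abstract


def horizontal_azimuths(step_deg: int = STEP_DEG) -> list[int]:
    """Azimuths (degrees) covering a full 360-degree sweep at a fixed step."""
    return list(range(0, 360, step_deg))


def images_per_location() -> int:
    """Nominal images per location: horizontal sweep plus elevation views."""
    return len(horizontal_azimuths()) + ELEVATION_VIEWS


def nominal_total() -> int:
    """Nominal dataset size if every location had a complete set."""
    return LOCATIONS * images_per_location()


if __name__ == "__main__":
    print("horizontal views per location:", len(horizontal_azimuths()))
    print("images per location:", images_per_location())
    print("nominal total:", nominal_total())
    print("nominal minus reported:", nominal_total() - REPORTED_TOTAL)
```

The sweep of 72 views at 5-degree steps covers exactly 360 degrees, which is what the abstract means by dense 360-degree viewpoint sampling.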