🤖 AI Summary
This study addresses the challenges of adapting general-purpose vision models to Google Street View (GSV) imagery for neighborhood health research, where limited cross-domain generalization, scarce labeled data, and constrained computational resources often hinder performance. It proposes a practical model adaptation framework for settings with small annotated datasets and large volumes of unlabeled GSV images, combining foundation model transfer with unsupervised pretraining to improve downstream task performance under resource constraints. Empirical results show that the approach substantially improves model robustness and scalability on GSV data, offering an efficient, low-cost, and reproducible pathway for computational social and public health research.
📝 Abstract
A substantial body of health research demonstrates a strong link between neighborhood environments and health outcomes. Recently, there has been increasing interest in leveraging advances in computer vision to enable large-scale, systematic characterization of neighborhood built environments. However, the generalizability of vision models across fundamentally different domains remains uncertain; for example, it is unclear how well knowledge transfers from ImageNet to the distinct visual characteristics of Google Street View (GSV) imagery. In applied fields such as social health research, several critical questions arise: which models are most appropriate, whether to adopt unsupervised training strategies, what training scale is feasible under computational constraints, and how much such strategies benefit downstream performance. These decisions are often costly and require specialized expertise. In this paper, we answer these questions through empirical analysis and provide practical insights into how to select and adapt foundation models for datasets of limited size and label availability, while leveraging larger, unlabeled datasets through unsupervised training. Our study includes comprehensive quantitative and visual analyses comparing model performance before and after unsupervised adaptation.
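The abstract does not specify which unsupervised objective is used, so the sketch below is only one plausible instantiation of the described pipeline: continue self-supervised pretraining of an ImageNet backbone on unlabeled GSV images with a SimCLR-style contrastive loss, then fine-tune on a small labeled set. The backbone choice, the `nt_xent` loss, the class count, and all hyperparameters are illustrative assumptions, not the paper's method; random tensors stand in for GSV image batches so the sketch runs end to end.

```python
# Illustrative sketch (assumptions noted above), not the authors' exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Start from a general-purpose, ImageNet-pretrained backbone.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()  # expose the 2048-d features
proj = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 128))
model = nn.Sequential(backbone, proj).to(device)

def nt_xent(z1, z2, tau=0.2):
    """SimCLR NT-Xent loss over two augmented views of the same batch."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))  # exclude self-similarity
    n = z1.size(0)
    # Row i's positive is its other view: i+n for the first half, i-n for the second.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# 2) Unsupervised adaptation on unlabeled GSV imagery.
#    view1/view2 stand in for two random augmentations of the same images.
for step in range(10):
    view1 = torch.randn(32, 3, 224, 224, device=device)  # placeholder batch
    view2 = torch.randn(32, 3, 224, 224, device=device)  # placeholder batch
    loss = nt_xent(model(view1), model(view2))
    opt.zero_grad()
    loss.backward()
    opt.step()

# 3) Supervised fine-tuning of the adapted backbone on the small labeled set
#    (e.g., built-environment labels; 5 classes is a hypothetical choice).
head = nn.Linear(2048, 5).to(device)
ft_opt = torch.optim.AdamW(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-5
)
images = torch.randn(16, 3, 224, 224, device=device)  # placeholder labeled batch
labels = torch.randint(0, 5, (16,), device=device)
ft_loss = F.cross_entropy(head(backbone(images)), labels)
ft_opt.zero_grad()
ft_loss.backward()
ft_opt.step()
```

Which objective to use (contrastive versus masked image modeling), how long to adapt, and how much of the backbone to unfreeze during fine-tuning are exactly the kinds of design decisions the paper evaluates empirically.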