Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View

📅 2026-06-01

🤖 AI Summary

This study addresses large-scale accessibility assessment in real-world environments, where barriers to wheelchair navigation are distributed, context-dependent, and often temporary. The authors propose a scalable, expert-guided, retrieval-augmented framework that combines vision-language models (VLMs) with Google Street View imagery, guidance informed by the Americans with Disabilities Act (ADA), and expert-derived rubrics. They evaluate it against GPS-derived wheelchair dwell-time data linked to 407 Street View locations on the University of Florida campus. VLM-generated accessibility scores are negatively correlated with observed dwell times, indicating partial but consistent alignment with this behavioral proxy for mobility friction. A visual cue analysis finds that features such as curb ramps and crosswalks are associated with higher accessibility scores, while alignment remains limited for subtle surface conditions, transient obstructions, and viewpoint-dependent barriers.
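To make the retrieval-augmented step concrete, here is a minimal sketch of how guidance snippets might be selected and folded into a VLM prompt. Everything in it is an assumption for illustration: the `Guideline` records, the keyword-overlap retrieval, the 1 to 5 rating scale, and the prompt wording are hypothetical and are not taken from the paper or from the ADA text. The actual VLM call is left out.

```python
from dataclasses import dataclass


@dataclass
class Guideline:
    topic: str
    text: str


# Hypothetical guidance snippets; placeholders, not quotations from the ADA or the paper.
GUIDELINES = [
    Guideline("curb ramp", "Check whether a curb ramp connects the sidewalk to the crossing."),
    Guideline("surface", "Check whether the walking surface looks firm, stable, and unbroken."),
    Guideline("obstruction", "Check whether objects narrow the clear path of travel."),
]


def tokenize(text):
    """Lowercase whitespace tokenization; deliberately simplistic."""
    return set(text.lower().split())


def retrieve(query, guidelines, k=2):
    """Return the k guidelines sharing the most tokens with the query.

    Keyword overlap stands in for whatever retriever the paper uses (assumption).
    """
    q = tokenize(query)
    return sorted(
        guidelines,
        key=lambda g: len(q & tokenize(g.topic + " " + g.text)),
        reverse=True,
    )[:k]


def build_prompt(dimension, guidelines, k=2):
    """Assemble a text prompt for one accessibility dimension (hypothetical wording)."""
    lines = [f"Rate the street view image for: {dimension} (1 = inaccessible, 5 = accessible)."]
    lines += [f"- {g.text}" for g in retrieve(dimension, guidelines, k)]
    return "\n".join(lines)
```

Calling `build_prompt("curb ramp presence", GUIDELINES, k=1)` would produce a two-line prompt whose guidance line is the curb-ramp snippet; the image itself would be attached separately when querying a VLM.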

📝 Abstract

Assessing built-environment interaction, such as wheelchair accessibility, is difficult because real-world mobility is shaped by distributed, context-dependent, and temporary barriers that are hard to capture at scale. To support scalable assessment, this paper examines whether vision-language models (VLMs) can identify accessibility barriers from Google Street View (GSV) imagery. We propose an expert-guided retrieval-augmented framework that combines GSV images, ADA-informed guidance, and expert-derived rubrics to evaluate accessibility dimensions. We collect a campus-scale dataset at the University of Florida, linking 407 unique GSV locations with GPS-derived wheelchair dwell behavior as a mobility-friction signal. Results show that VLM ratings are negatively correlated with dwell time and distributionally similar to it, indicating partial but consistent alignment with a behavioral proxy for mobility friction. Visual cue analysis shows that certain environmental objects, such as curb ramps and crosswalks, are associated with higher VLM accessibility scores, while alignment remains limited for subtle surface conditions, transient obstructions, and viewpoint-dependent barriers. Overall, our findings show the potential of expert-guided VLMs for scalable accessibility assessment that aligns with sensor-derived indicators of real-world wheelchair navigation.
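The alignment check described above can be illustrated with a rank correlation between per-location VLM ratings and dwell times. The sketch below uses Spearman's coefficient as an assumption; the abstract does not say which correlation statistic the authors use, and the function names are hypothetical.

```python
def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

With made-up inputs in which higher ratings always pair with shorter dwell times, for example `spearman([5, 4, 3, 2, 1], [10, 20, 30, 40, 50])`, the coefficient is -1 up to floating-point error. Real data would give a weaker value, consistent with the partial alignment the abstract reports.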

Problem

Research questions and friction points this paper is trying to address.

wheelchair accessibility

built-environment assessment

mobility barriers

scalable evaluation

street view imagery

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models

wheelchair accessibility

retrieval-augmented framework