🤖 AI Summary
To address the accessibility barriers that blind and low-vision (BLV) users face in virtual 3D environments, rooted in limited access to spatial information for perception and navigation, this paper proposes a generative AI-driven framework for dynamic accessibility. Unlike conventional static auditory or haptic aids, the approach integrates large language models (LLMs) directly into real-time 3D rendering pipelines, enabling natural-language scene querying, semantic understanding, and on-the-fly scene modification. The system supports closed-loop multimodal interaction through speech input and programmable haptic feedback. An evaluation with eight BLV participants points to gains in situational awareness and task efficiency, while also surfacing challenges around system reliability and user trust. The core contribution is unifying context awareness, runtime adaptability, and personalized assistance within a single generative AI architecture, establishing a new paradigm for AI-augmented accessibility research in immersive virtual environments.
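The summary describes the architecture only at a high level; no implementation details are given here. As a rough illustration of the LLM-in-the-rendering-loop idea, the Python sketch below serializes a scene graph into text, grounds a spoken question in it, and emits multimodal output. All names (`SceneObject`, `llm_complete`, `present`) are hypothetical placeholders, not RAVEN's actual API.

```python
# Illustrative sketch only: all names here (SceneObject, llm_complete,
# present) are hypothetical placeholders, not RAVEN's published API.
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    position: tuple[float, float, float]  # world-space coordinates, in meters


def serialize_scene(objects: list[SceneObject]) -> str:
    """Flatten the runtime scene graph into text an LLM can reason over."""
    return "\n".join(f"{o.name} at {o.position}" for o in objects)


def llm_complete(prompt: str) -> str:
    """Stub for any chat-completion backend; swap in a real model call."""
    return "The door is about two meters ahead and slightly to your right."


def answer_query(user_speech: str, objects: list[SceneObject]) -> str:
    """Ground a spoken question in the current scene state via the LLM."""
    prompt = (
        "You assist a blind user inside a 3D virtual scene.\n"
        f"Scene contents:\n{serialize_scene(objects)}\n"
        f"User question: {user_speech}\n"
        "Answer with distances and directions relative to the user at (0, 0, 0)."
    )
    return llm_complete(prompt)


def present(answer: str) -> None:
    """Multimodal output stub: spoken answer plus a haptic confirmation cue."""
    print(f"[TTS] {answer}")                      # stand-in for text-to-speech
    print("[haptics] short confirmation pulse")   # stand-in for a haptic device


scene = [SceneObject("door", (1.8, 0.0, 0.4)), SceneObject("table", (-1.0, 0.0, 0.5))]
present(answer_query("What is in front of me?", scene))
```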
📝 Abstract
As virtual 3D environments become prevalent, equitable access is crucial for blind and low-vision (BLV) users, who face challenges with spatial awareness, navigation, and interaction. To address this gap, prior work has explored supplementing visual information with auditory and haptic modalities. However, these methods are static and offer limited support for dynamic, in-context adaptation. Recent work in generative AI enables users to query and modify 3D scenes via natural language, introducing a paradigm with increased flexibility and control for accessibility improvements. We present RAVEN, a system that responds to query and modification prompts from BLV users to improve the runtime accessibility of 3D virtual scenes. We evaluated the system with eight BLV participants, uncovering key insights into the strengths and shortcomings of generative AI-driven accessibility in virtual 3D environments. The results are promising, but they also point to challenges related to system reliability and user trust.
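The abstract distinguishes two kinds of runtime prompts, queries and modifications. A minimal sketch of how such routing might look, assuming a simple intent classifier in front of two handlers; the intent labels, keyword heuristic, and handler behavior are illustrative assumptions, not RAVEN's documented design:

```python
# Illustrative sketch only: the intent labels, keyword heuristic, and
# handler behavior are assumptions, not RAVEN's documented design.
from enum import Enum


class Intent(Enum):
    QUERY = "query"    # e.g., "What is to my left?"
    MODIFY = "modify"  # e.g., "Add an audio beacon at the door."


def classify_intent(utterance: str) -> Intent:
    """Stand-in for an LLM-based classifier; a keyword heuristic for the demo."""
    modify_verbs = {"add", "move", "remove", "make", "change", "place"}
    words = utterance.lower().split()
    return Intent.MODIFY if words and words[0] in modify_verbs else Intent.QUERY


def handle(utterance: str) -> str:
    """Dispatch to a scene-editing path or a scene-description path."""
    if classify_intent(utterance) is Intent.MODIFY:
        return f"[scene edit queued] {utterance}"  # would patch the live scene graph
    return f"[scene description] {utterance}"      # would ground an LLM answer


print(handle("Add an audio beacon at the door"))
print(handle("What is in front of me?"))
```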