🤖 AI Summary
Current AR systems do not let deaf and hard-of-hearing (DHH) users independently build personalized sound visualization interfaces. To address this, we propose a natural-language-driven, sound-reactive AR authoring framework: users describe their requirements in typed natural language, and the system combines real-time audio analysis, which extracts features such as dominant frequency and energy, with multi-agent large language models that reason collaboratively to procedurally generate animated 2D vector graphics whose visual attributes (e.g., size, color, motion) are semantically mapped to acoustic features. This work introduces the first zero-code, low-barrier approach that enables DHH users to author accessible, customized sound visualization AR interfaces. A proof-of-concept evaluation demonstrates the approach's technical feasibility and interaction effectiveness, improving both the personalization and accessibility of sound perception. Our framework points toward an AI-augmented paradigm for inclusive human–computer interaction.
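As a concrete illustration of the audio-to-visual mapping described above, a minimal sketch might look like the following. The FFT-based feature extraction and the specific size and color ranges are illustrative assumptions, not SonoCraftAR's actual implementation:

```python
import numpy as np

SAMPLE_RATE = 48_000  # assumed microphone sample rate

def analyze_frame(samples: np.ndarray) -> dict:
    """Extract the dominant frequency and energy of one audio frame."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
    dominant_freq = float(freqs[np.argmax(spectrum)])
    energy = float(np.sqrt(np.mean(samples ** 2)))  # RMS energy
    return {"dominant_freq": dominant_freq, "energy": energy}

def map_to_visuals(features: dict) -> dict:
    """Map acoustic features to visual attributes of a 2D shape.

    The ranges below are illustrative, not the paper's values.
    """
    # Louder sounds -> larger shape (radius in pixels).
    radius = 20 + 200 * min(features["energy"], 1.0)
    # Higher pitch -> shift hue from red (low) toward blue (high).
    hue = int(np.interp(features["dominant_freq"], [80, 4000], [0, 240]))
    return {"radius": radius, "hue": hue}

if __name__ == "__main__":
    # One frame of a synthetic 440 Hz tone stands in for live microphone input.
    t = np.arange(1024) / SAMPLE_RATE
    frame = 0.3 * np.sin(2 * np.pi * 440 * t)
    print(map_to_visuals(analyze_frame(frame)))
```

In a live system, the same per-frame features would simply be recomputed on each audio buffer and pushed to the rendered graphics.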
📝 Abstract
Augmented reality (AR) has shown promise for supporting Deaf and hard-of-hearing (DHH) individuals by captioning speech and visualizing environmental sounds, yet existing systems do not allow users to create personalized sound visualizations. We present SonoCraftAR, a proof-of-concept prototype that empowers DHH users to author custom sound-reactive AR interfaces using typed natural language input. SonoCraftAR integrates real-time audio signal processing with a multi-agent LLM pipeline that procedurally generates animated 2D interfaces via a vector graphics library. The system extracts the dominant frequency of incoming audio and maps it to visual properties such as size and color, making the visualizations respond dynamically to sound. This early exploration demonstrates the feasibility of open-ended sound-reactive AR interface authoring and discusses future opportunities for personalized, AI-assisted tools to improve sound accessibility.
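To make the multi-agent authoring flow more tangible, the rough sketch below shows how a typed request could pass through two cooperating LLM roles to produce a sound-reactive vector-graphics interface. The agent roles, prompts, and the call_llm helper are hypothetical illustrations, not the pipeline described in the paper:

```python
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for a chat call to any LLM provider (not a real API)."""
    raise NotImplementedError("wire this to the LLM client of your choice")

def designer_agent(request: str) -> str:
    """Turn the user's typed accessibility request into a structured design brief."""
    return call_llm(
        "Convert the user's request into a JSON brief describing a 2D shape and how "
        "acoustic features (dominant frequency, energy) should drive its size, color, and motion.",
        request,
    )

def coder_agent(brief: str) -> str:
    """Turn the design brief into renderable vector-graphics code (e.g., animated SVG)."""
    return call_llm(
        "Emit self-contained SVG plus animation logic that updates the shape from "
        "per-frame audio features supplied by the host AR application.",
        brief,
    )

def author_interface(request: str) -> str:
    """Natural language in, sound-reactive interface description out."""
    return coder_agent(designer_agent(request))

# Example request a DHH user might type:
# author_interface("Show a ring that grows with loud sounds and turns blue for high-pitched ones")
```

Splitting design and code generation across agents mirrors the collaborative reasoning the abstract describes, though the actual decomposition used by SonoCraftAR may differ.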