INTERACT: An AI-Driven Extended Reality Framework for Accessible Communication Featuring Real-Time Sign Language Interpretation and Emotion Recognition

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the insufficient support for deaf, hard-of-hearing, and multilingual users in existing video conferencing platforms, where conventional accessibility solutions often suffer from high costs and deployment challenges. To overcome these limitations, this work proposes the first immersive communication system within extended reality (XR) that integrates real-time speech-to-text transcription, 3D virtual avatar generation for international sign language, multilingual translation, and emotion recognition, enabling end-to-end accessible interaction for deaf users. Built upon the CORTEX2 framework and leveraging state-of-the-art models including Whisper, NLLB, RoBERTa, and MediaPipe, the system is deployed on the Meta Quest 3. Experimental results demonstrate a user satisfaction rate of 92%, speech transcription accuracy exceeding 85%, emotion recognition precision of 90%, an average experience rating of 4.6 out of 5.0, and willingness among 90% of participants to engage in further testing.
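
The summary names a concrete model stack. As a rough illustration of how the three text-side stages (Whisper transcription, NLLB translation, RoBERTa emotion classification) could be chained, here is a minimal Python sketch; the specific checkpoints, the English-to-French language pair, and the wiring are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the transcription -> translation -> emotion chain
# described above. Checkpoint names and the language pair are
# illustrative assumptions; the paper does not specify them here.
import whisper
from transformers import pipeline

# 1) Speech-to-text with Whisper (checkpoint size assumed).
asr = whisper.load_model("base")

# 2) Multilingual translation with NLLB (English -> French as an example).
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
)

# 3) Emotion classification with a RoBERTa-family model (checkpoint assumed).
emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

def process_utterance(wav_path: str) -> dict:
    """Run one audio clip through all three stages."""
    text = asr.transcribe(wav_path)["text"].strip()
    return {
        "transcript": text,
        "translation": translator(text)[0]["translation_text"],
        "emotion": emotion(text)[0],  # e.g. {"label": "joy", "score": 0.97}
    }

print(process_utterance("meeting_clip.wav"))
```

In the deployed system these stages run in real time inside the CORTEX2 pipeline; a batch script like this only mirrors the data flow.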
📝 Abstract
Video conferencing has become central to professional collaboration, yet most platforms offer limited support for deaf, hard-of-hearing, and multilingual users. The World Health Organisation estimates that over 430 million people worldwide require rehabilitation for disabling hearing loss, a figure projected to exceed 700 million by 2050. Conventional accessibility measures remain constrained by high costs, limited availability, and logistical barriers, while Extended Reality (XR) technologies open new possibilities for immersive and inclusive communication. This paper presents INTERACT (Inclusive Networking for Translation and Embodied Real-Time Augmented Communication Tool), an AI-driven XR platform that integrates real-time speech-to-text conversion, International Sign Language (ISL) rendering through 3D avatars, multilingual translation, and emotion recognition within an immersive virtual environment. Built on the CORTEX2 framework and deployed on Meta Quest 3 headsets, INTERACT combines Whisper for speech recognition, NLLB for multilingual translation, RoBERTa for emotion classification, and Google MediaPipe for gesture extraction. Pilot evaluations were conducted in two phases, first with technical experts from academia and industry, and subsequently with members of the deaf community. The trials reported 92% user satisfaction, transcription accuracy above 85%, and 90% emotion-detection precision, with a mean overall experience rating of 4.6 out of 5.0 and 90% of participants willing to take part in further testing. The results highlight strong potential for advancing accessibility across educational, cultural, and professional settings. An extended version of this work, including full pilot data and implementation details, has been published as an Open Research Europe article [Tantaroudas et al., 2026a].
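
The abstract also credits Google MediaPipe with gesture extraction for the signing avatars. A minimal sketch of that extraction step, assuming MediaPipe's hand-landmark solution on prerecorded video (avatar retargeting and the XR runtime are out of scope here, and file names are placeholders):

```python
# Hand-landmark extraction with MediaPipe: the kind of per-frame
# keypoints a 3D signing avatar rig can be driven from.
import cv2
import mediapipe as mp

def extract_hand_landmarks(video_path: str):
    """Return per-frame lists of (x, y, z) landmarks for each detected hand."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.hands.Hands(
        max_num_hands=2, min_detection_confidence=0.5
    ) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                frames.append([
                    [(lm.x, lm.y, lm.z) for lm in hand.landmark]
                    for hand in result.multi_hand_landmarks
                ])
    cap.release()
    return frames

keypoints = extract_hand_landmarks("signing_sample.mp4")
print(f"{len(keypoints)} frames with at least one detected hand")
```

Each detected hand yields 21 landmarks per frame; how INTERACT maps these onto its ISL avatars is described in the extended Open Research Europe article.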
Problem

Research questions and friction points this paper is trying to address.

accessible communication
sign language interpretation
deaf and hard-of-hearing
extended reality
real-time translation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extended Reality (XR)
Real-Time Sign Language Interpretation
Emotion Recognition
AI-Driven Accessibility
Immersive Communication
Nikolaos D. Tantaroudas
Institute of Communications and Computer Systems (ICCS), Iroon Polytechneiou 9, 15773 Zografou, Athens, Greece
Andrew J. McCracken
DASKALOS-APPS, 183 Rue de l’Abbé Griffon, 01960 Péronnas, France
Ilias Karachalios
National Technical University of Athens, Leof. Alimou, Katechaki, Zografou, 15772 Athens, Greece
Evangelos Papatheou
University of Exeter
Structural Dynamics and Control · Structural Health Monitoring