Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual gesture recognition in VR incurs high computational overhead and suffers from lighting sensitivity and privacy risks, while existing acoustic approaches (e.g., CIR-based methods) rely heavily on labeled data and generalize poorly to few-shot scenarios. Method: This paper proposes the first large language model (LLM)-based acoustic gesture recognition framework. It leverages micro-Doppler effects to capture acoustic field perturbations induced by hand motions and employs differential channel impulse response (CIR) acquisition for low-power, privacy-preserving interaction. Contribution/Results: By integrating LLMs into acoustic gesture recognition, the framework enables few-shot and zero-shot classification without domain-specific fine-tuning. Evaluated on an empirical dataset comprising 15 gestures and 10 participants, it achieves accuracy comparable to supervised baselines, without task-specific adaptation, while significantly improving cross-user and cross-scenario generalization.

📝 Abstract
Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy leakage concerns. Acoustic sensing provides an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, the channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods often rely on training models extensively on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite LLMs' strengths, achieving few-shot and zero-shot learning of CIR gestures is non-trivial due to their inconspicuous features. To tackle this challenge, we collect differential CIR rather than the original CIR data. Moreover, we construct a real-world dataset collected from 10 participants performing 15 gestures across three categories (digits, letters, and shapes), with 10 repetitions each. We then conduct extensive experiments on this dataset using an LLM-adopted classifier. Results show that our LLM-based framework achieves accuracy comparable to classical machine learning baselines, while requiring no domain-specific retraining.
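The differential-CIR idea from the abstract can be sketched as a frame-to-frame subtraction: static reflections (walls, furniture) contribute the same energy to consecutive CIR frames and cancel out, while motion-induced perturbations survive. The array shapes and function name below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def differential_cir(cir_frames: np.ndarray) -> np.ndarray:
    """cir_frames: (num_frames, num_taps) complex CIR estimates.

    Returns (num_frames - 1, num_taps) magnitude differences, which
    suppress static multipath and emphasize gesture-induced changes.
    """
    mags = np.abs(cir_frames)      # per-tap magnitude of each frame
    return np.diff(mags, axis=0)   # subtract consecutive frames

# A purely static scene yields a near-zero differential CIR.
rng = np.random.default_rng(0)
static_frame = rng.standard_normal(64) + 1j * rng.standard_normal(64)
static_scene = np.tile(static_frame, (20, 1))   # 20 identical frames
print(np.allclose(differential_cir(static_scene), 0.0))  # True
```

In practice the differential signal would then be segmented per gesture and fed to the classifier; the preprocessing shown here is only the subtraction step.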
Problem

Research questions and friction points this paper is trying to address.

Achieving natural VR interactions via acoustic gesture recognition
Overcoming limitations of vision-based gesture recognition systems
Enabling few-shot gesture learning using large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Acoustic sensing for gesture recognition
Differential CIR data collection method
LLM-based framework for few-shot learning
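The few-shot LLM classification listed above can be sketched as serializing differential-CIR features into a text prompt with a handful of labeled examples prepended. The prompt format, feature serialization, and gesture names below are assumptions for illustration; the paper does not specify its prompt design here.

```python
def build_fewshot_prompt(examples, query_features):
    """examples: list of (label, feature_vector) support pairs.
    query_features: feature vector of the unlabeled gesture.
    Returns a text prompt an LLM could complete with a label.
    """
    lines = ["Classify the gesture from its differential-CIR feature vector."]
    for label, feats in examples:
        feat_str = ", ".join(f"{v:.3f}" for v in feats)
        lines.append(f"Features: {feat_str} -> Gesture: {label}")
    query_str = ", ".join(f"{v:.3f}" for v in query_features)
    lines.append(f"Features: {query_str} -> Gesture:")
    return "\n".join(lines)

# Hypothetical usage with two support examples and one query.
prompt = build_fewshot_prompt(
    [("circle", [0.12, 0.85, 0.33]), ("swipe-left", [0.91, 0.07, 0.44])],
    [0.14, 0.82, 0.30],
)
print(prompt)
```

The key property this sketch illustrates is that no gradient-based fine-tuning is involved: adapting to a new user or gesture set only means swapping the support examples in the prompt.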
Xijie Zhang
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Fengliang He
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Hong-Ning Dai
Hong Kong Baptist University
Industrial Internet of Things · Blockchain Technologies · Extended Reality · Big Data Analytics