Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual gesture recognition in VR incurs high computational overhead and suffers from lighting sensitivity and privacy risks, while existing acoustic approaches (e.g., CIR-based methods) rely heavily on labeled data and generalize poorly to few-shot scenarios. Method: This paper proposes the first large language model (LLM)-based acoustic gesture recognition framework. It leverages micro-Doppler effects to capture acoustic field perturbations induced by hand motions and employs differential channel impulse response (CIR) acquisition for low-power, privacy-preserving interaction. Contribution/Results: By integrating LLMs into acoustic gesture recognition, the framework enables few-shot and zero-shot classification without domain-specific fine-tuning. Evaluated on an empirical dataset comprising 15 gestures and 10 participants, it achieves accuracy comparable to supervised baselines, without task-specific adaptation, while significantly improving cross-user and cross-scenario generalization.

📝 Abstract
Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy leakage concerns. Acoustic sensing provides an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, the channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods often rely on training models extensively on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite LLMs' strengths, achieving few-shot and zero-shot learning of CIR gestures is non-trivial due to their inconspicuous features. To tackle this challenge, we collect differential CIR rather than the original CIR data. Moreover, we construct a real-world dataset collected from 10 participants performing 15 gestures across three categories (digits, letters, and shapes), with 10 repetitions each. We then conduct extensive experiments on this dataset using an LLM-adopted classifier. Results show that our LLM-based framework achieves accuracy comparable to classical machine learning baselines, while requiring no domain-specific retraining.
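The differential-CIR idea from the abstract can be sketched as a frame-to-frame subtraction: static reflections (walls, furniture) contribute the same energy to consecutive CIR frames and cancel out, while motion-induced perturbations survive. The array shapes and function name below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def differential_cir(cir_frames: np.ndarray) -> np.ndarray:
    """cir_frames: (num_frames, num_taps) complex CIR estimates.

    Returns (num_frames - 1, num_taps) magnitude differences, which
    suppress static multipath and emphasize gesture-induced changes.
    """
    mags = np.abs(cir_frames)      # per-tap magnitude of each frame
    return np.diff(mags, axis=0)   # subtract consecutive frames

# A purely static scene yields a near-zero differential CIR.
rng = np.random.default_rng(0)
static_frame = rng.standard_normal(64) + 1j * rng.standard_normal(64)
static_scene = np.tile(static_frame, (20, 1))   # 20 identical frames
print(np.allclose(differential_cir(static_scene), 0.0))  # True
```

In practice the differential signal would then be segmented per gesture and fed to the classifier; the preprocessing shown here is only the subtraction step.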
Problem

Research questions and friction points this paper is trying to address.

Achieving natural VR interactions via acoustic gesture recognition
Overcoming limitations of vision-based gesture recognition systems
Enabling few-shot gesture learning using large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Acoustic sensing for gesture recognition
Differential CIR data collection method
LLM-based framework for few-shot learning
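The few-shot LLM classification listed above can be sketched as serializing differential-CIR features into a text prompt with a handful of labeled examples prepended. The prompt format, feature serialization, and gesture names below are assumptions for illustration; the paper does not specify its prompt design here.

```python
def build_fewshot_prompt(examples, query_features):
    """examples: list of (label, feature_vector) support pairs.
    query_features: feature vector of the unlabeled gesture.
    Returns a text prompt an LLM could complete with a label.
    """
    lines = ["Classify the gesture from its differential-CIR feature vector."]
    for label, feats in examples:
        feat_str = ", ".join(f"{v:.3f}" for v in feats)
        lines.append(f"Features: {feat_str} -> Gesture: {label}")
    query_str = ", ".join(f"{v:.3f}" for v in query_features)
    lines.append(f"Features: {query_str} -> Gesture:")
    return "\n".join(lines)

# Hypothetical usage with two support examples and one query.
prompt = build_fewshot_prompt(
    [("circle", [0.12, 0.85, 0.33]), ("swipe-left", [0.91, 0.07, 0.44])],
    [0.14, 0.82, 0.30],
)
print(prompt)
```

The key property this sketch illustrates is that no gradient-based fine-tuning is involved: adapting to a new user or gesture set only means swapping the support examples in the prompt.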
Xijie Zhang
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Fengliang He
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Hong-Ning Dai
Hong Kong Baptist University
Industrial Internet of Things · Blockchain Technologies · Extended Reality · Big Data Analytics