AI Summary
In multi-speaker scenarios, users frequently miss critical conversational content.
Method: This paper proposes an intelligent auditory memory system for personal spatial computing. It introduces the first integration of beamforming with retrieval-augmented generation (RAG): spatial audio is captured with a microphone array and separated by speaker using directional beamforming; Whisper-based transcription and sentence encoding jointly construct a temporally aligned embedding database; and GPT-4o-mini generates spatiotemporally tagged contrastive summaries, enabling natural-language querying and interactive 3D audio playback.
Contribution/Results: We present the first end-to-end auditory memory framework supporting semantic retrieval, context-aware summarization, and spatial audio reconstruction. Evaluated in realistic multi-speaker environments, it achieves high-precision topic retrieval and interpretable traceability, significantly enhancing users' ability to semantically recall missed dialogue.
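The directional separation step described above can be illustrated with a minimal delay-and-sum beamformer. This is a generic sketch of the technique, not the paper's implementation: the array geometry, sample rate, and function name below are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Steer a linear mic array toward a plane wave arriving from
    `direction` (radians from the array axis) by time-aligning the
    channels and averaging them.

    signals:       (n_mics, n_samples) time-domain audio
    mic_positions: (n_mics,) mic coordinates along the array axis, metres
    """
    n_mics, n_samples = signals.shape
    # Arrival-time offset of each mic relative to the array origin
    delays = mic_positions * np.cos(direction) / c
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        # Fractional time-advance in the frequency domain undoes the
        # propagation delay, so the channels sum coherently
        spec = np.fft.rfft(signals[m]) * np.exp(2j * np.pi * freqs * delays[m])
        out += np.fft.irfft(spec, n=n_samples)
    return out / n_mics
```

Steering toward a talker's direction reinforces that talker's signal while off-axis sources add incoherently, which is what yields the per-speaker streams fed to Whisper in the pipeline above.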
Abstract
We present Beamforming-LLM, a system that enables users to semantically recall conversations they may have missed in multi-speaker environments. The system combines spatial audio capture using a microphone array with retrieval-augmented generation (RAG) to support natural language queries such as "What did I miss when I was following the conversation on dogs?" Directional audio streams are separated using beamforming, transcribed with Whisper, and embedded into a vector database using sentence encoders. Upon receiving a user query, semantically relevant segments are retrieved, temporally aligned with non-attended segments, and summarized using a lightweight large language model (GPT-4o-mini). The result is a user-friendly interface that provides contrastive summaries, spatial context, and timestamped audio playback. This work lays the foundation for intelligent auditory memory systems and has broad applications in assistive technology, meeting summarization, and context-aware personal spatial computing.
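The retrieval step can be sketched with a toy in-memory store. The segments, timestamps, and the bag-of-words `embed` function below are illustrative stand-ins for the system's sentence-encoder embeddings and vector database, not the paper's implementation:

```python
import numpy as np
from collections import Counter

# Toy timestamped, speaker-attributed transcript store; a unit-norm
# bag-of-words vector stands in for sentence-encoder embeddings.
SEGMENTS = [
    {"t": "00:01:10", "speaker": "A", "text": "my dog loves long walks in the park"},
    {"t": "00:02:45", "speaker": "B", "text": "the quarterly budget review is next week"},
    {"t": "00:04:02", "speaker": "C", "text": "training a puppy takes patience and treats"},
]

def embed(text, vocab):
    """Unit-norm term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def query(question, segments, top_k=2):
    """Rank segments by cosine similarity to the question."""
    vocab = sorted({w for s in segments for w in s["text"].lower().split()}
                   | set(question.lower().split()))
    qv = embed(question, vocab)
    return sorted(segments,
                  key=lambda s: float(qv @ embed(s["text"], vocab)),
                  reverse=True)[:top_k]
```

In a real deployment, `embed` would be a sentence encoder and `SEGMENTS` a vector database; the retrieved segments, with their timestamps and speaker directions, would then be passed to the LLM to produce the contrastive summaries and spatial playback described above.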