AI Summary
To address the challenges of poor recognition of elderly users' disfluent speech and the inability of conventional ASR-LLM cascaded systems to detect non-speech events (e.g., falls, cries for help), this paper proposes DESAMO, the first edge-deployed, embedded Audio Large Language Model (Audio LLM) system tailored for elderly-friendly smart homes. DESAMO eliminates reliance on ASR by performing multi-granularity audio understanding directly on-device from raw waveforms, jointly modeling natural speech and critical non-speech events while ensuring real-time responsiveness, robustness, and on-device privacy preservation. Experiments demonstrate significant improvements in both elderly speech recognition accuracy and emergency event detection reliability, with average inference latency under 200 ms and fully local, end-to-end data processing. Its core contributions are: (1) the first efficient deployment of an Audio LLM on resource-constrained embedded hardware, and (2) a novel end-to-end audio semantic understanding paradigm designed specifically for elderly users.
Abstract
We present DESAMO, an on-device, elder-friendly smart home system powered by an Audio LLM that supports natural and private interactions. Conventional voice assistants rely on ASR-based pipelines or ASR-LLM cascades, which often struggle with the unclear speech common among elderly users and cannot handle non-speech audio at all. DESAMO instead leverages an Audio LLM to process raw audio input directly, enabling robust understanding of both user intent and critical events, such as falls or calls for help.
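The architectural contrast above can be sketched schematically. The toy below is purely illustrative and assumes nothing about DESAMO's actual models or APIs: every function is a hypothetical stand-in, and the "waveform" is a plain dict standing in for real audio features. It shows why a cascade that first transcribes to text is structurally blind to non-speech events, while an end-to-end audio model can surface them.

```python
# Illustrative toy only: all names here are hypothetical stand-ins,
# not the paper's implementation. A dict stands in for raw audio.

def asr_transcribe(waveform):
    # Stand-in ASR front-end: it can only emit a text transcript,
    # so purely acoustic events (a fall, a crash) produce nothing.
    return waveform.get("speech_text", "")

def llm_interpret(text):
    # Stand-in text-only LLM: it reasons over the transcript alone.
    return {"intent": text or "unknown", "event": None}

def cascade_pipeline(waveform):
    # ASR-LLM cascade: audio -> text -> LLM. Non-speech audio is
    # discarded at the ASR stage and never reaches the LLM.
    return llm_interpret(asr_transcribe(waveform))

def audio_llm(waveform):
    # Stand-in end-to-end Audio LLM: consumes the audio directly,
    # so speech intent and non-speech events are modeled jointly.
    return {
        "intent": waveform.get("speech_text") or "unknown",
        "event": waveform.get("acoustic_event"),
    }

fall = {"acoustic_event": "fall"}        # non-speech input
print(cascade_pipeline(fall)["event"])   # None: the cascade is blind to it
print(audio_llm(fall)["event"])          # fall: detected end-to-end
```

The point of the sketch is the information bottleneck: in the cascade, anything the ASR cannot transcribe is lost before the LLM ever sees it, whereas the direct path keeps the full audio signal available for interpretation.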