Thou Shalt Not Prompt: Zero-Shot Human Activity Recognition in Smart Homes via Language Modeling of Sensor Data & Activities

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing zero-shot human activity recognition (HAR) methods rely heavily on large language models (LLMs) via prompting, raising critical concerns regarding privacy leakage, dependency on external services, and model version inconsistency. To address these issues, this paper proposes an LLM-free zero-shot HAR framework. The approach unifies sensor time-series data and activity semantics into natural language representations, leveraging a custom-designed semantic encoder to learn cross-modal embeddings and establishing a cross-domain alignment mechanism between activity texts and sensor features for end-to-end zero-shot classification. The paper presents the first systematic evaluation of language-based embedding transferability across six real-world HAR datasets under multi-source, cross-scenario settings. Experimental results demonstrate substantial improvements in cross-environment generalization performance. The proposed method offers a deployable, robust, and LLM-free zero-shot HAR paradigm that is particularly suitable for privacy-sensitive applications such as smart homes.

📝 Abstract
Developing zero-shot human activity recognition (HAR) methods is a critical direction in smart home research, considering its impact on making HAR systems work across smart homes with diverse sensing modalities, layouts, and activities of interest. The state-of-the-art solutions along this direction are based on generating natural language descriptions of the sensor data and feeding them via a carefully crafted prompt to an LLM to perform classification. Despite their performance guarantees, such "prompt-the-LLM" approaches carry several risks, including privacy invasion, reliance on an external service, and inconsistent predictions due to version changes, making a case for alternative zero-shot HAR methods that do not require prompting LLMs. In this paper, we propose one such solution that models sensor data and activities using natural language, leveraging its embeddings to perform zero-shot classification and thereby bypassing the need to prompt LLMs for activity predictions. The impact of our work lies in presenting a detailed case study on six datasets, highlighting how language modeling can bolster HAR systems in zero-shot recognition.
Problem

Research questions and friction points this paper is trying to address.

Develop zero-shot human activity recognition without LLM prompts
Address privacy and reliability issues in smart home HAR systems
Leverage language embeddings for zero-shot activity classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot HAR via language modeling
Embeddings replace LLM prompting
Privacy-preserving activity recognition
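
The embedding-based idea summarized above (describe a sensor window in text, embed it, and pick the activity whose text embedding is closest) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper's actual encoder, text templates, and alignment procedure are not given on this page, so `embed` is a toy bag-of-words stand-in for a real sentence encoder, and `ACTIVITIES`, `zero_shot_classify`, and the example sensor description are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list so that function words do not dominate similarity.
STOPWORDS = {"a", "the", "in", "is", "on", "then", "of", "using"}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned sentence encoder."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(sensor_text: str, activity_texts: dict):
    """Return the activity whose description is most similar to the sensor text."""
    sensor_vec = embed(sensor_text)
    scores = {
        label: cosine(sensor_vec, embed(description))
        for label, description in activity_texts.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical activity descriptions; no labeled sensor data is used.
ACTIVITIES = {
    "cooking": "a person is cooking a meal in the kitchen using the stove",
    "sleeping": "a person is sleeping in the bed in the bedroom",
    "bathing": "a person is bathing in the shower in the bathroom",
}

# Hypothetical textual rendering of one window of sensor events.
window = "motion sensor in the kitchen fired, then the stove sensor turned on"
label, scores = zero_shot_classify(window, ACTIVITIES)
print(label, scores)
```

No LLM is prompted at any point: classification reduces to a nearest-neighbor lookup in embedding space, which is why such a pipeline can run locally. In practice the toy `embed` would be replaced by a pretrained text encoder, and the quality of the result would depend on how sensor events are rendered as text.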