🤖 AI Summary
Existing approaches to smart home sensor analysis are hindered by scarce annotations, reliance solely on inertial data, dependence on natural language prompts, or the need for expensive hardware, making it difficult to achieve both privacy preservation and low-cost deployment. This work proposes DomusFM—the first self-supervised foundation model tailored for binary sensor data—that operates without external language descriptions or auxiliary services. By employing dual contrastive learning, DomusFM jointly models event semantics and temporal dependencies. Evaluated via leave-one-dataset-out cross-validation across seven public datasets, the model achieves state-of-the-art performance with only 5% of labeled data for fine-tuning, significantly enhancing few-shot generalization in activity and event recognition tasks and enabling efficient cross-environment and cross-task transfer.
📝 Abstract
Smart-home sensor data holds significant potential for several applications, including healthcare monitoring and assistive technologies. Existing approaches, however, face critical limitations. Supervised models require impractical amounts of labeled data. Foundation models for activity recognition focus only on inertial sensors, failing to address the unique characteristics of smart-home binary sensor events: their sparse, discrete nature combined with rich semantic associations. LLM-based approaches, while explored in this domain, require natural language descriptions or prompting and rely on either external services or expensive hardware, making them infeasible in real-life scenarios due to privacy and cost concerns. We introduce DomusFM, the first foundation model specifically designed and pretrained for smart-home sensor data. DomusFM employs a self-supervised dual contrastive learning paradigm to capture both token-level semantic attributes and sequence-level temporal dependencies. By integrating semantic embeddings from a lightweight language model with specialized encoders for temporal patterns and binary states, DomusFM learns generalizable representations that transfer across environments and across activity- and event-analysis tasks. Through leave-one-dataset-out evaluation across seven public smart-home datasets, we demonstrate that DomusFM outperforms state-of-the-art baselines on multiple downstream tasks, achieving superior performance even with only 5% of labeled training data available for fine-tuning. Our approach addresses data scarcity while maintaining practical deployability for real-world smart-home systems.
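To make the dual contrastive objective concrete, the toy NumPy sketch below illustrates the general idea under stated assumptions: an InfoNCE-style loss is applied twice, once at the token level (pairing event embeddings with a perturbed stand-in for their semantic embeddings) and once at the sequence level (pairing two augmented views of a pooled event sequence). All array names, dimensions, and the use of random perturbations as "views" are hypothetical illustrations, not DomusFM's actual encoders or augmentations.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor's positive is the same-index row of
    `positives`; all other rows act as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on the diagonal

rng = np.random.default_rng(0)

# Token level (hypothetical): event embeddings vs. a perturbed semantic view.
tok_anchor = rng.normal(size=(8, 16))
tok_pos = tok_anchor + 0.05 * rng.normal(size=(8, 16))

# Sequence level (hypothetical): two augmented views of pooled sequences.
seq_anchor = rng.normal(size=(8, 16))
seq_pos = seq_anchor + 0.05 * rng.normal(size=(8, 16))

# Dual objective: the two losses are simply summed in this sketch.
loss = info_nce(tok_anchor, tok_pos) + info_nce(seq_anchor, seq_pos)
print(float(loss))
```

In a real pretraining setup the positives would come from the model's own encoders (semantic and temporal), but the structure of the loss is the same: matched pairs are pulled together while in-batch mismatches are pushed apart at both granularities.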