AI Summary
Existing approaches for recognizing Activities of Daily Living (ADLs) from event-triggered environmental sensor streams suffer from three key limitations: (1) sequence models are noise-sensitive and lack spatial modeling capability; (2) image-based models compress temporal dynamics and distort sensor topology; and (3) naive fusion strategies fail to exploit cross-modal complementarity. To address these, we propose CARE, an end-to-end framework featuring a novel sequence–image contrastive alignment mechanism that jointly optimizes representation learning and classification. CARE employs a time-aware sequential encoder and a frequency-sensitive spatial image encoder, coupled with a unified contrastive-classification loss to simultaneously achieve cross-modal alignment and discriminative representation learning. Evaluated on three CASAS benchmark datasets, CARE achieves state-of-the-art performance (Milan: 89.8%, Cairo: 88.9%, Kyoto7: 73.3%) and demonstrates strong robustness to sensor failures and layout variations.
Abstract
The recognition of Activities of Daily Living (ADLs) from event-triggered ambient sensors is an essential task in Ambient Assisted Living, yet existing methods remain constrained by representation-level limitations. Sequence-based approaches preserve the temporal order of sensor activations but are sensitive to noise and lack spatial awareness, while image-based approaches capture global patterns and implicit spatial correlations but compress fine-grained temporal dynamics and distort sensor layouts. Naive fusion (e.g., feature concatenation) fails to enforce alignment between sequence- and image-based representation views, underutilizing their complementary strengths. We propose Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams (CARE), an end-to-end framework that jointly optimizes representation learning via Sequence-Image Contrastive Alignment (SICA) and classification via cross-entropy, ensuring both cross-representation alignment and task-specific discriminability. CARE integrates (i) time-aware, noise-resilient sequence encoding with (ii) spatially informed and frequency-sensitive image representations, and employs (iii) a joint contrastive-classification objective for end-to-end learning of aligned and discriminative embeddings. Evaluated on three CASAS datasets, CARE achieves state-of-the-art performance (89.8% on Milan, 88.9% on Cairo, and 73.3% on Kyoto7) and demonstrates robustness to sensor malfunctions and layout variability, highlighting its potential for reliable ADL recognition in smart homes.
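The joint contrastive-classification objective described in the abstract can be illustrated with a minimal NumPy sketch: a symmetric InfoNCE term aligns matched sequence/image embedding pairs within a batch, and a cross-entropy term keeps the representations discriminative. The function names (`sica_loss`, `joint_loss`), the temperature `tau`, and the weighting factor `lam` are illustrative assumptions, not specifics taken from the paper.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def sica_loss(seq_emb, img_emb, tau=0.1):
    """Symmetric InfoNCE over matched sequence/image pairs in a batch.

    Row i of seq_emb and row i of img_emb are two views of the same
    activity window; all other rows serve as in-batch negatives.
    """
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sim = s @ v.T / tau                      # (B, B) scaled cosine similarities
    idx = np.arange(sim.shape[0])
    loss_s2i = -log_softmax(sim, axis=1)[idx, idx].mean()  # sequence -> image
    loss_i2s = -log_softmax(sim, axis=0)[idx, idx].mean()  # image -> sequence
    return 0.5 * (loss_s2i + loss_i2s)

def cross_entropy(logits, labels):
    # Standard classification loss on the fused logits.
    return -log_softmax(logits, axis=1)[np.arange(len(labels)), labels].mean()

def joint_loss(seq_emb, img_emb, logits, labels, lam=0.5):
    # End-to-end objective: classification + weighted contrastive alignment.
    return cross_entropy(logits, labels) + lam * sica_loss(seq_emb, img_emb)

# Toy batch: image embeddings are noisy copies of the sequence embeddings,
# mimicking two roughly aligned views of the same activity windows.
rng = np.random.default_rng(0)
B, D, C = 8, 32, 5
seq_emb = rng.normal(size=(B, D))
img_emb = seq_emb + 0.1 * rng.normal(size=(B, D))
logits = rng.normal(size=(B, C))
labels = rng.integers(0, C, size=B)
total = joint_loss(seq_emb, img_emb, logits, labels)
```

Under this setup, misaligning the pairs (e.g., reversing the batch order of one view) raises the contrastive term, which is exactly the pressure that pulls the two representation views together during training.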