Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

📅 2026-05-29

🤖 AI Summary
This work addresses the gap in general-purpose representation learning for heterogeneous multivariate time series. The authors propose CHARM (Channel-Aware Representation Model), which feeds textual descriptions of sensor channels into a Transformer encoder that is equivariant to channel order, and trains it with a Joint Embedding Predictive Architecture (JEPA). A dedicated loss promotes informative, temporally stable embeddings; predicting in latent space encourages robustness to sensor noise; and description-aware gating exposes learned inter-channel relationships for interpretability. The text descriptions act as channel identifiers, which supports cross-dataset generalization. With only linear probes on the learned embeddings, CHARM achieves strong performance on anomaly detection, classification, and short- and long-term forecasting.
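To make "using only linear probes" concrete: the encoder stays frozen and only a linear map is fitted on top of its embeddings. The sketch below is illustrative and is not the paper's evaluation code. `frozen_encoder`, the synthetic data, and the closed-form ridge fit are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained encoder; its weights are never updated.
W_enc = rng.normal(size=(16, 32)) / 4.0

def frozen_encoder(x):
    """Map raw inputs (N, 16) to fixed embeddings (N, 32)."""
    return np.tanh(x @ W_enc)

def add_bias(z):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([z, np.ones((len(z), 1))])

def fit_linear_probe(z, y, l2=1e-3):
    """Closed-form ridge regression from embeddings z (N, D) to targets y (N, K)."""
    z1 = add_bias(z)
    a = z1.T @ z1 + l2 * np.eye(z1.shape[1])
    return np.linalg.solve(a, z1.T @ y)

def predict(z, w):
    return add_bias(z) @ w

# Synthetic task (assumption): targets are linear in the embeddings.
x = rng.normal(size=(200, 16))
z = frozen_encoder(x)
true_w = rng.normal(size=(32, 1))
y = z @ true_w + 0.5

w = fit_linear_probe(z, y)          # only the probe is trained
mse = float(np.mean((predict(z, w) - y) ** 2))
```

Because the probe has so little capacity, strong downstream results under this protocol are usually read as evidence about the quality of the embeddings rather than of the task-specific head.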
📝 Abstract
Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a Transformer encoder equivariant to channel order. CHARM is trained with a Joint Embedding Predictive Architecture (JEPA) and a novel loss promoting informative, temporally stable embeddings; latent-space prediction encourages robustness to sensor noise while description-aware gating provides interpretability through learned inter-channel relationships. Across anomaly detection, classification, and short- and long-term forecasting, the learned embeddings achieve strong performance using only a linear probe. Performance is driven primarily by the JEPA objective and conditioning architecture, with text descriptions serving as channel identifiers for cross-dataset generalization.
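The phrase "equivariant to channel order" can be made concrete with a small numerical check. The sketch below is not CHARM's implementation. It assumes a single self-attention layer across channels, random vectors standing in for text-description embeddings, and a hypothetical sigmoid gate built from their pairwise similarity. It only illustrates that when channels have no positional index and are identified by their descriptions, permuting the input channels permutes the output rows in the same way.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_channel_attention(x, desc, w_q, w_k, w_v):
    """x: (C, D) per-channel features; desc: (C, E) description embeddings.

    No positional encoding over channels is used, so a channel's identity
    comes only from its features and its description.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (C, C)
    # Hypothetical description-aware gate: sigmoid of description similarity.
    gate = 1.0 / (1.0 + np.exp(-(desc @ desc.T)))             # (C, C)
    weights = attn * gate
    weights = weights / weights.sum(axis=-1, keepdims=True)   # renormalize rows
    return weights @ v

rng = np.random.default_rng(0)
C, D, E = 5, 8, 4                      # channels, feature dim, description dim
x = rng.normal(size=(C, D))
desc = rng.normal(size=(C, E))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

perm = rng.permutation(C)              # reorder the channels
out = gated_channel_attention(x, desc, w_q, w_k, w_v)
out_perm = gated_channel_attention(x[perm], desc[perm], w_q, w_k, w_v)

# Equivariance: permuting inputs permutes outputs identically.
is_equivariant = np.allclose(out[perm], out_perm)
```

In this toy layer the gated weight matrix is also something one can inspect directly, which is the sense in which a description-conditioned gate could offer a view of inter-channel relationships.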
Problem

Research questions and friction points this paper is trying to address.

multivariate time series
representation learning
semantic embeddings
cross-dataset generalization
sensor data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal JEPA
Semantic Time-Series Embeddings
Channel-Aware Representation
Transformer Equivariance
Text-Conditioned Forecasting