TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

career value

180K/year

🤖 AI Summary

This work addresses the degradation of representations in multimodal time series caused by temporal misalignment and partial modality missingness in real-world scenarios. To tackle this challenge, the authors propose a conditional estimation–based multimodal fusion architecture that, for the first time, integrates a conditional generation mechanism into foundational models for multimodal time series. The approach systematically reconstructs missing target modalities using available auxiliary modalities while jointly optimizing temporal alignment and cross-modal dependency modeling. Extensive experiments on benchmark datasets—including MIMIC-IV, CMU-MOSI, and CMU-MOSEI—demonstrate that the proposed method significantly outperforms existing approaches, exhibiting superior robustness and enhanced cross-modal representation capability, particularly under severe modality missingness conditions.

📝 Abstract

Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modalities are observed at heterogeneous time scales or are partially absent. Existing approaches typically rely on naive imputation or masking strategies, which fail to account for cross-modal dependencies and often lead to misaligned or degraded representations. We propose TRACE, a conditional estimation paradigm for multimodal time series foundation model pipelines under missingness and irregular sampling, allowing incomplete target modalities to be systematically inferred from available auxiliary modalities. We evaluate TRACE on diverse multimodal benchmarks spanning healthcare and affective computing, including the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI benchmarks for multimodal sentiment analysis. Across a range of downstream prediction tasks and missing-modality settings, TRACE consistently outperforms prior multimodal fusion approaches, demonstrating improved robustness to severe modality missingness and more reliable cross-modal representations.

Problem

Research questions and friction points this paper is trying to address.

temporal misalignment

modality missingness

multimodal time series

cross-modal dependencies

irregular sampling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Conditional Estimation

Multimodal Time Series

Modality Missingness