🤖 AI Summary
Traditional time-series models struggle to incorporate textual context, while large language models (LLMs) lack native capability to model numerical time-series data. Method: We propose TsLLM, a framework that adds time-series perception to an LLM via a patch-based encoder-decoder architecture. TsLLM is trained on a corpus of over 2 million interleaved text and time-series examples, jointly optimizing contextual learning and temporal modeling. Contribution/Results: TsLLM enables diverse multimodal reasoning tasks, including context-aware forecasting, time-series question answering, pattern interpretation, classification with natural language outputs, and automated report generation. Although it is not designed to surpass specialized models on traditional benchmarks, TsLLM performs strongly on tasks that require combining time-series analysis with natural language. It retains its language understanding while gaining temporal reasoning capabilities, bridging purely textual and purely numerical modeling paradigms.
📝 Abstract
Time series data is fundamental to decision-making in many crucial domains including healthcare, finance, and environmental science. However, analyzing this data often requires incorporating unstructured contextual information, answering domain-specific questions, and generating natural language explanations -- capabilities that traditional time series models lack due to their inability to process text. While Large Language Models (LLMs) excel at contextual reasoning and knowledge integration, they struggle with numerical time series due to inefficient text-based representations and limited exposure to temporal data during pretraining. We address this gap by augmenting an LLM with specialized time series perception through a patch-based encoder-decoder architecture. We train this Time Series-augmented LLM (TsLLM) on a large corpus of over 2 million interleaved time series and text examples spanning diverse analysis tasks: forecasting with contextual information, time series question-answering, pattern explanation, classification with natural language outputs, and report generation. This training enables TsLLM to leverage both its language understanding and newly acquired temporal reasoning capabilities. While not designed to surpass specialized models on traditional benchmarks, TsLLM demonstrates strong performance on tasks requiring the integration of time series analysis with natural language -- capabilities that existing approaches cannot provide. Our work establishes a new paradigm for time series analysis that bridges numerical computation and natural language understanding, democratizing access to sophisticated temporal reasoning through natural language interaction.
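As a rough illustration of what "patch-based" means here, the sketch below splits a numeric series into fixed-length windows, each of which an encoder could map to a single embedding instead of spelling every value out as text tokens. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `make_patches`, the parameters `patch_len` and `stride`, and the choice to drop an incomplete trailing patch are all hypothetical and do not come from the abstract.

```python
from typing import List


def make_patches(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split a 1-D series into fixed-length windows (patches).

    patch_len and stride are illustrative parameters, not values from the paper.
    Trailing values that do not fill a whole patch are dropped (an assumption).
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    # Index of the last position where a full patch still fits.
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    series = [float(v) for v in range(10)]
    # Non-overlapping patches of length 4; the last two values are dropped.
    for patch in make_patches(series, patch_len=4, stride=4):
        print(patch)
```

With a patch length of 4, a 10-point series yields two patches, so the model would see two inputs rather than one token sequence per digit. How TsLLM actually embeds and decodes patches is not described in the abstract.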