TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing large language models (LLMs) struggle to identify the quality dimensions that matter for a given time series and to make evidence-based quantitative comparisons. To address this, the work proposes TSQAgent, an agentic framework with three collaborating roles. The Perceiver selects the relevant dimensions, the Inspector analyzes them quantitatively with external tools, and the Adjudicator combines the results into a holistic judgment. The study also introduces TSQBench, a benchmark dedicated to evaluating LLMs on time series quality assessment. Experiments on TSQBench and eleven real-world datasets show that TSQAgent substantially improves LLMs' ability to understand and compare time series quality. This in turn yields more effective data selection and better downstream performance.
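
The summary's last point is that better pairwise quality judgments lead to better data selection. A small sketch can make that link concrete. The paper's actual selection procedure is not described here, so what follows is an assumption. A pairwise judge is run over every pair of candidate series, wins are counted Copeland-style (ties earn half a point), and the top-k series are kept. A toy rule, "fewer missing values wins," stands in for TSQAgent. `missing_rate_judge` and `select_top_k` are hypothetical names used only for illustration.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

Series = Sequence[float]
# A pairwise judge returns "A" (first series better), "B" (second better), or "tie".
Comparator = Callable[[Series, Series], str]


def missing_rate_judge(a: Series, b: Series) -> str:
    """Hypothetical toy judge standing in for TSQAgent: fewer NaNs wins."""
    ma = sum(math.isnan(v) for v in a)
    mb = sum(math.isnan(v) for v in b)
    return "A" if ma < mb else "B" if mb < ma else "tie"


def select_top_k(pool: List[Series], judge: Comparator, k: int) -> List[int]:
    """Judge every pair once, score wins (+1) and ties (+0.5), return the k best indices."""
    scores = [0.0] * len(pool)
    for i, j in combinations(range(len(pool)), 2):
        verdict = judge(pool[i], pool[j])
        if verdict == "A":
            scores[i] += 1
        elif verdict == "B":
            scores[j] += 1
        else:
            scores[i] += 0.5
            scores[j] += 0.5
    # sorted() is stable, so equally scored series keep their original order.
    ranked = sorted(range(len(pool)), key=lambda idx: -scores[idx])
    return ranked[:k]


nan = float("nan")
pool = [[1.0, nan, nan], [1.0, 2.0, 3.0], [1.0, nan, 3.0]]
print(select_top_k(pool, missing_rate_judge, k=2))  # [1, 2]
```

A full round robin needs n(n-1)/2 judge calls, which becomes expensive when each call is an LLM agent run. An LLM judge may also be sensitive to the order in which the pair is presented, so querying both orders is a reasonable safeguard.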

📝 Abstract

Assessing the quality of time series (TS) data is fundamental yet inherently challenging due to the multifaceted nature of quality dimensions. Recently, large language models (LLMs) have emerged as a promising paradigm for TS quality assessment via pairwise comparison and per-dimension evaluation. However, existing approaches rely on manually predefined quality dimensions and purely text-based reasoning. This leaves open whether LLMs can identify truly relevant quality dimensions or perform grounded, quantitative quality comparisons. To investigate this, we construct TSQBench, a dedicated benchmark for evaluating LLMs on two progressive capabilities: (i) understanding and identifying relevant quality dimensions, and (ii) performing quality comparison under specific dimensions. Our analysis reveals that current LLMs consistently struggle with both dimension identification and evidence-grounded quality comparison. To address these limitations, we propose TSQAgent, a novel agentic reasoning framework for TS quality rating consisting of three collaborative roles: a Perceiver for focused dimension selection, an Inspector for dimension-wise quantitative analysis, and an Adjudicator for aggregating and refining the final judgment. In particular, we introduce an agentic reasoning strategy that instills the ability to identify and prioritize the most relevant quality dimensions. We further propose an agent workflow equipped with external analytical tools to enable precise quantitative comparisons over the selected dimensions. Experiments on both the proposed benchmark and eleven real-world datasets demonstrate that our framework substantially improves LLMs' capabilities in quality understanding and quantitative comparison. These improvements also translate into better quality-aware data selection, leading to enhanced downstream performance and data efficiency.
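
As a rough illustration of how the three roles could fit together, the Python sketch below wires a Perceiver, an Inspector, and an Adjudicator into a single pairwise rating call. It is not the authors' implementation. In TSQAgent the Perceiver and Adjudicator are LLM reasoning steps, whereas here they are replaced by fixed rules. The two "analytical tools" (`completeness`, `smoothness`), the unweighted majority vote, and every function name are hypothetical choices made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Series = Sequence[float]

# --- Hypothetical analytical tools (assumptions, not from the paper) ---
# Each tool maps a series to a score where higher means better quality.

def completeness(x: Series) -> float:
    """Fraction of observations that are not NaN."""
    if len(x) == 0:
        return 0.0
    return sum(1 for v in x if not math.isnan(v)) / len(x)


def smoothness(x: Series) -> float:
    """Negated mean absolute first difference of the observed values."""
    vals = [v for v in x if not math.isnan(v)]
    if len(vals) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(vals, vals[1:])]
    return -sum(diffs) / len(diffs)


TOOLS: Dict[str, Callable[[Series], float]] = {
    "completeness": completeness,
    "smoothness": smoothness,
}


def perceiver(a: Series, b: Series) -> List[str]:
    """Select relevant dimensions (a rule-based stand-in for LLM reasoning)."""
    dims = []
    if any(math.isnan(v) for v in list(a) + list(b)):
        dims.append("completeness")  # only relevant when something is missing
    dims.append("smoothness")
    return dims


def inspector(a: Series, b: Series, dims: List[str], tol: float = 1e-9) -> Dict[str, int]:
    """Per-dimension verdict from tool outputs: +1 favors A, -1 favors B, 0 is a tie."""
    verdicts = {}
    for d in dims:
        sa, sb = TOOLS[d](a), TOOLS[d](b)
        verdicts[d] = 0 if abs(sa - sb) <= tol else (1 if sa > sb else -1)
    return verdicts


def adjudicator(verdicts: Dict[str, int]) -> str:
    """Aggregate per-dimension verdicts by an unweighted majority vote."""
    total = sum(verdicts.values())
    return "A" if total > 0 else "B" if total < 0 else "tie"


def rate_pair(a: Series, b: Series) -> Tuple[str, Dict[str, int]]:
    """Perceiver -> Inspector -> Adjudicator for one pair of series."""
    dims = perceiver(a, b)
    verdicts = inspector(a, b, dims)
    return adjudicator(verdicts), verdicts


clean = [1.0, 1.1, 1.2, 1.3]
noisy = [1.0, float("nan"), 3.0, 0.5]
print(rate_pair(clean, noisy))  # ('A', {'completeness': 1, 'smoothness': 1})
```

In this sketch, the numeric work stays inside the tools, and the roles pass only per-dimension verdicts to each other. This loosely mirrors the abstract's emphasis on grounded, quantitative comparison instead of purely text-based reasoning.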

Problem

Research questions and friction points this paper is trying to address.

time series data quality

quality dimension identification

quantitative quality comparison

large language models

evidence-grounded reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic reasoning

time series quality assessment

large language models