A Semantic Parsing Framework for End-to-End Time Normalization

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Time normalization faces challenges due to the limited expressivity of ISO-TimeML, which struggles to model compositional, event-relative, and multi-span temporal expressions. To address this, we propose a novel code-generation paradigm grounded in the SCATE semantic framework: time normalization is formulated as an executable SCATE code generation task. We construct the first Python-executable SCATE semantic library and design an LLM-driven data augmentation pipeline that automatically generates large-scale, code-verified annotations. Our approach integrates semantic parsing, symbolic temporal representation, and LLM-based code generation. Experiments demonstrate that fine-tuning small local models solely on the augmented data yields significant improvements over the large foundation models they are derived from, achieving superior accuracy, interpretability, and practical utility. This work marks the first successful realization of high-precision, verifiable, and executable temporal semantic parsing.
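To make the idea concrete, here is a minimal sketch of what compositional, executable temporal operators can look like. The operator names (`MonthOfYear`, `next_expr`) are illustrative assumptions, not the paper's actual SCATE library API; the point is that an expression like "next May" becomes code that evaluates to a concrete interval relative to an anchor date.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Interval:
    start: date  # inclusive
    end: date    # exclusive

@dataclass(frozen=True)
class MonthOfYear:
    """A repeating interval: every occurrence of a given calendar month."""
    month: int

    def next_after(self, anchor: date) -> Interval:
        # First occurrence of this month strictly after the anchor date.
        y = anchor.year if anchor.month < self.month else anchor.year + 1
        start = date(y, self.month, 1)
        end = date(y + 1, 1, 1) if self.month == 12 else date(y, self.month + 1, 1)
        return Interval(start, end)

def next_expr(repeating: MonthOfYear, anchor: date) -> Interval:
    """Compositional operator: next_expr(MonthOfYear(5), anchor) ~ 'next May'."""
    return repeating.next_after(anchor)

# "next May", anchored at 2025-07-08, resolves to May 2026.
iv = next_expr(MonthOfYear(5), date(2025, 7, 8))
```

Because the output is an executable program rather than a flat ISO string, composed expressions (e.g. nesting a month selector inside a "next" operator) resolve deterministically, which is what makes the generated annotations machine-checkable.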

📝 Abstract
Time normalization is the task of converting natural language temporal expressions into machine-readable representations. It underpins many downstream applications in information retrieval, question answering, and clinical decision-making. Traditional systems based on the ISO-TimeML schema limit expressivity and struggle with complex constructs such as compositional, event-relative, and multi-span time expressions. In this work, we introduce a novel formulation of time normalization as a code generation task grounded in the SCATE framework, which defines temporal semantics through symbolic and compositional operators. We implement a fully executable SCATE Python library and demonstrate that large language models (LLMs) can generate executable SCATE code. Leveraging this capability, we develop an automatic data augmentation pipeline using LLMs to synthesize large-scale annotated data with code-level validation. Our experiments show that small, locally deployable models trained on this augmented data can achieve strong performance, outperforming even their LLM parents and enabling practical, accurate, and interpretable time normalization.
Problem

Research questions and friction points this paper is trying to address.

Convert natural language time expressions to machine-readable formats
Overcome limitations of traditional ISO-TimeML schema systems
Enable accurate interpretable time normalization using LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates time normalization as code generation
Uses SCATE framework for symbolic semantics
Leverages LLMs for automatic data augmentation
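The code-level validation step in the augmentation pipeline can be sketched as a simple execute-and-filter loop: a candidate (text, code) pair generated by the LLM is kept only if the code runs and yields a well-typed temporal value. This is a hypothetical sketch under assumed names (`Interval`, `validate_sample`), not the paper's implementation.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Interval:
    start: date
    end: date

def validate_sample(code: str) -> bool:
    """Execute candidate code in a restricted namespace; accept the
    (text, code) pair only if it evaluates to an Interval."""
    env = {"Interval": Interval, "date": date}
    try:
        result = eval(code, {"__builtins__": {}}, env)
    except Exception:
        return False
    return isinstance(result, Interval)

# A well-formed candidate passes; a malformed one (invalid month
# raises at execution time) is filtered out of the training data.
good = "Interval(date(2025, 5, 1), date(2025, 6, 1))"
bad = "Interval(date(2025, 13, 1), date(2025, 6, 1))"
```

Filtering on executability is what lets the pipeline scale annotation without manual review: invalid generations fail at run time rather than silently entering the training set.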