RASTeR: Robust, Agentic, and Structured Temporal Reasoning

📅 2024-06-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit limited robustness in temporal question answering (TQA), primarily due to noise, obsolescence, and temporal inconsistency in retrieved evidence—hindering deployment in high-stakes applications such as clinical event sequencing and policy tracking. To address this, we propose the first decoupled two-stage prompting framework: Stage I assesses contextual relevance and temporal consistency of evidence; Stage II constructs and dynamically refines a temporal knowledge graph (TKG) to enable selective filtering of contradictory information and robust temporal reasoning. The method integrates LLM-driven prompt engineering, TKG-based representation learning, and context-aware robustness evaluation. Experiments across multiple benchmarks and LLMs demonstrate substantial improvements in temporal reasoning robustness. Notably, under a challenging “needle-in-a-haystack” setting with 40 irrelevant distractors, our framework achieves 75% accuracy—outperforming the best prior approach by over 12 percentage points.

📝 Abstract
Temporal question answering (TQA) remains a challenge for large language models (LLMs), particularly when retrieved content may be irrelevant, outdated, or temporally inconsistent. This is especially critical in applications like clinical event ordering and policy tracking, which require reliable temporal reasoning even under noisy or outdated information. To address this challenge, we introduce RASTeR (Robust, Agentic, and Structured Temporal Reasoning), a prompting framework that separates context evaluation from answer generation. RASTeR first assesses the relevance and temporal coherence of the retrieved context, then constructs a temporal knowledge graph (TKG) to better facilitate reasoning. When inconsistencies are detected, RASTeR selectively corrects or discards context before generating an answer. Across multiple datasets and LLMs, RASTeR consistently improves robustness (some TQA work defines robustness as handling diverse temporal phenomena; here, we define it as the ability to answer correctly despite suboptimal context). We further validate our approach through a "needle-in-the-haystack" study, in which relevant context is buried among distractors. With forty distractors, RASTeR achieves 75% accuracy, over 12 percentage points ahead of the runner-up.
Problem

Research questions and friction points this paper is trying to address.

Addresses temporal question answering challenges with noisy or inconsistent contexts
Improves robustness in clinical event ordering and policy tracking applications
Enhances reasoning accuracy when relevant information is buried among distractors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Separates context evaluation from answer generation
Constructs temporal knowledge graph for reasoning
Selectively corrects or discards inconsistent context
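The staged pipeline in these bullets can be sketched in miniature. The following is an illustrative sketch only, not the paper's implementation: the stub relevance and consistency checks stand in for the LLM-driven judgments RASTeR makes, and the passage fields, function names, and triple format are all assumptions.

```python
# Minimal sketch of a RASTeR-style two-stage pipeline (illustrative assumptions:
# rule-based stubs replace the LLM calls; passages are simple dicts).

def assess_context(passages, question_year):
    """Stage I: keep only passages judged relevant and temporally consistent."""
    kept = []
    for p in passages:
        relevant = p["topic"] == "policy"        # stub for an LLM relevance judgment
        consistent = p["year"] <= question_year  # stub for a temporal-consistency check
        if relevant and consistent:
            kept.append(p)
    return kept

def build_tkg(passages):
    """Stage II: build a temporal knowledge graph as (subject, relation, object, year) triples."""
    return {(p["subject"], p["relation"], p["object"], p["year"]) for p in passages}

def answer(subject, question_year, tkg):
    """Answer with the most recent fact about the subject no later than the question's year."""
    facts = [t for t in tkg if t[0] == subject and t[3] <= question_year]
    if not facts:
        return None
    return max(facts, key=lambda t: t[3])[2]  # object of the latest surviving triple

passages = [
    {"topic": "policy", "subject": "tax_rate", "relation": "set_to", "object": "5%", "year": 2019},
    {"topic": "policy", "subject": "tax_rate", "relation": "set_to", "object": "7%", "year": 2022},
    {"topic": "sports", "subject": "tax_rate", "relation": "set_to", "object": "9%", "year": 2021},  # irrelevant distractor
    {"topic": "policy", "subject": "tax_rate", "relation": "set_to", "object": "8%", "year": 2030},  # temporally inconsistent
]

kept = assess_context(passages, question_year=2023)
tkg = build_tkg(kept)
print(answer("tax_rate", 2023, tkg))  # the distractor and future fact are filtered; prints "7%"
```

The point of the separation is that filtering happens before any answer is generated: the distractor and the temporally inconsistent passage never reach the reasoning step, which is what the paper's robustness claim rests on.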
Dan Schumacher
University of Texas at San Antonio
Fatemeh Haji
University of Texas at San Antonio
Tara Grey
University of Texas at San Antonio
Niharika Bandlamudi
University of Texas at San Antonio
Nupoor Karnik
University of Texas at San Antonio
Gagana Uday Kumar
University of Texas at San Antonio
J. Chiang
Peraton Labs
Paul Rad
University of Texas at San Antonio
Nishant Vishwamitra
Assistant Professor, Information Systems and Cyber Security
AI, Online Abuse, Crisis Management and Crowdsourcing
Anthony Rios
Associate Professor in Information Systems and Cyber Security
Natural Language Processing, Biomedical Informatics, Computational Social Science, Social Computing