🤖 AI Summary
Large language models (LLMs) show limited capability in complex temporal reasoning, particularly when questions involve multiple entities, compound temporal operators, and evolving event sequences. Existing temporal knowledge graph (TKG)-based approaches struggle to jointly ensure temporal faithfulness, multi-entity synchronization, operator-adaptive retrieval, and reuse of prior reasoning experience. Method: The paper proposes MemoTime, a memory-augmented TKG framework that decomposes a question into a hierarchical Tree of Time and performs operator-aware reasoning, combining dynamic evidence retrieval, recursive structured grounding, and a self-evolving experience memory that stores verified reasoning traces and toolkit decisions for reuse across question types. Contribution/Results: MemoTime co-constrains multiple entities under unified temporal bounds and enforces monotonic timestamps throughout multi-hop reasoning. It achieves overall state-of-the-art results on multiple temporal question answering benchmarks, outperforming the strong baseline by up to 24.0%, and enables smaller models (e.g., Qwen3-4B) to reach reasoning performance comparable to that of GPT-4-Turbo.
📝 Abstract
Large Language Models (LLMs) have achieved impressive reasoning abilities, but struggle with temporal understanding, especially when questions involve multiple entities, compound operators, and evolving event sequences. Temporal Knowledge Graphs (TKGs), which capture vast amounts of temporal facts in a structured format, offer a reliable source for temporal reasoning. However, existing TKG-based LLM reasoning methods still struggle with four major challenges: maintaining temporal faithfulness in multi-hop reasoning, achieving multi-entity temporal synchronization, adapting retrieval to diverse temporal operators, and reusing prior reasoning experience for stability and efficiency. To address these issues, we propose MemoTime, a memory-augmented temporal knowledge graph framework that enhances LLM reasoning through structured grounding, recursive reasoning, and continual experience learning. MemoTime decomposes complex temporal questions into a hierarchical Tree of Time, enabling operator-aware reasoning that enforces monotonic timestamps and co-constrains multiple entities under unified temporal bounds. A dynamic evidence retrieval layer adaptively selects operator-specific retrieval strategies, while a self-evolving experience memory stores verified reasoning traces, toolkit decisions, and sub-question embeddings for cross-type reuse. Comprehensive experiments on multiple temporal QA benchmarks show that MemoTime achieves overall state-of-the-art results, outperforming the strong baseline by up to 24.0%. Furthermore, MemoTime enables smaller models (e.g., Qwen3-4B) to achieve reasoning performance comparable to that of GPT-4-Turbo.
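The two temporal constraints named in the abstract, monotonic timestamps along a reasoning path and a unified temporal bound shared by several entities, can be illustrated with a minimal sketch. This is not MemoTime's implementation: the `Fact` record, the function names `is_monotonic` and `within_window`, and the sample facts are all hypothetical and exist only to make the two checks concrete.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """Hypothetical TKG fact: (subject, relation, object) at a timestamp."""
    subject: str
    relation: str
    obj: str
    when: date


def is_monotonic(path: list[Fact]) -> bool:
    """Return True if timestamps never decrease along the reasoning path."""
    return all(a.when <= b.when for a, b in zip(path, path[1:]))


def within_window(facts: list[Fact], start: date, end: date) -> bool:
    """Return True if every fact lies inside the shared [start, end] bound."""
    return all(start <= f.when <= end for f in facts)


# Illustrative two-hop path over made-up facts (assumed data, not from the paper).
path = [
    Fact("A", "visits", "B", date(2010, 1, 1)),
    Fact("B", "meets", "C", date(2011, 5, 2)),
]

ordered = is_monotonic(path)                                     # hops are in time order
bounded = within_window(path, date(2010, 1, 1), date(2011, 12, 31))
```

Under these assumptions, a candidate path would be kept only when both checks pass; reversing the hops makes `is_monotonic` fail, and narrowing the window to start after the first fact makes `within_window` fail.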