A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs

📅 2025-06-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing spatio-temporal data mining models generalize poorly across tasks, offer limited complex reasoning, and lack interpretability, which hinders multi-level decision support. To address these limitations, we propose STReason, the first framework that integrates large language models (LLMs) with domain-specific spatio-temporal models via in-context learning. Without fine-tuning, it automatically decomposes natural language queries into modular programs and executes them, supporting multi-task, long-form spatio-temporal reasoning. Our key contributions include: (1) a human-interpretable, modular reasoning pipeline; and (2) the first benchmark and evaluation framework designed specifically for long-form spatio-temporal reasoning. Experiments show that STReason significantly outperforms advanced LLM baselines on the new benchmark, especially on complex reasoning tasks. Human evaluation further supports its credibility and practical utility, indicating its potential to reduce expert workload.

📝 Abstract

Spatio-temporal data mining plays a pivotal role in informed decision making across diverse domains. However, existing models are often restricted to narrow tasks, lacking the capacity for multi-task inference and complex long-form reasoning that require generation of in-depth, explanatory outputs. These limitations restrict their applicability to real-world, multi-faceted decision scenarios. In this work, we introduce STReason, a novel framework that integrates the reasoning strengths of large language models (LLMs) with the analytical capabilities of spatio-temporal models for multi-task inference and execution. Without requiring task-specific finetuning, STReason leverages in-context learning to decompose complex natural language queries into modular, interpretable programs, which are then systematically executed to generate both solutions and detailed rationales. To facilitate rigorous evaluation, we construct a new benchmark dataset and propose a unified evaluation framework with metrics specifically designed for long-form spatio-temporal reasoning. Experimental results show that STReason significantly outperforms advanced LLM baselines across all metrics, particularly excelling in complex, reasoning-intensive spatio-temporal scenarios. Human evaluations further validate STReason's credibility and practical utility, demonstrating its potential to reduce expert workload and broaden the applicability to real-world spatio-temporal tasks. We believe STReason provides a promising direction for developing more capable and generalizable spatio-temporal reasoning systems.
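The decompose-then-execute idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Step` structure, the `"$name"` reference convention, and the toy modules `detect_anomalies` and `forecast_naive` are hypothetical stand-ins, not STReason's actual program format or models. In the framework the program would be generated by an LLM from the query; here it is hard-coded.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    output: str            # name under which the step's result is stored
    module: str            # key into MODULES (hypothetical module names)
    args: Dict[str, Any]   # literals, or "$name" references to earlier results


# Toy stand-ins for domain-specific spatio-temporal models (assumptions).
def detect_anomalies(series: List[float], threshold: float) -> List[int]:
    """Return indices whose value deviates from the mean by more than threshold."""
    mean = sum(series) / len(series)
    return [i for i, v in enumerate(series) if abs(v - mean) > threshold]


def forecast_naive(series: List[float], horizon: int) -> List[float]:
    """Repeat the last observation for `horizon` future steps."""
    return [series[-1]] * horizon


MODULES: Dict[str, Callable[..., Any]] = {
    "detect_anomalies": detect_anomalies,
    "forecast_naive": forecast_naive,
}


def run_program(
    program: List[Step], inputs: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Execute steps in order and keep a human-readable trace of each call."""
    env = dict(inputs)
    trace = []
    for step in program:
        # Replace "$name" references with values computed so far.
        resolved = {
            key: env[val[1:]] if isinstance(val, str) and val.startswith("$") else val
            for key, val in step.args.items()
        }
        env[step.output] = MODULES[step.module](**resolved)
        trace.append(f"{step.output} = {step.module}({', '.join(sorted(resolved))})")
    return env, trace


# Hard-coded stand-in for a program an LLM might emit for a query such as
# "Find anomalies in the traffic readings and forecast the next two steps."
PROGRAM = [
    Step("anomalies", "detect_anomalies", {"series": "$traffic", "threshold": 20}),
    Step("forecast", "forecast_naive", {"series": "$traffic", "horizon": 2}),
]
```

Calling `run_program(PROGRAM, {"traffic": [...]})` returns both the intermediate results and the trace. The trace is the part that loosely corresponds to an interpretable rationale: each line records which module produced which result.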

Problem

Research questions and friction points this paper is trying to address.

Existing spatio-temporal models are restricted to narrow tasks and lack multi-task inference

Complex natural language queries must be decomposed without task-specific finetuning

Reasoning-intensive spatio-temporal scenarios require in-depth, explanatory outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs with spatio-temporal models

Uses in-context learning to decompose queries into modular, interpretable programs

Benchmark with long-form reasoning metrics
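As a rough illustration of how in-context learning can steer an LLM toward modular programs without fine-tuning, the sketch below assembles a few-shot prompt from module descriptions and worked query/program pairs. The function name `build_prompt`, the prompt layout, and the example module signatures are all assumptions for illustration; the source does not specify STReason's actual prompt format.

```python
from typing import List, Tuple


def build_prompt(
    module_docs: List[str],
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: module list, worked examples, then the new query."""
    lines = ["Available modules:"]
    lines += [f"- {doc}" for doc in module_docs]
    # Each in-context example pairs a query with the program that answers it.
    for example_query, example_program in examples:
        lines += ["", f"Query: {example_query}", "Program:", example_program]
    # The prompt ends with an open "Program:" slot for the model to complete.
    lines += ["", f"Query: {query}", "Program:"]
    return "\n".join(lines)


# Hypothetical module descriptions and one worked example.
MODULE_DOCS = [
    "detect_anomalies(series, threshold) -> list of indices",
    "forecast_naive(series, horizon) -> list of values",
]
EXAMPLES = [
    (
        "Which traffic readings are unusual?",
        "anomalies = detect_anomalies(series=traffic, threshold=20)",
    ),
]
```

The returned string would be sent to whichever LLM is in use, and the completion parsed into executable steps. Adapting to a new task then means editing `MODULE_DOCS` and `EXAMPLES` rather than retraining a model.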
