🤖 AI Summary
Existing dialogue response generation research focuses on *what* to generate, neglecting the critical temporal decision of *when* to respond. This paper formally introduces "timely dialogue response generation" as a novel task for open-domain conversational agents. Methodologically, the authors construct TimelyChat, the first temporally enhanced evaluation benchmark for this task; build a 55K event-driven dialogue training dataset by mining unlabeled event knowledge from a temporal commonsense knowledge graph and synthesizing dialogues with a large language model; and design Timer, an end-to-end agent that jointly models response timing and response content, proactively predicting time intervals and generating responses aligned with them. Experiments demonstrate that Timer significantly outperforms prompt-based LLMs and diverse fine-tuned baselines in both turn-level and dialogue-level evaluations. All data, models, and code are publicly released.
📝 Abstract
While research on dialogue response generation has primarily focused on generating coherent responses conditioned on textual context, the critical question of when to respond, grounded in temporal context, remains underexplored. To bridge this gap, we propose a novel task called timely dialogue response generation and introduce the TimelyChat benchmark, which evaluates the capabilities of language models to predict appropriate time intervals and generate time-conditioned responses. Additionally, we construct a large-scale training dataset by leveraging unlabeled event knowledge from a temporal commonsense knowledge graph and employing a large language model (LLM) to synthesize 55K event-driven dialogues. We then train Timer, a dialogue agent designed to proactively predict time intervals and generate timely responses that align with those intervals. Experimental results show that Timer outperforms prompting-based LLMs and other fine-tuned baselines in both turn-level and dialogue-level evaluations. We publicly release our data, model, and code.
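The abstract describes a two-step inference paradigm: first predict *when* to respond (a time interval), then generate a response conditioned on that interval. The minimal sketch below illustrates this control flow only; the function names, the stubbed outputs, and the prompt-free interface are illustrative assumptions, not the paper's actual Timer implementation, which would replace both stubs with a trained model.

```python
def predict_interval(context: str) -> str:
    """Hypothetical stub for Timer's interval prediction step.

    A trained model would read the dialogue context and emit an
    appropriate time interval (e.g. "30 minutes"); here a fixed value
    is returned so the sketch runs without model weights.
    """
    return "30 minutes"


def generate_response(context: str, interval: str) -> str:
    """Hypothetical stub for time-conditioned response generation.

    The real model conditions on both the dialogue context and the
    predicted interval so the response reflects the elapsed time.
    """
    return f"It's been {interval} — the pasta should be ready by now!"


def timely_reply(context: str) -> tuple[str, str]:
    # Step 1: decide *when* to respond (time interval prediction).
    interval = predict_interval(context)
    # Step 2: decide *what* to say, conditioned on that interval.
    response = generate_response(context, interval)
    return interval, response


interval, response = timely_reply("User: I just put the pasta on the stove.")
```

The design choice worth noting is the ordering: the interval is predicted before generation so the response content can depend on it, rather than generating a response and attaching a delay afterwards.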