🤖 AI Summary
This work addresses a limitation of current large language models (LLMs) in flowchart-guided dialogue: without explicit topological reasoning, they often produce logical inconsistencies or hallucinations. To overcome this, we propose FloCA, a zero-shot flowchart dialogue agent that separates intent understanding and response generation, both handled by an LLM, from graph-structured reasoning, which is delegated to an external graph reasoning engine constrained by the flowchart’s topology. This architecture ensures faithful, logically consistent multi-turn interactions. We introduce an evaluation framework featuring an LLM-based user simulator and five novel metrics, and demonstrate through experiments on the FLODIAL and PFDial datasets that FloCA significantly outperforms existing methods in both reasoning accuracy and interaction efficiency.
📝 Abstract
Flowchart-oriented dialogue (FOD) systems aim to guide users through multi-turn decision-making or operational procedures by following a domain-specific flowchart to achieve a task goal. In this work, we formalize flowchart reasoning in FOD as grounding user input to flowchart nodes at each dialogue turn while ensuring that each node transition is consistent with the correct flowchart path. Despite recent advances in LLMs for task-oriented dialogue systems, adapting them to FOD still faces two limitations: (1) LLMs lack an explicit mechanism to represent and reason over flowchart topology, and (2) they are prone to hallucinations, leading to unfaithful flowchart reasoning. To address these limitations, we propose FloCA, a zero-shot flowchart-oriented conversational agent. FloCA uses an LLM for intent understanding and response generation while delegating flowchart reasoning to an external tool that performs topology-constrained graph execution, ensuring faithful and logically consistent node transitions across dialogue turns. We further introduce an evaluation framework with an LLM-based user simulator and five new metrics covering reasoning accuracy and interaction efficiency. Extensive experiments on the FLODIAL and PFDial datasets highlight the bottlenecks of existing LLM-based methods and demonstrate the superiority of FloCA. Our code is available at https://github.com/Jinzi-Zou/FloCA-flowchart-reasoning.
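To make the division of labor concrete, the following is a minimal Python sketch of topology-constrained graph execution. It is an illustration under stated assumptions, not the FloCA implementation: the `Flowchart` class, the `ground_intent` keyword matcher (a stand-in for the LLM's intent understanding), the `run_dialogue` loop, and the toy troubleshooting flowchart are all hypothetical names and data introduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flowchart:
    # edges[node][answer_label] -> next node; nodes without outgoing edges are terminal.
    edges: dict
    start: str

    def options(self, node: str) -> list:
        """Answer labels that the topology allows from this node."""
        return sorted(self.edges.get(node, {}))

    def step(self, node: str, label: str) -> str:
        """Follow one edge; reject any transition the flowchart does not contain."""
        outgoing = self.edges.get(node, {})
        if label not in outgoing:
            raise ValueError(f"no edge {label!r} from node {node!r}")
        return outgoing[label]


def ground_intent(utterance: str, options: list) -> Optional[str]:
    """Hypothetical stand-in for the LLM: pick the option label named in the utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for label in options:
        if label in words:
            return label
    return None


def run_dialogue(chart: Flowchart, utterances: list) -> list:
    """Advance through the flowchart one user turn at a time and return the node path."""
    node = chart.start
    path = [node]
    for utterance in utterances:
        label = ground_intent(utterance, chart.options(node))
        if label is None:
            continue  # stay on the current node; a real agent would ask to clarify
        node = chart.step(node, label)
        path.append(node)
    return path


# Toy flowchart (assumed data, not taken from FLODIAL or PFDial).
chart = Flowchart(
    edges={
        "power_on?": {"yes": "screen_blank?", "no": "charge_battery"},
        "screen_blank?": {"yes": "replace_display", "no": "done"},
    },
    start="power_on?",
)

path = run_dialogue(chart, ["Yes it turns on", "hmm not sure", "no the screen is fine"])
print(path)
```

The point of the design is that the language component only proposes an edge label, while `step` is the sole place where the current node changes, so a transition that is absent from the flowchart cannot occur regardless of what the language component outputs.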