🤖 AI Summary
Large language models (LLMs) excel at natural language understanding but show limitations in explicit commonsense reasoning, particularly in story-based question answering, because they lack structured, interpretable inference mechanisms. Method: We propose LLM2LAS, the first framework to automatically induce symbolic logic rules from question-answer examples without manually designed reasoning structures. It uses an LLM to extract semantic structures from stories, feeds these into the Learning from Answer Sets (LAS) system ILASP to learn answer-set rules, and performs formal reasoning via Answer Set Programming (ASP). Contribution/Results: This neuro-symbolic integration combines neural comprehension with symbolic interpretability and compositional scalability. On a standard story QA benchmark, LLM2LAS significantly improves reasoning consistency and out-of-distribution generalization, correctly answering previously unseen questions. Empirical results show that automated rule induction can effectively enhance LLMs' commonsense reasoning capabilities.
📝 Abstract
Large Language Models (LLMs) excel at understanding natural language but struggle with explicit commonsense reasoning. A recent line of research suggests that combining LLMs with robust symbolic reasoning systems can overcome this problem in story-based question answering tasks. In this setting, existing approaches typically depend on human expertise to manually craft the symbolic component. We argue, however, that this component can also be learned automatically from examples. In this work, we introduce LLM2LAS, a hybrid system that effectively combines the natural language understanding capabilities of LLMs, the rule induction power of the Learning from Answer Sets (LAS) system ILASP, and the formal reasoning strengths of Answer Set Programming (ASP). LLMs are used to extract semantic structures from text, which ILASP then transforms into interpretable logic rules. These rules allow an ASP solver to perform precise and consistent reasoning, enabling correct answers to previously unseen questions. Empirical results outline the strengths and weaknesses of our automatic approach for learning and reasoning on a story-based question answering benchmark.
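To make the pipeline concrete, here is a minimal sketch of the reasoning step it describes. Everything in it is a hypothetical illustration, not the paper's actual representation: the facts play the role of semantic structures an LLM might extract from a story, the rule plays the role of one ILASP could induce from question-answer examples, and a toy forward-chaining loop stands in for a real ASP solver such as clingo.

```python
# Hypothetical facts an LLM might extract from a story like
# "Mary went to the kitchen. Mary picked up the apple."
facts = {
    ("at", "mary", "kitchen"),
    ("holds", "mary", "apple"),
}

# A rule of the kind ILASP could induce from question-answer examples:
#   at(Obj, Loc) :- holds(Person, Obj), at(Person, Loc).
def apply_rules(known):
    """One round of forward chaining with the single illustrative rule."""
    derived = set(known)
    for (p1, person, obj) in known:
        if p1 != "holds":
            continue
        for (p2, person2, loc) in known:
            if p2 == "at" and person2 == person:
                derived.add(("at", obj, loc))
    return derived

def answer(question_obj, known):
    """Answer 'Where is <obj>?' by chaining to a fixpoint and reading off at/2.
    (A toy stand-in for querying an ASP solver's answer set.)"""
    while True:
        new = apply_rules(known)
        if new == known:
            break
        known = new
    for (p, obj, loc) in known:
        if p == "at" and obj == question_obj:
            return loc
    return None

print(answer("apple", facts))  # prints: kitchen
```

The unseen question "Where is the apple?" is never stated in the story; it is answered only because the learned rule composes the two extracted facts, which is the kind of consistent, interpretable inference the abstract attributes to the symbolic component.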