🤖 AI Summary
This work investigates the fundamental limits of large language models (LLMs) in logical rule induction, particularly their inability to reliably track long-range dependencies in predicate chains. Method: We propose the first expressivity hierarchy framework grounded in the complexity of rule dependency structures, rather than conventional theoretical complexity measures, and design a closed-loop interactive architecture that couples LLMs with formal reasoning engines, evaluated on both standard inductive logic programming (ILP) benchmarks and a novel, controllable logical test suite. Contribution/Results: Experiments show that state-of-the-art LLMs perform on par with top ILP systems across multiple logical induction tasks; however, accuracy degrades markedly on long-chain predicate reasoning, empirically validating the framework's capacity to precisely identify and quantify this intrinsic bottleneck.
📝 Abstract
This work presents a novel systematic methodology for analysing the capabilities and limitations of Large Language Models (LLMs) on logic theory induction, with feedback from a formal inference engine. The analysis is complexity-graded w.r.t. rule dependency structure, allowing quantification of the specific inference challenges affecting LLM performance. Integrating LLMs with formal methods is a promising frontier in Natural Language Processing, as an important avenue for improving model inference control and explainability. In particular, inductive learning over complex sets of facts and rules poses unique challenges for current autoregressive models, as they lack explicit symbolic grounding. While they can be complemented by formal systems, the inductive learning properties delivered by LLMs are not well understood or quantified. Empirical results indicate that the largest LLMs can achieve competitive results against a state-of-the-art Inductive Logic Programming (ILP) system baseline, but also that tracking long predicate relationship chains is a more difficult obstacle for LLMs than theory complexity.
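To make the "long predicate relationship chain" difficulty concrete, a minimal sketch of the kind of graded task involved: facts forming a chain of a controllable length, plus a transitive rule whose consequences the model must track across the whole chain. The predicate names (`parent`, `ancestor`) and the chain-generation helper below are illustrative assumptions, not the paper's actual test suite.

```python
# Hypothetical sketch of a chain-length-graded induction task.
# Difficulty is controlled by n, the length of the predicate chain.

def make_chain_facts(n):
    """Facts parent(p0, p1), ..., parent(p(n-1), pn): a chain of length n."""
    return [("parent", f"p{i}", f"p{i+1}") for i in range(n)]

def forward_chain(facts):
    """Derive ancestor/2 from parent/2 via the transitive rules:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
       Implemented as a naive fixpoint (transitive closure)."""
    ancestor = {(a, b) for pred, a, b in facts if pred == "parent"}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(ancestor):
            for (y2, z) in list(ancestor):
                if y == y2 and (x, z) not in ancestor:
                    ancestor.add((x, z))
                    changed = True
    return ancestor

# A chain of length 5 over constants p0..p5 yields one ancestor pair
# for every i < j, i.e. C(6, 2) = 15 pairs in total.
derived = forward_chain(make_chain_facts(5))
assert ("p0", "p5") in derived  # the longest-range dependency in the chain
assert len(derived) == 15
```

A formal inference engine computes this closure exactly for any n; the abstract's claim is that an LLM's ability to track the longest-range pair (here `("p0", "p5")`) degrades as n grows, independently of the theory's overall complexity.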