🤖 AI Summary
To address the inefficiency and unverifiable answers caused by the combinatorial explosion in multi-hop knowledge graph question answering (KGQA), this paper proposes a hybrid reasoning framework that decouples *planning* from *execution*. Specifically, a single LLM call generates a relational path plan, which is then executed via traceable, symbolic breadth-first search, so every answer is grounded in the graph. A lightweight edge-scoring model that fuses text and graph embeddings (only 6.7M parameters) replaces costly LLM-based path ranking; separately, knowledge distillation compresses the planning capability into a 4B-parameter model that matches large-model performance at zero API cost. By integrating symbolic search with graph- and text-based embeddings, the method achieves micro-F1 > 0.90 on MetaQA and accelerates inference by over 100× compared to standard LLM-based approaches. This design significantly enhances answer verifiability, computational efficiency, and deployment feasibility.
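The plan-then-execute idea can be sketched in a few lines. Below is a minimal, hypothetical illustration (the triples and relation names are made-up MetaQA-style examples, not from the paper): the LLM's output is reduced to a relation sequence, and a symbolic breadth-first expansion over the knowledge graph executes it, so every returned answer corresponds to actual graph edges.

```python
from collections import defaultdict

# Toy knowledge graph: (head, relation) -> set of tail entities.
# Entities and relations are illustrative, MetaQA-style placeholders.
KG = defaultdict(set)
for h, r, t in [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "directed", "Interstellar"),
    ("Christopher Nolan", "directed", "Inception"),
]:
    KG[(h, r)].add(t)

def execute_plan(start_entities, relation_path):
    """Execute an LLM-predicted relation sequence by breadth-first
    expansion: answers are grounded in KG edges by construction."""
    frontier = set(start_entities)
    for rel in relation_path:
        frontier = {t for e in frontier for t in KG[(e, rel)]}
    return frontier

# 2-hop question: "What movies share a director with Inception?"
answers = execute_plan({"Inception"}, ["directed_by", "directed"])
# answers == {"Interstellar", "Inception"}
```

The single expensive LLM call only produces the short list `["directed_by", "directed"]`; the traceable symbolic search does the rest, which is what makes the answers verifiable.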
📝 Abstract
Multi-hop question answering over knowledge graphs remains computationally challenging due to the combinatorial explosion of possible reasoning paths. Recent approaches rely on expensive Large Language Model (LLM) inference for both entity linking and path ranking, limiting their practical deployment. Additionally, LLM-generated answers often lack verifiable grounding in structured knowledge. We present two complementary hybrid algorithms that address both efficiency and verifiability: (1) LLM-Guided Planning that uses a single LLM call to predict relation sequences executed via breadth-first search, achieving near-perfect accuracy (micro-F1 > 0.90) while ensuring all answers are grounded in the knowledge graph, and (2) Embedding-Guided Neural Search that eliminates LLM calls entirely by fusing text and graph embeddings through a lightweight 6.7M-parameter edge scorer, achieving over 100 times speedup with competitive accuracy. Through knowledge distillation, we compress planning capability into a 4B-parameter model that matches large-model performance at zero API cost. Evaluation on MetaQA demonstrates that grounded reasoning consistently outperforms ungrounded generation, with structured planning proving more transferable than direct answer generation. Our results show that verifiable multi-hop reasoning does not require massive models at inference time, but rather the right architectural inductive biases combining symbolic structure with learned representations.
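The lightweight edge scorer described above can be pictured as a small MLP over concatenated text and graph embeddings. The sketch below is an assumption-laden toy (random weights, tiny dimensions, invented relation names; the paper's scorer has ~6.7M trained parameters): it only shows the shape of the fusion, scoring candidate relations for a question so that search expands the top-scoring edges without any LLM call.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, GRAPH_DIM, HIDDEN = 16, 8, 32  # toy sizes for illustration

# Hypothetical pretrained embeddings (random here for illustration).
question_emb = rng.normal(size=TEXT_DIM)           # text embedding of the question
relation_emb = {r: rng.normal(size=GRAPH_DIM)      # graph embedding per relation
                for r in ["directed_by", "starred_actors", "release_year"]}

# Lightweight MLP edge scorer: score = w2 . relu(W1 [q; r] + b1)
W1 = rng.normal(size=(HIDDEN, TEXT_DIM + GRAPH_DIM)) * 0.1
b1 = np.zeros(HIDDEN)
w2 = rng.normal(size=HIDDEN) * 0.1

def score_edge(q_emb, rel):
    """Fuse text and graph embeddings, return a scalar edge score."""
    x = np.concatenate([q_emb, relation_emb[rel]])
    return float(w2 @ np.maximum(W1 @ x + b1, 0.0))

# During search, expand only the highest-scoring relations.
scores = {r: score_edge(question_emb, r) for r in relation_emb}
best = max(scores, key=scores.get)
```

Because scoring is a single small matrix product per candidate edge rather than an LLM forward pass, this is where the reported >100× inference speedup comes from.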