🤖 AI Summary
Current question-answering (QA) systems face two critical challenges: factual hallucination, i.e., generating answers inconsistent with real-time or domain-specific knowledge, and weak compositional reasoning. To address these, the paper proposes Chain-of-Action (CoA), a novel framework for multimodal and retrieval-augmented QA. CoA introduces the first reasoning-retrieval co-execution mechanism, enabling plug-and-play, domain-adaptable actions, and incorporates a multi-reference faith score (MRFS) to automatically detect and resolve answer conflicts across heterogeneous sources. CoA further combines systematic prompt engineering with three heterogeneous retrieval actions (textual, visual, and structured) to improve answer faithfulness and reasoning robustness. Evaluated on public benchmarks and a real-world Web3 case study, CoA achieves significant improvements over state-of-the-art methods in both compositional reasoning accuracy and factual consistency.
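The reasoning-retrieval co-execution described above can be pictured as a registry of interchangeable retrieval actions driving a decomposed reasoning chain: each sub-question is routed to one action, and the retrieved evidence accumulates along the chain. The sketch below is a minimal illustration only; the `register_action` helper, the `Step` structure, and the single `textual` action are hypothetical names, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry of plug-and-play actions: each action maps a
# sub-question to evidence retrieved from one kind of source.
ACTIONS: dict[str, Callable[[str], str]] = {}

def register_action(name: str):
    """Decorator that registers a retrieval action under a name."""
    def deco(fn: Callable[[str], str]):
        ACTIONS[name] = fn
        return fn
    return deco

@register_action("textual")
def textual_retrieval(sub_q: str) -> str:
    # Placeholder: in practice this would query a search engine or corpus.
    return f"text evidence for: {sub_q}"

@dataclass
class Step:
    action: str        # which registered retrieval action to run
    sub_question: str  # one node of the decomposed reasoning chain

def run_chain(steps: list[Step]) -> list[str]:
    """Execute the reasoning chain, collecting evidence step by step."""
    evidence = []
    for step in steps:
        evidence.append(ACTIONS[step.action](step.sub_question))
    return evidence

chain = [Step("textual", "Who founded protocol X?")]
print(run_chain(chain))
```

Because actions are looked up by name, a new domain only needs to register its own retrieval functions, which is one plausible reading of "plug-and-play" here.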
📝 Abstract
We present the Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared to the literature, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination, i.e., answers inconsistent with real-time or domain facts, and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts among candidate answers. Empirically, we use both public benchmarks and a Web3 case study to demonstrate the advantages of CoA over other methods.
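To make the verification idea concrete, MRFS can be imagined as scoring a generated answer against each retrieved reference and treating low-scoring answers as conflicts to be resolved from the evidence. The sketch below uses simple token-set (Jaccard) overlap as a stand-in similarity; the paper's actual MRFS formula, weighting, and threshold are assumptions here, not the published method.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def mrfs(answer: str, references: list[str]) -> float:
    """Toy multi-reference faith score: mean overlap with each reference."""
    if not references:
        return 0.0
    return sum(token_overlap(answer, r) for r in references) / len(references)

def verify(answer: str, references: list[str], threshold: float = 0.5) -> str:
    # Answers scoring below the (assumed) threshold are flagged as
    # conflicting with the retrieved evidence; here we fall back to the
    # reference most similar to the answer as a crude resolution step.
    if mrfs(answer, references) >= threshold:
        return answer
    return max(references, key=lambda r: token_overlap(answer, r))
```

In practice a learned or embedding-based similarity would replace the token overlap, but the control flow, score against multiple references, then keep or repair the answer, is the part this sketch is meant to show.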