EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning

📅 2026-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing audio-language models struggle to focus on question-relevant audio segments in complex audio question answering and lack an interpretable, verifiable reasoning process. This work proposes EChO-Agent, a modular agent framework that builds audio evidence chains and verifies its own answers, reframing question answering as a workflow of planning, tool execution, evidence integration, and answer verification. Reinforcement learning and tool-augmented prompting help relate questions to audio, but neither offers a reliable way to understand, integrate, and self-verify audio segments; EChO-Agent targets that gap. On the MMAR benchmark it improves both accuracy and rubric scores over the baseline, and ablation studies indicate that evidence integration is the key factor.
📝 Abstract
While large audio-language models (LALMs) show promise on audio question answering, they fail to focus on question-relevant segments of audio and to provide a clear, checkable reasoning process when dealing with complex audio reasoning. Reinforcement learning and tool-augmented prompting can help models better relate questions to audio, but they lack a reliable way to understand, integrate, and self-verify audio segments. To address this gap, we present EChO-Agent, a modular agent framework that reformulates complex audio QA as a workflow of planning, tool execution, evidence integration, and answer verification. Experiments on the MMAR benchmark show that EChO-Agent improves both accuracy and rubric scores over the baseline, and ablation studies show that evidence integration is the key factor.
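The four-stage workflow named in the abstract (planning, tool execution, evidence integration, answer verification) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the listing does not describe the actual planner, tools, prompts, or verifier, so every name here (`Step`, `Evidence`, `run_pipeline`, the toy tools) is hypothetical, and the components are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One planned tool call over an audio span (hypothetical structure)."""
    tool: str
    start_s: float
    end_s: float


@dataclass
class Evidence:
    """One link in the evidence chain: which tool saw which span, and what it reported."""
    tool: str
    start_s: float
    end_s: float
    finding: str


# Assumed tool signature: (audio, start_s, end_s) -> textual finding.
Tool = Callable[[object, float, float], str]


def run_pipeline(
    question: str,
    audio: object,
    plan: Callable[[str, List[Evidence]], List[Step]],
    tools: Dict[str, Tool],
    integrate: Callable[[str, List[Evidence]], str],
    verify: Callable[[str, str, List[Evidence]], bool],
    max_rounds: int = 2,
) -> dict:
    chain: List[Evidence] = []
    answer, verified = "", False
    for _ in range(max_rounds):
        # 1. Planning: decide which tools to run on which segments,
        #    optionally using evidence gathered in earlier rounds.
        steps = plan(question, chain)
        # 2. Tool execution: each call adds one link to the evidence chain.
        for step in steps:
            finding = tools[step.tool](audio, step.start_s, step.end_s)
            chain.append(Evidence(step.tool, step.start_s, step.end_s, finding))
        # 3. Evidence integration: form an answer from the whole chain.
        answer = integrate(question, chain)
        # 4. Answer verification: stop once the answer is supported.
        verified = verify(question, answer, chain)
        if verified:
            break
    return {"answer": answer, "verified": verified, "chain": chain}


# Toy stand-ins so the sketch runs without any audio model.
def toy_plan(question, chain):
    return [Step("transcribe", 0.0, 5.0), Step("sound_events", 5.0, 10.0)]


toy_tools = {
    "transcribe": lambda audio, s, e: "speaker says 'watch out'",
    "sound_events": lambda audio, s, e: "glass breaking",
}


def toy_integrate(question, chain):
    return "; ".join(ev.finding for ev in chain)


def toy_verify(question, answer, chain):
    # Accept the answer only if every finding in the chain is reflected in it.
    return all(ev.finding in answer for ev in chain)


result = run_pipeline(
    "What happens after the warning?", None,
    toy_plan, toy_tools, toy_integrate, toy_verify,
)
```

Returning the chain alongside the answer is what makes the reasoning checkable in this sketch: each claim in the answer can be traced back to a tool, a time span, and a finding.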
Problem

Research questions and friction points this paper is trying to address.

audio reasoning
question answering
evidence integration
reasoning process
audio segments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evidence Chain Orchestration
Audio Reasoning
Modular Agent Framework
Tool-Augmented Prompting
Self-Verification