🤖 AI Summary
Large language models (LLMs) perform worse at vulnerability detection on large-scale firmware because firmware is distributed as binaries, its components are heterogeneous, and its dependencies span many files. To address this, the paper proposes FIRMHIVE, the first large-scale firmware security analysis framework based on autonomous agent swarms. Its core contributions are: (1) modeling "delegation" as a per-agent executable primitive; (2) constructing a runtime Tree of Agents (ToA) to enable decentralized, dynamically scalable multi-agent coordination; and (3) integrating recursive task decomposition with firmware-level cross-file dependency reasoning. Evaluated on real-world firmware images, FIRMHIVE produces about 5.6× more alerts per firmware than existing LLM-agent baselines, and identifies about 1.5× more vulnerabilities than state-of-the-art security tools (1,802 total) at 71% precision.
📝 Abstract
Large Language Models (LLMs) and their agent systems have recently demonstrated strong potential in automating code reasoning and vulnerability detection. However, when applied to large-scale firmware, their performance degrades due to the binary nature of firmware, complex dependency structures, and heterogeneous components. To address this challenge, this paper presents FIRMHIVE, a recursive agent hive that enables LLMs to act as autonomous firmware security analysts. FIRMHIVE introduces two key mechanisms: (1) transforming delegation into a per-agent, executable primitive and (2) constructing a runtime Tree of Agents (ToA) for decentralized coordination. We evaluate FIRMHIVE using real-world firmware images obtained from publicly available datasets, covering five representative security analysis tasks. Compared with existing LLM-agent baselines, FIRMHIVE performs deeper (about 16x more reasoning steps) and broader (about 2.3x more files inspected) cross-file exploration, resulting in about 5.6x more alerts per firmware. Compared to state-of-the-art (SOTA) security tools, FIRMHIVE identifies about 1.5x more vulnerabilities (1,802 total) and achieves 71% precision, representing significant improvements in both yield and fidelity.
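The two mechanisms named in the abstract, delegation as a per-agent primitive and a Tree of Agents that grows at runtime, can be illustrated with a minimal Python sketch. This is not FIRMHIVE's implementation: the `Agent` class, the `plan` callback (a stand-in for the LLM's decision about which subtasks to hand off), the `max_depth` cutoff, and the file names are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """One node in a hypothetical Tree of Agents."""

    task: str
    depth: int = 0
    children: list[Agent] = field(default_factory=list)

    def delegate(self, subtask: str) -> Agent:
        """Delegation as a primitive: any agent may spawn a child for a subtask."""
        child = Agent(task=subtask, depth=self.depth + 1)
        self.children.append(child)
        return child

    def run(self, plan: Callable[[str], list[str]], max_depth: int = 3) -> list[str]:
        """Handle this task, then recursively delegate whatever `plan` proposes.

        `plan` is an assumed stand-in for an LLM call; here it is a plain function.
        """
        report = ["  " * self.depth + self.task]
        if self.depth < max_depth:
            for subtask in plan(self.task):
                report.extend(self.delegate(subtask).run(plan, max_depth))
        return report


def toy_plan(task: str) -> list[str]:
    """Hard-coded decomposition with made-up firmware paths (illustration only)."""
    table = {
        "analyze firmware": ["inspect bin/httpd", "inspect lib/libcgi.so"],
        "inspect bin/httpd": ["trace import from lib/libcgi.so"],
    }
    return table.get(task, [])


root = Agent("analyze firmware")
report = root.run(toy_plan)
print("\n".join(report))
```

In this sketch the tree is never declared up front. It emerges from the delegation calls each agent makes while running, which is one way to read "runtime" and "decentralized" in the abstract. The depth cutoff is an assumption added to guarantee termination; the abstract does not say how FIRMHIVE bounds recursion.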