NeuroLog: Reasoning You Can Audit -- Neuro-Symbolic Vulnerability Discovery via LLM Facts, Datalog, and SMT

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes NeuroLog, an end-to-end vulnerability discovery pipeline for C/C++ source that runs without a build environment. It targets two limitations: traditional static analysers need a working build before any query runs, and large language models (LLMs) invent details and lose track of cross-function data flow. NeuroLog uses an LLM to extract typed data-flow facts one function at a time, composes those facts into cross-function findings with Souffle Datalog rules, and filters infeasible findings with Z3, which also emits a SAT model for each surviving finding. Likely range invariants from a few corpus seeds tighten the SMT problem. A second LLM agent turns each SAT model into a Python program that produces a candidate crashing input, which is validated by an AddressSanitizer harness. Across stb, cJSON, libxml2, an FFmpeg demuxer slice, and curl 8.3.0, NeuroLog re-discovers eight CVE-class issues, including CVE-2023-38545 (CVSS 9.8), each confirmed by AddressSanitizer. On libarchive HEAD it surfaces five memory-safety bugs, four of them previously unreported, all filed upstream. Fact extraction on stb takes about 37 seconds and costs approximately $0.005.
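The composition step can be illustrated with a minimal sketch. It is not the paper's Souffle rule mesh: it is a naive Datalog-style fixpoint written in plain Python, and every relation name (`flow`, `call`, `source`, `sink`), function name, and example fact below is a hypothetical stand-in for whatever the LLM extractor actually emits.

```python
# Hypothetical per-function facts, standing in for LLM-extracted output.
# source(func, var):              var in func is attacker-controlled
# flow(func, src, dst):           inside func, data moves from src to dst
# call(caller, arg, callee, prm): caller passes arg as callee's parameter prm
# sink(func, var, kind):          var in func reaches a dangerous operation
SOURCES = {("parse_header", "len")}
FLOWS = {("parse_header", "len", "n"), ("copy_body", "count", "size")}
CALLS = {("parse_header", "n", "copy_body", "count")}
SINKS = {("copy_body", "size", "memcpy_len"), ("log_msg", "fmt", "format_string")}


def reachable(sources, flows, calls):
    """Naive fixpoint over three Datalog-style rules:
         tainted(f, v) :- source(f, v).
         tainted(f, d) :- tainted(f, s), flow(f, s, d).
         tainted(g, p) :- tainted(f, a), call(f, a, g, p).
    """
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for f, s, d in flows:
            if (f, s) in tainted and (f, d) not in tainted:
                tainted.add((f, d))
                changed = True
        for f, a, g, p in calls:
            if (f, a) in tainted and (g, p) not in tainted:
                tainted.add((g, p))
                changed = True
    return tainted


def findings(sources, flows, calls, sinks):
    """finding(f, v, kind) :- sink(f, v, kind), tainted(f, v)."""
    tainted = reachable(sources, flows, calls)
    return sorted(s for s in sinks if (s[0], s[1]) in tainted)


if __name__ == "__main__":
    for finding in findings(SOURCES, FLOWS, CALLS, SINKS):
        print(finding)
```

Each fact mentions only one function (or one call edge), so it can be extracted without a build; the cross-function path appears only when the rules are run over the whole fact base. Here the tainted `len` in `parse_header` reaches the `memcpy` length in `copy_body`, while the unrelated `log_msg` sink is not reported.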

📝 Abstract

Vulnerability discovery on C/C++ source asks the analyst to choose between heavyweight static analysers, which need a working build before a single query runs, and free-form LLMs, which read source readily but invent details and lose track of cross-function dataflow on real codebases. We present NeuroLog, an end-to-end build-free pipeline that assigns each layer the role it is uniquely good at: an LLM extracts typed dataflow facts one function at a time; a Souffle rule mesh composes those facts into cross-function findings; a Z3 post-pass filters infeasible findings and emits a SAT model for each survivor. To go beyond pure static reasoning we also fold in runtime evidence: likely range invariants from a handful of corpus seeds tighten the SMT problem at near-zero cost. A second LLM agent reads each SAT model and writes a Python program that produces a candidate crashing input, validated by an AddressSanitizer harness. Combining static-narrowing-SMT (Saturn, Pinpoint) and Datalog-with-SMT (Formulog) is prior art; new here are an LLM-derived fact base, a no-build pipeline, and the SAT model as an artifact (input to crash synthesis) rather than a yes/no verdict. Across stb, cJSON, libxml2, an FFmpeg demuxer slice, and curl 8.3.0, NeuroLog re-discovers eight CVE-class issues end-to-end, including the CVSS-9.8 SOCKS5 heap overflow CVE-2023-38545, each ASan-confirmed. On libarchive HEAD we surface five memory-safety bugs (four previously unreported) across the cpio reader and the XAR/WARC/7zip writers; all filed upstream, several fixes merged, with the cpio use-after-free acknowledged in seven hours. Extraction takes ~37 s and $0.005 on stb; crash synthesis turned a static finding into a 102-byte stb_vorbis crash in two LLM iterations (no fuzzer); a likely-invariant filter from three Matroska seeds eliminates 13.2% of the FFmpeg-demuxer feasible set.
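The likely-invariant filter described above can be pictured with a toy example. The sketch below does not use Z3 and is not the paper's encoding: it swaps in plain interval intersection on a single integer variable, which is enough to show how a range observed on a few seeds can turn a feasible finding into an infeasible one, and how a surviving finding yields a concrete witness value (the role the SAT model plays in the pipeline). The function names and numbers are hypothetical.

```python
def likely_range(seed_values):
    """Likely invariant: the min..max of a field as observed on corpus seeds.
    This is evidence from a few runs, not a proof about all inputs."""
    return (min(seed_values), max(seed_values))


def feasible(path_constraint, invariant=None):
    """Toy stand-in for an SMT feasibility check on one integer variable.

    path_constraint: inclusive (lo, hi) range the variable must fall in
                     for the finding's path to be taken.
    invariant:       optional inclusive (lo, hi) likely range from seeds.
    Returns a witness value if the combined constraints are satisfiable,
    otherwise None.
    """
    lo, hi = path_constraint
    if invariant is not None:
        lo, hi = max(lo, invariant[0]), min(hi, invariant[1])
    return lo if lo <= hi else None


if __name__ == "__main__":
    inv = likely_range([1, 4, 8])            # hypothetical field values from three seeds
    print(feasible((0, 0)))                  # feasible on its own
    print(feasible((0, 0), inv))             # filtered out once the invariant is added
    print(feasible((5, 2**31), inv))         # survives, with a witness value
```

Because the invariant is only likely, filtering on it trades completeness for a smaller feasible set: a finding that needs a value never seen in the seeds is dropped even if a real input could produce it.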

Problem

Research questions and friction points this paper is trying to address.

vulnerability discovery

C/C++ source code

static analysis

large language models

cross-function dataflow

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuro-Symbolic

LLM Fact Extraction

Build-Free Analysis