ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses why large language model (LLM) agents keep failing on the same recurring faults: the process knowledge that encodes how tasks are executed (operator schemas, preconditions, and constraints) stays unrepaired, and existing self-evolving approaches update prompts, memory, or model weights rather than safely and durably correcting those symbolic structures. The paper proposes ANNEAL, a neuro-symbolic agent whose Failure-Driven Knowledge Acquisition (FDKA) mechanism turns recurring failures into governed symbolic edits of a process knowledge graph, without altering the underlying LLM weights. Every accepted edit carries full provenance and can be rolled back deterministically. Across four domains and 27 multi-seed runs, ANNEAL is the only evaluated system that commits persistent structural repairs: baselines such as ReAct and Reflexion retain 72–100% holdout failure rates on recurring faults, whereas ANNEAL reduces these to 0% in the tested recurring-failure settings.

📝 Abstract

LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge (operator schemas, preconditions, and constraints) remains unrepaired. Existing self-evolving approaches address this gap by updating prompts, memory, or model weights, but none directly repair the symbolic structures that encode how tasks are executed, and few provide the governance guarantees required for safe deployment. We introduce ANNEAL, a neuro-symbolic agent that converts recurring failures into governed symbolic edits of a process knowledge graph without modifying foundation model weights. Its core mechanism, Failure-Driven Knowledge Acquisition (FDKA), localizes the responsible operator, synthesizes a typed patch through constrained LLM generation, and validates the proposal via multi-dimensional scoring, symbolic guardrails, and canary testing before commit. Every accepted edit carries full provenance and deterministic rollback capability. Across four domains and 27 multi-seed runs, ANNEAL is the only evaluated system that commits persistent structural repairs: strong baselines such as ReAct and Reflexion achieve high episodic recovery yet retain 72-100% holdout failure rates on recurring faults, whereas ANNEAL reduces these to 0% in the tested recurring-failure settings. Ablation confirms that removing FDKA eliminates all structural repairs and drops success rate by up to 26.7 percentage points. These results suggest that governed symbolic repair offers a complementary paradigm to weight-level and prompt-level adaptation for persistent fault elimination.
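
The commit path described in the abstract (a typed patch checked by guardrails and a canary test before commit, with provenance and deterministic rollback) can be illustrated with a small sketch. This is not the paper's implementation: `Operator`, `Patch`, `ProcessGraph`, the single "add a precondition" patch type, and the guardrail and canary callbacks are all hypothetical names chosen for illustration. Failure localization, LLM-based patch synthesis, and multi-dimensional scoring are omitted.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Operator:
    # Hypothetical operator schema: a name plus the facts it requires.
    name: str
    preconditions: set = field(default_factory=set)


@dataclass
class Patch:
    # Hypothetical typed patch: add one precondition to one operator.
    operator: str          # operator the failure was localized to
    add_precondition: str  # the symbolic edit itself
    provenance: str        # e.g. an identifier of the triggering failure


class ProcessGraph:
    """Toy stand-in for a process knowledge graph (assumption, not the paper's)."""

    def __init__(self, operators):
        self.operators = {op.name: op for op in operators}
        self.history = []  # (patch, pre-commit snapshot) pairs for rollback

    def commit(self, patch, guardrails, canary):
        """Apply the patch to a copy; keep it only if every check passes."""
        if patch.operator not in self.operators:
            return False
        candidate = deepcopy(self.operators)
        candidate[patch.operator].preconditions.add(patch.add_precondition)
        if not all(check(patch, candidate) for check in guardrails):
            return False  # symbolic guardrail rejected the edit
        if not canary(candidate):
            return False  # canary test failed on the patched graph
        self.history.append((patch, self.operators))
        self.operators = candidate
        return True

    def rollback(self):
        """Restore the exact graph that existed before the last commit."""
        patch, snapshot = self.history.pop()
        self.operators = snapshot
        return patch


# Hypothetical vocabulary of facts the guardrail is willing to accept.
KNOWN_FACTS = {"build_ok", "tests_passed"}


def known_fact_guardrail(patch, candidate):
    # Reject patches that reference facts outside the known vocabulary.
    return patch.add_precondition in KNOWN_FACTS


def deploy_canary(candidate):
    # Stand-in canary: the patched graph must now require passing tests.
    return "tests_passed" in candidate["deploy"].preconditions
```

Because each commit stores the untouched pre-commit graph next to the patch and its provenance string, `rollback()` in this sketch restores the earlier state exactly and returns the patch that was undone, so the reason for the original edit stays inspectable.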

Problem

Research questions and friction points this paper is trying to address.

LLM agents

symbolic repair

process knowledge

recurring failures

governance

Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic agent

symbolic patch learning

Failure-Driven Knowledge Acquisition