HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of large language model agents on heterogeneous tasks: fixed agent systems prevent the external execution framework (the harness) and the internal reasoning policy from being optimized jointly. The authors propose HarnessForge, a framework that treats an agent system as a harness-policy pair and defines a stable adaptation space separating harness-level execution structure from policy-level reasoning behavior. It co-evolves the two by combining fault-guided harness tailoring with harness-conditioned policy alignment, in contrast to methods that update only one component. Across five cross-domain benchmarks, HarnessForge improves both Qwen3-4B and Qwen3-8B backbones, with gains of up to 12.0% over the strongest baseline and favorable rollout-efficiency trade-offs.

📝 Abstract

LLM agents are increasingly expected to operate across heterogeneous task regimes that require distinct execution paradigms. This challenges fixed agent systems and motivates system-level meta-adaptation beyond isolated component updates. While existing works have adapted the external harness or trained the underlying reasoning policy, full-system adaptation remains insufficiently characterized. The adaptation space between structure and execution is rarely made explicit, and the compatibility between the external harness and the internal reasoner is not optimized jointly. We propose HarnessForge, a meta-adaptive framework for evolving LLM agent systems. HarnessForge formulates an agent system as a harness-policy pair, defining a stable adaptation space that separates harness-level execution structure from policy-level reasoning behavior. It then performs harness-policy co-evolution through fault-guided harness tailoring and harness-conditioned policy alignment. Experiments across five benchmarks from diverse domains show that HarnessForge consistently improves both Qwen3-4B and Qwen3-8B backbones. It outperforms harness-only and policy-only baselines, with gains of up to 12.0% over the strongest baseline, and achieves favorable rollout-efficiency tradeoffs. These results demonstrate that harness-policy co-evolution is effective and that executable compatibility between the harness and the reasoning policy is essential for agent-system adaptation. The code is available at https://github.com/mingju-c/HarnessForge.
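
The alternating structure described in the abstract (collect faults, tailor the harness, then align the policy to the tailored harness) can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the HarnessForge implementation: the `Harness`, `Policy`, and `Task` classes, the fault labels, and the functions `rollout`, `tailor_harness`, `align_policy`, and `co_evolve` are all hypothetical names invented for illustration. The "policy" is reduced to a record of which harness it was last aligned to, standing in for real training.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Harness:
    """Hypothetical harness: execution structure only (tools, step budget)."""
    tools: frozenset = frozenset()
    max_steps: int = 2


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy stand-in: remembers the harness it was aligned to."""
    aligned_to: Optional[Harness] = None


@dataclass(frozen=True)
class Task:
    tool: str   # tool the task needs
    steps: int  # number of steps the task needs


def rollout(harness, policy, task):
    """Toy rollout: return a fault label, or None on success."""
    if task.tool not in harness.tools:
        return f"missing_tool:{task.tool}"
    if task.steps > harness.max_steps:
        return f"step_limit:{task.steps}"
    if policy.aligned_to != harness:
        return "policy_mismatch"
    return None


def tailor_harness(harness, faults):
    """Fault-guided step (toy): patch the harness for each observed fault."""
    tools, max_steps = set(harness.tools), harness.max_steps
    for fault in faults:
        kind, _, detail = fault.partition(":")
        if kind == "missing_tool":
            tools.add(detail)
        elif kind == "step_limit":
            max_steps = max(max_steps, int(detail))
    return Harness(frozenset(tools), max_steps)


def align_policy(policy, harness):
    """Harness-conditioned step (toy): re-align the policy to the new harness."""
    return Policy(aligned_to=harness)


def co_evolve(harness, policy, tasks, rounds=3):
    """Alternate harness tailoring and policy alignment until no faults remain."""
    for _ in range(rounds):
        faults = [f for t in tasks if (f := rollout(harness, policy, t))]
        if not faults:
            break
        harness = tailor_harness(harness, faults)
        policy = align_policy(policy, harness)
    return harness, policy


def success_rate(harness, policy, tasks):
    return sum(rollout(harness, policy, t) is None for t in tasks) / len(tasks)


if __name__ == "__main__":
    tasks = [Task("search", 2), Task("python", 5)]
    harness, policy = co_evolve(Harness(), Policy(), tasks)
    print(harness, success_rate(harness, policy, tasks))
```

In this toy setting, a policy that has not been aligned to the current harness fails with `policy_mismatch` even when the harness itself is adequate, which is one simple way to picture why the abstract stresses compatibility between the two components rather than updating either in isolation.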

Problem

Research questions and friction points this paper is trying to address.

LLM agents

heterogeneous tasks

system-level adaptation

harness-policy compatibility

meta-adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

harness-policy co-evolution

meta-adaptive framework

fault-guided harness tailoring