Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) lack mechanisms for accumulating experience and co-evolving reasoning capabilities, hindering continual learning on complex reasoning tasks. Method: We propose Xolver, a training-free multi-agent reasoning framework that constructs an evolvable holistic experience memory on top of black-box LLMs, emulating Olympiad-team-style collaborative learning by integrating external and self-retrieval, tool invocation, peer collaboration, autonomous evaluation, and iterative refinement. We introduce the first inference-time, zero-training mechanism for integrating diverse experience modalities, enabling cross-problem abstraction transfer and strategy distillation. Contribution/Results: Xolver sets new state-of-the-art results on GSM8K (98.1%), AIME'24 (94.4%), and Math-500 (99.8%). Notably, Xolver with the lightweight QWQ-32B backbone outperforms Qwen3-235B, marking a shift from isolated inference to experience-aware agentic reasoning.

📝 Abstract
Despite impressive progress on complex reasoning, current large language models (LLMs) typically operate in isolation - treating each problem as an independent attempt, without accumulating or integrating experiential knowledge. In contrast, expert problem solvers - such as Olympiad or programming contest teams - leverage a rich tapestry of experiences: absorbing mentorship from coaches, developing intuition from past problems, leveraging knowledge of tool usage and library functionality, adapting strategies based on the expertise and experiences of peers, continuously refining their reasoning through trial and error, and learning from other related problems even during competition. We introduce Xolver, a training-free multi-agent reasoning framework that equips a black-box LLM with a persistent, evolving memory of holistic experience. Xolver integrates diverse experience modalities, including external and self-retrieval, tool use, collaborative interactions, agent-driven evaluation, and iterative refinement. By learning from relevant strategies, code fragments, and abstract reasoning patterns at inference time, Xolver avoids generating solutions from scratch - marking a transition from isolated inference toward experience-aware language agents. Built on both open-weight and proprietary models, Xolver consistently outperforms specialized reasoning agents. Even with lightweight backbones (e.g., QWQ-32B), it often surpasses advanced models including Qwen3-235B, Gemini 2.5 Pro, o3, and o4-mini-high. With o3-mini-high, it achieves new best results on GSM8K (98.1%), AIME'24 (94.4%), AIME'25 (93.7%), Math-500 (99.8%), and LiveCodeBench-V5 (91.6%) - highlighting holistic experience learning as a key step toward generalist agents capable of expert-level reasoning. Code and data are available at https://kagnlp.github.io/xolver.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Enables LLMs to learn from holistic experiences like expert teams
Integrates diverse experience modalities for multi-agent reasoning
Achieves superior performance without training, using evolving memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent framework with evolving memory
Integrates diverse experience modalities
Learns from strategies at inference time
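The loop implied by these points can be sketched as follows. This is a minimal illustrative sketch, not Xolver's actual implementation: all names (`ExperienceMemory`, `solve`, `agent_propose`, `judge_score`) are hypothetical, the LLM agents and the evaluator are stubbed with placeholder functions, and retrieval uses a toy word-overlap heuristic in place of real similarity search.

```python
# Sketch of an experience-aware multi-agent loop: retrieve from an evolving
# memory, let peer agents propose solutions, score them with a judge, refine
# over rounds, and write the best episode back into memory. All components
# are illustrative stand-ins, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Episode:
    problem: str
    solution: str
    score: float

@dataclass
class ExperienceMemory:
    """Evolving store of past (problem, solution, score) episodes."""
    episodes: list = field(default_factory=list)

    def retrieve(self, problem: str, k: int = 2) -> list:
        # Toy relevance ranking: shared-word overlap with the new problem.
        words = set(problem.split())
        ranked = sorted(self.episodes,
                        key=lambda e: len(words & set(e.problem.split())),
                        reverse=True)
        return ranked[:k]

    def update(self, episode: Episode) -> None:
        self.episodes.append(episode)

def agent_propose(problem: str, hints: list, agent_id: int) -> str:
    # Stand-in for an LLM agent conditioned on retrieved experience.
    return f"agent{agent_id} solution to '{problem}' using {len(hints)} hints"

def judge_score(solution: str) -> float:
    # Stand-in for the agent-driven evaluator.
    return 1.0 if "solution" in solution else 0.0

def solve(problem: str, memory: ExperienceMemory,
          n_agents: int = 2, n_rounds: int = 2) -> Episode:
    best = Episode(problem, "", -1.0)
    for _ in range(n_rounds):                  # iterative refinement
        hints = memory.retrieve(problem)       # self-retrieval from memory
        for i in range(n_agents):              # peer collaboration
            candidate = agent_propose(problem, hints, i)
            score = judge_score(candidate)     # autonomous evaluation
            if score > best.score:
                best = Episode(problem, candidate, score)
        memory.update(best)                    # memory evolves across rounds
    return best
```

Because the memory persists across calls to `solve`, later problems retrieve episodes written by earlier ones; this cross-problem reuse is the "evolving memory" idea, separated here from any model training.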
Md. Tanzib Hosain, American International University-Bangladesh
Salman Rahman, University of California Los Angeles
Md. Kishor Morol, Cornell University
Md. Rizwan Parvez, Qatar Computing Research Institute

Machine Learning · Natural Language Processing · Language Modeling