VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

📅 2025-08-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Agentic Reinforcement Learning with Tool use (ARLT) approaches suffer from fragmented task-specific codebases, synchronous execution bottlenecks, and limited extensibility across domains. To address these limitations, the authors propose VerlTool, a unified, modular framework featuring a standardized tool API that covers multiple modalities, asynchronous rollout execution, and plug-and-play support for diverse tools including code execution, web search, SQL querying, and vision processing. It builds on the Reinforcement Learning with Verifiable Rewards (RLVR) paradigm and extends it to multi-turn tool interactions. Experiments across six domains (mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering) show that VerlTool achieves results comparable to specialized systems, with a near 2× speedup from asynchronous rollout execution. The code is open-sourced.

📝 Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL ensuring compatibility and simplified maintenance, (2) unified tool management via standardized APIs supporting diverse modalities including code execution, search, SQL databases, and vision processing, (3) asynchronous rollout execution achieving near 2$\times$ speedup by eliminating synchronization bottlenecks, and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
Problem

Research questions and friction points this paper is trying to address.

Addressing fragmentation in agentic reinforcement learning tool use
Eliminating synchronous execution bottlenecks in multi-turn interactions
Enhancing cross-domain extensibility for tool-augmented RL systems
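
To make the synchronization bottleneck concrete, the following is a minimal sketch and not VerlTool's actual implementation. It assumes a toy batch of three trajectories whose tool calls have made-up latencies (`LATENCIES`), and every function name here is hypothetical. In the synchronized variant each turn waits for the slowest tool call in the batch; in the asynchronous variant each trajectory moves on as soon as its own call returns.

```python
import asyncio
import time

# Hypothetical per-turn tool latencies in seconds, one row per trajectory.
LATENCIES = [
    [0.01, 0.05, 0.01],
    [0.05, 0.01, 0.01],
    [0.01, 0.01, 0.05],
]


async def call_tool(latency: float) -> None:
    """Stand-in for a real tool call (code execution, search, and so on)."""
    await asyncio.sleep(latency)


async def rollout_synchronous(latencies) -> None:
    """Batch-synchronized: every turn ends only when the slowest call ends."""
    for turn in range(len(latencies[0])):
        await asyncio.gather(*(call_tool(traj[turn]) for traj in latencies))


async def run_trajectory(traj) -> None:
    """One trajectory issuing its tool calls back to back."""
    for latency in traj:
        await call_tool(latency)


async def rollout_asynchronous(latencies) -> None:
    """Per-trajectory: no trajectory waits for another one's tool call."""
    await asyncio.gather(*(run_trajectory(traj) for traj in latencies))


def timed(rollout_fn, latencies) -> float:
    """Wall-clock seconds taken by one rollout variant."""
    start = time.perf_counter()
    asyncio.run(rollout_fn(latencies))
    return time.perf_counter() - start


def ideal_makespans(latencies):
    """Idealized totals: sum of per-turn maxima vs. max of per-trajectory sums."""
    sync_total = sum(max(traj[t] for traj in latencies)
                     for t in range(len(latencies[0])))
    async_total = max(sum(traj) for traj in latencies)
    return sync_total, async_total


if __name__ == "__main__":
    print("ideal (sync, async):", ideal_makespans(LATENCIES))
    print("measured sync :", timed(rollout_synchronous, LATENCIES))
    print("measured async:", timed(rollout_asynchronous, LATENCIES))
```

The gap in this toy depends entirely on the invented latencies and ignores model generation time, so it illustrates the mechanism rather than reproducing the speedup reported in the paper.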
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified modular framework for tool integration
Asynchronous rollout execution for a near 2× speedup
Standardized APIs supporting multi-modal tools
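
As an illustration of what a plugin-style tool interface with lightweight Python definitions could look like, here is a hedged sketch. It is not VerlTool's API: `Observation`, `TOOL_REGISTRY`, `register_tool`, `dispatch`, and the toy `sum` tool are all hypothetical names chosen for this example. The idea shown is that each tool is a small function mapping an action string to an observation, and a shared registry routes calls by tool name.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Observation:
    """What a tool returns to the policy (hypothetical structure)."""
    content: str
    modality: str = "text"  # e.g. "text", "image", "video"
    done: bool = False      # True if the trajectory should stop


# Hypothetical global registry mapping tool names to callables.
TOOL_REGISTRY: Dict[str, Callable[[str], Observation]] = {}


def register_tool(name: str):
    """Decorator that adds a tool function to the registry under `name`."""
    def decorator(fn: Callable[[str], Observation]):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("sum")
def sum_numbers(action: str) -> Observation:
    """Toy tool: add the whitespace-separated numbers in the action string."""
    try:
        total = sum(float(token) for token in action.split())
    except ValueError:
        return Observation(content=f"error: could not parse {action!r}")
    return Observation(content=str(total))


def dispatch(tool_name: str, action: str) -> Observation:
    """Route an action to the named tool, or report an unknown tool."""
    tool = TOOL_REGISTRY.get(tool_name)
    if tool is None:
        return Observation(content=f"error: unknown tool {tool_name!r}", done=True)
    return tool(action)


if __name__ == "__main__":
    print(dispatch("sum", "1 2 3.5"))
    print(dispatch("missing", "anything"))
```

Under this design, adding a tool means writing one decorated function; the dispatch path and the observation format stay unchanged.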