Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

📅 2025-08-18
🤖 AI Summary
To address three key bottlenecks in LLM-based agents for deep-research tasks (static internal knowledge, rigid RAG pipelines, and conflicting gradients and reward sparsity in outcome-based reinforcement learning), this paper proposes Atomic Thought, an LLM thinking paradigm that decomposes reasoning into fine-grained, supervisable functional units, integrating retrieval-augmented generation with autonomous agent mechanisms to enable multi-hop reasoning and strategic search. Each atomic unit is scored by a Reasoning Reward Model (RRM), yielding Atomic Thought Rewards (ATR) for process-level, fine-grained guidance. Building on this, the authors propose Atom-Searcher, an RL framework that combines ATR with outcome rewards via a curriculum-inspired reward schedule, prioritizing process-level supervision early in training and shifting to outcome rewards later. The approach improves training efficiency, inference interpretability, and test-time compute scaling, and it delivers consistent gains over the state of the art across seven benchmarks, with stronger multi-step reasoning, more stable convergence, and more human-like reasoning behavior.

📝 Abstract
Large language models (LLMs) exhibit remarkable problem-solving abilities, but struggle with complex tasks due to static internal knowledge. Retrieval-Augmented Generation (RAG) enhances access to external information, yet remains limited in multi-hop reasoning and strategic search due to rigid workflows. Recent advancements in agentic deep research empower LLMs to autonomously reason, search, and synthesize information. However, current approaches relying on outcome-based reinforcement learning (RL) face critical issues such as conflicting gradients and reward sparsity, limiting performance gains and training efficiency. To address these, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule, prioritizing process-level ATR early and transitioning to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state-of-the-art. Key advantages include: (1) Atom-Searcher scales computation at test-time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.
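The abstract describes decomposing a reasoning trace into fine-grained Atomic Thought units, each supervised by a Reasoning Reward Model (RRM) that emits an Atomic Thought Reward (ATR). A minimal sketch of that structure is below; the unit types ("plan", "search", "verify"), the stand-in scoring function, and the averaging rule are illustrative assumptions, not the paper's actual taxonomy or aggregation.

```python
# Hypothetical sketch: a reasoning trace as a list of typed Atomic Thought
# units, with a process-level ATR computed by averaging per-unit RRM scores.
from dataclasses import dataclass


@dataclass
class AtomicThought:
    kind: str   # assumed unit types, e.g. "plan", "search", "verify"
    text: str


def atomic_thought_reward(units, rrm_score) -> float:
    """Average per-unit RRM scores into a single process-level ATR."""
    scores = [rrm_score(u) for u in units]
    return sum(scores) / len(scores)


trace = [
    AtomicThought("plan", "Split the question into two sub-questions."),
    AtomicThought("search", "Query: capital of the country with the longest coastline."),
    AtomicThought("verify", "Cross-check the retrieved answer against a second source."),
]

# Stand-in RRM that rewards any non-empty unit; a real RRM would be a
# learned model scoring the quality of each atomic step.
atr = atomic_thought_reward(trace, lambda u: 1.0 if u.text else 0.0)
```

Because every unit receives its own score, the RRM provides a supervision anchor at each reasoning step rather than a single sparse signal at the end, which is the reward-sparsity issue the abstract highlights.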
Problem

Research questions and friction points this paper is trying to address.

Limited multi-hop reasoning in agentic deep research
Reward sparsity and conflicting gradients in outcome-based reinforcement learning
Poor interpretability of LLM reasoning patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Atomic Thought decomposes reasoning into fine-grained units
Reasoning Reward Models supervise with Atomic Thought Rewards
Curriculum-inspired reward schedule optimizes training efficiency
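The curriculum-inspired schedule above weights process-level ATR heavily early in training and shifts toward outcome rewards later. A minimal sketch of one such schedule follows; the linear decay and the convex-combination rule are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical curriculum-inspired reward schedule: the ATR weight decays
# linearly from 1.0 to 0.0 over training, so early updates are driven by
# process-level supervision and late updates by the outcome reward.

def scheduled_reward(atr: float, outcome: float, step: int, total_steps: int) -> float:
    """Blend process-level ATR with the outcome reward; the ATR weight
    decays linearly as training progresses."""
    w = max(0.0, 1.0 - step / total_steps)  # ATR weight: 1.0 early, 0.0 late
    return w * atr + (1.0 - w) * outcome


# Early training: the reward is entirely process-level.
early = scheduled_reward(atr=0.8, outcome=0.0, step=0, total_steps=1000)
# Late training: the reward is entirely outcome-based.
late = scheduled_reward(atr=0.8, outcome=1.0, step=1000, total_steps=1000)
```

The design intent stated in the abstract is that dense process-level feedback early on accelerates convergence onto effective reasoning paths before the sparser outcome signal takes over.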
Yong Deng
Ant Group
Guoqing Wang
Ant Group
Zhenzhe Ying
Ant Group
Xiaofeng Wu
Ant Group
Jinzhen Lin
Ant Group
Wenwen Xiong
Ant Group
Yuqin Dai
Tsinghua University
Shuo Yang
Ant Group
Zhanwei Zhang
State Key Lab of CAD&CG, College of Computer Science, Zhejiang University
Qiwen Wang
Ant Group
Yang Qin
Ant Group
Changhua Meng
Ant Group