Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses error accumulation in large language model agents during multi-step tool use, which often stems from hallucinated direct responses or invocation of unsupported tools. The paper proposes TRUST, an uncertainty-aware alignment method for tool-calling decisions: it quantifies action uncertainty and adds it to the reinforcement learning reward as a repulsive force that keeps the uncertainty of correct and incorrect actions separated. Combined with lightweight annotations on key decision turns, the method enables unified post-training over multi-turn trajectories. The approach strengthens exploration signals, reduces overconfident errors, and consistently improves both decision quality and agent performance across multiple tool-use benchmarks, while keeping uncertainty estimates more reliable during optimization.

📝 Abstract

Large language model (LLM)-based agents often make suboptimal tool-use decisions, including unsupported tool invocation and hallucinated direct responses, which may accumulate errors throughout multi-step interactions. Existing approaches mainly improve these behaviors through inference-time correction or coarse-grained reward signals based on decision outcomes and structured checklists, leaving the uncertainty characteristics of agent decisions underexplored. We observe that decision-oriented reinforcement learning tends to weaken the uncertainty separation between correct and incorrect actions, resulting in overconfident mistakes and weaker exploration signals. Therefore, we propose TRUST, which incorporates uncertainty quantification into reward design as a repulsive force for maintaining uncertainty separation, and uses lightweight key-turn annotations for unified post-training of multi-turn trajectories. Experimental results across diverse tool-use benchmarks show that TRUST consistently enhances both decision quality and agent performance while maintaining more reliable uncertainty estimates during optimization.
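
The abstract does not give the reward formula, so the following is only an illustrative sketch of how an uncertainty term could act as a "repulsive force" in a reward. It assumes uncertainty is measured as the normalized entropy of the agent's distribution over decision options (for example: call a tool, respond directly, decline), and that the reward adds a bonus for low uncertainty on correct actions and for high uncertainty on incorrect ones. The names `normalized_entropy`, `shaped_reward`, and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a discrete distribution, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def shaped_reward(outcome_reward, probs, is_correct, lam=0.5):
    """Outcome reward plus an assumed uncertainty-separation bonus.

    Assumption (not from the paper): correct actions earn more when the
    agent is confident, incorrect actions earn more when it is uncertain,
    which pushes the two uncertainty distributions apart.
    """
    u = normalized_entropy(probs)
    separation = (1.0 - u) if is_correct else u
    return outcome_reward + lam * separation


# Hypothetical decision distributions over three options.
confident = [0.9, 0.05, 0.05]
unsure = [0.34, 0.33, 0.33]

# A confident correct action is rewarded more than an unsure correct one.
r_correct_confident = shaped_reward(1.0, confident, is_correct=True)
r_correct_unsure = shaped_reward(1.0, unsure, is_correct=True)

# A confident mistake is rewarded less than an unsure mistake.
r_wrong_confident = shaped_reward(0.0, confident, is_correct=False)
r_wrong_unsure = shaped_reward(0.0, unsure, is_correct=False)
```

Under these assumptions, an overconfident mistake receives the lowest reward, which is one way to discourage the collapse of uncertainty separation that the abstract describes. The actual TRUST reward may differ in both the uncertainty measure and how it is combined with the outcome signal.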

Problem

Research questions and friction points this paper is trying to address.

tool-use decisions

uncertainty quantification

reinforcement learning

large language models

decision reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty Quantification

Reinforcement Learning

Tool-Use Decision