COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

📅 2026-05-29

🤖 AI Summary

This work addresses the risk that LLM-driven search agents can circumvent existing alignment mechanisms when a harmful intent is decomposed into seemingly innocuous sub-queries during multi-step reasoning. To mitigate this, the authors propose COMPASS, a framework that combines Cognitive Tree Exploration (CTE) with Introspective Step-wise Alignment (ISA). CTE uses cognitive Monte Carlo tree search (MCTS) to efficiently synthesize stealthy attack trajectories. ISA isolates risky intermediate actions so that safety supervision applies at each step of the reasoning process, not only to the final output. According to the authors, the method requires substantially less training data and achieves a favorable safety-utility trade-off. It also improves supervision of diverse violations whose safety signals are sparse across multi-step interactions.
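The summary names Monte Carlo tree search but does not specify the "cognitive" variant. For reference only, the sketch below shows the standard UCT selection and backpropagation steps that generic MCTS uses to balance exploring new branches against exploiting promising ones. In an attack-synthesis setting a node could stand for a partial trajectory of sub-queries. The `Node` class, the reward values, and the exploration constant `c=1.4` are hypothetical illustrations and are not COMPASS's cognitive MCTS.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node, e.g. a partial sequence of sub-queries."""
    label: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Return the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Add one rollout's reward to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

For example, with a root visited 10 times and two children that each have 5 visits but mean rewards of 0.8 and 0.2, `select_child` picks the higher-reward child; adding an unvisited child makes it the next selection because its score is infinite.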

📝 Abstract

LLM-powered search agents enable multi-step reasoning and tool use. However, these capabilities introduce retrieval-induced safety degradation, as harmful intents may decompose into seemingly innocuous sub-queries that lead to unsafe outcomes. Existing alignment methods struggle to capture sparse safety signals and fail to supervise diverse violations across multi-step interactions. We propose COMPASS, a Cognitive MCTS-Guided Process Alignment framework designed to achieve robust safety alignment throughout the agent workflow while preserving general utility. COMPASS integrates cognitive tree exploration (CTE) to efficiently synthesize stealthy attack trajectories, and introspective step-wise alignment (ISA) to isolate risky intermediate actions for fine-grained process supervision. Empirical results show that COMPASS achieves a favorable safety-utility trade-off while requiring substantially less training data.
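The abstract does not say how ISA isolates risky intermediate actions. One simple way to turn a trajectory into step-level supervision is to run a per-step safety judge, keep the safe prefix as positive examples, mark the first flagged action as the isolated negative, and exclude later steps that depend on it. The sketch below illustrates that idea under stated assumptions: the `Step` type, the `is_risky` judge interface, and the `[UNSAFE]` marker used by the toy judge are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """Hypothetical agent step: a tool action (e.g. a search sub-query) and its text."""
    action: str
    content: str


def first_risky_step(steps: list[Step], is_risky: Callable[[Step], bool]) -> Optional[int]:
    """Return the index of the first step the judge flags, or None if none are flagged."""
    for i, step in enumerate(steps):
        if is_risky(step):
            return i
    return None


def step_labels(steps: list[Step], is_risky: Callable[[Step], bool]) -> list[Optional[int]]:
    """Per-step process labels under this sketch's assumptions.

    1 = safe prefix step, 0 = the isolated first risky step,
    None = later steps, excluded from supervision because they build on the risky one.
    """
    cut = first_risky_step(steps, is_risky)
    if cut is None:
        return [1] * len(steps)
    return [1] * cut + [0] + [None] * (len(steps) - cut - 1)


def toy_judge(step: Step) -> bool:
    """Hypothetical stand-in for a real safety judge: flags an explicit marker."""
    return "[UNSAFE]" in step.content
```

For a three-step trajectory whose second search sub-query carries the marker, `step_labels(steps, toy_judge)` returns `[1, 0, None]`. Masking the steps after the first risky action keeps the negative signal attached to the single action that introduced the risk rather than spreading it across the whole trajectory.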

Problem

safety alignment

retrieval-induced safety degradation

multi-step reasoning

harmful intent decomposition

process supervision

Innovation

Cognitive MCTS

Process Alignment

Safe Search Agents