🤖 AI Summary
Large reasoning models (LRMs) demonstrate exceptional performance on complex tasks such as mathematical reasoning and code generation, yet they exhibit critical security vulnerabilities, including fragile reasoning chains, hallucination-induced errors, and goal hijacking, which impede safe real-world deployment. To address this gap, we propose the first fine-grained security taxonomy specifically designed for LRMs, extending beyond conventional large language model (LLM) safety frameworks. We systematically survey and unify a range of analysis methodologies, including adversarial testing, formal verification, red-teaming, and interpretability-based diagnostics. Our work delivers a comprehensive security landscape encompassing 12 risk categories, 8 attack types, and 15 defense mechanisms. This unified framework lays the groundwork for standardized LRM security evaluation, robust training protocols, and trustworthy deployment, bridging theoretical rigor with practical applicability in safety-critical AI systems.
📝 Abstract
Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks such as mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns about their vulnerabilities and safety have arisen, posing challenges to their deployment and application in real-world settings. This paper presents a comprehensive survey of LRM safety, meticulously exploring and summarizing the newly emerging safety risks, attacks, and defense strategies. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.