🤖 AI Summary
Large reasoning models (LRMs) demonstrate exceptional performance on complex tasks such as mathematical reasoning and code generation, yet they exhibit critical security vulnerabilities, including fragile reasoning chains, hallucination-induced errors, and goal hijacking, which impede safe real-world deployment. To address this gap, we propose the first fine-grained security taxonomy specifically designed for LRMs, extending beyond conventional large language model (LLM) safety frameworks. We systematically survey and unify a range of analysis methodologies, including adversarial testing, formal verification, red-teaming, and interpretability-based diagnostics. Our work delivers a comprehensive security landscape encompassing 12 risk categories, 8 attack types, and 15 defense mechanisms. This unified framework lays the groundwork for standardized LRM security evaluation, robust training protocols, and trustworthy deployment, bridging theoretical rigor with practical applicability in safety-critical AI systems.
📝 Abstract
Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks such as mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns about their vulnerabilities and safety have arisen, posing challenges to their deployment and application in real-world settings. This paper presents a comprehensive survey of LRM safety, meticulously exploring and summarizing the newly emerging safety risks, attacks, and defense strategies. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.