SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical challenge of co-evolving safety and intelligence in AI models. We propose SafeLadder, a framework for safety-intelligence co-evolution, and SafeWork-R1, a multimodal reasoning model. Methodologically, we introduce the novel "Safety Insight" mechanism and the AI-45° Law, a principled paradigm guiding joint capability-safety development. Our approach integrates multi-principle verifiers (logical, causal, normative), dual-path inference-time intervention (masking and re-ranking), and deliberative search to enable intrinsically safe reasoning and self-reflection. Experiments demonstrate a 46.54% average improvement on safety benchmarks over the base model Qwen2.5-VL-72B, surpassing leading proprietary models such as GPT-4.1 and Claude Opus 4, without compromising general-purpose capability. Moreover, the framework generalizes across architectures, including InternVL3, DeepSeek, and Qwen2.5-VL. To our knowledge, this is the first empirical demonstration that safety and intelligence can co-evolve synergistically in large language and multimodal models.

📝 Abstract
We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training, supported by a suite of multi-principled verifiers. Unlike previous alignment methods such as RLHF that simply learn human preferences, SafeLadder enables SafeWork-R1 to develop intrinsic safety reasoning and self-reflection abilities, giving rise to safety "aha" moments. Notably, SafeWork-R1 achieves an average improvement of 46.54% over its base model Qwen2.5-VL-72B on safety-related benchmarks without compromising general capabilities, and delivers state-of-the-art safety performance compared to leading proprietary models such as GPT-4.1 and Claude Opus 4. To further bolster its reliability, we implement two distinct inference-time intervention methods and a deliberative search mechanism, enforcing step-level verification. Finally, we further develop SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B. All resulting models demonstrate that safety and capability can co-evolve synergistically, highlighting the generalizability of our framework in building robust, reliable, and trustworthy general-purpose AI.
Problem

Research questions and friction points this paper is trying to address.

Develops SafeWork-R1 for coevolving AI safety and intelligence
Enhances safety reasoning without compromising general capabilities
Ensures reliability via inference-time interventions and verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal reasoning model with coevolving safety
SafeLadder framework with progressive RL post-training
Inference-time intervention and deliberative search mechanism
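The dual-path inference-time intervention described above (masking plus re-ranking under step-level verifier scores) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `intervene`, the score values, and the 0.5 threshold are all assumptions introduced for the example.

```python
# Hedged sketch of dual-path inference-time intervention:
# (1) masking: drop candidate reasoning steps whose verifier score
#     falls below a threshold;
# (2) re-ranking: order the surviving steps by verifier score.
# All names and values here are illustrative assumptions.

def intervene(candidates, verifier_scores, threshold=0.5):
    """Return candidate steps that pass verification, best-scored first."""
    # Path 1: mask out steps the verifier flags as unsafe or incorrect.
    kept = [(c, s) for c, s in zip(candidates, verifier_scores) if s >= threshold]
    # Path 2: re-rank the remaining steps by verifier score, descending.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in kept]

steps = ["step A", "step B", "step C"]
scores = [0.9, 0.3, 0.7]
print(intervene(steps, scores))  # ['step A', 'step C']
```

In a deliberative-search setting, a routine like this would be applied at each reasoning step, so that only verifier-approved continuations are expanded further.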