Output Length Effect on DeepSeek-R1's Safety in Forced Thinking

📅 2025-03-02

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

This study identifies a non-monotonic impact of output length on the adversarial robustness of DeepSeek-R1 under forced-reasoning settings: excessively short outputs impair defensive capability, while overly long ones enable malicious prompt expansion. To address this, we propose a reinforcement learning–based paradigm for dynamic token-length control, jointly optimizing reasoning accuracy and adversarial robustness. Our method integrates adversarial prompt engineering, quantitative analysis of response robustness, and policy-adaptive length regulation. Evaluated across diverse jailbreaking and induction attacks, it improves safety rate by 12.7% while preserving 98.3% of the original reasoning accuracy. Crucially, this work is the first to systematically uncover and model the dual-role security effect of LLM output length—where length simultaneously influences both vulnerability and defense efficacy. By establishing an interpretable, deployable framework for controllable generation and safety alignment, our approach advances principled methods for robust, trustworthy LLM deployment.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have demonstrated strong reasoning capabilities, but their safety under adversarial conditions remains a challenge. This study examines the impact of output length on the robustness of DeepSeek-R1, particularly in Forced Thinking scenarios. We analyze responses across various adversarial prompts and find that while longer outputs can improve safety through self-correction, certain attack types exploit extended generations. Our findings suggest that output length should be dynamically controlled to balance reasoning effectiveness and security. We propose reinforcement learning-based policy adjustments and adaptive token length regulation to enhance LLM safety.

Problem

Research questions and friction points this paper is trying to address.

Impact of output length on DeepSeek-R1's safety in adversarial conditions.

Longer outputs improve safety but are vulnerable to specific attack types.

Dynamic control of output length balances reasoning effectiveness and security.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic control of output length for safety

Reinforcement learning for policy adjustments

Adaptive token length regulation in LLMs

🔎 Similar Papers

Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks