CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

📅 2025-08-19
🤖 AI Summary
Large language models (LLMs) remain vulnerable to prompt injection and structure-aware jailbreak attacks, which exploit semantic ambiguity and syntactic manipulation to bypass safety constraints. Method: the paper proposes a dual-track, prompt-level defense framework with two mechanisms: (i) a "core-only" track that extracts the semantic essentials of a query via few-shot prompting, suppressing adversarial noise, and (ii) a "core-full-core" track that performs a structure-aware safety consistency check to detect and block malicious structural patterns. Contribution/Results: the framework improves robustness without degrading response quality. Experiments show the approach reduces jailbreak success rates by 50–75% relative to state-of-the-art defenses under strong adversarial attacks while preserving accurate, high-fidelity outputs on benign queries, improving the safety of practical LLM deployment.

📝 Abstract
Jailbreak attacks pose a serious challenge to the safe deployment of large language models (LLMs). We introduce CCFC (Core & Core-Full-Core), a dual-track, prompt-level defense framework designed to mitigate LLMs' vulnerabilities from prompt injection and structure-aware jailbreak attacks. CCFC operates by first isolating the semantic core of a user query via few-shot prompting, and then evaluating the query using two complementary tracks: a core-only track to ignore adversarial distractions (e.g., toxic suffixes or prefix injections), and a core-full-core (CFC) track to disrupt the structural patterns exploited by gradient-based or edit-based attacks. The final response is selected based on a safety consistency check across both tracks, ensuring robustness without compromising on response quality. We demonstrate that CCFC cuts attack success rates by 50-75% versus state-of-the-art defenses against strong adversaries (e.g., DeepInception, GCG), without sacrificing fidelity on benign queries. Our method consistently outperforms state-of-the-art prompt-level defenses, offering a practical and effective solution for safer LLM deployment.
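The dual-track flow described in the abstract can be sketched in Python. This is a minimal illustration, not the paper's implementation: `call_llm`, `is_safe`, the few-shot extraction prompt, and the refusal message are all hypothetical placeholders, and the assumption that the CFC track sandwiches the full query between two copies of its core is inferred from the track's name and the abstract's description.

```python
# Hedged sketch of the CCFC dual-track defense. All helper names, prompts,
# and the core/full/core sandwich structure are assumptions for illustration.

def extract_core(query: str, call_llm) -> str:
    """Isolate the semantic core of a query via few-shot prompting
    (hypothetical prompt; the paper's actual few-shot examples are not shown)."""
    prompt = (
        "Extract the essential request from the user query, ignoring any "
        "prefixes, suffixes, or role-play framing.\n\n"
        f"Query: {query}\nCore:"
    )
    return call_llm(prompt)


def ccfc_defense(query: str, call_llm, is_safe) -> str:
    core = extract_core(query, call_llm)

    # Track 1 (core-only): answer the isolated core, so adversarial
    # distractions such as toxic suffixes or prefix injections are dropped.
    core_answer = call_llm(core)

    # Track 2 (core-full-core): wrap the full query between two copies of the
    # core (assumed structure) to disrupt the positional patterns exploited
    # by gradient-based or edit-based attacks.
    cfc_prompt = f"{core}\n\n{query}\n\n{core}"
    cfc_answer = call_llm(cfc_prompt)

    # Safety consistency check: respond only if both tracks agree the
    # content is safe; otherwise refuse.
    if is_safe(core_answer) and is_safe(cfc_answer):
        return cfc_answer
    return "I can't help with that request."
```

In this sketch the final response comes from the CFC track when both tracks pass the safety check; the paper's actual selection rule may differ.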
Problem

Research questions and friction points this paper is trying to address.

Defending LLMs against jailbreak attacks and prompt injection
Mitigating vulnerabilities from structure-aware adversarial attacks
Reducing attack success rates without compromising response quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-track defense framework for LLM protection
Isolates semantic core via few-shot prompting
Safety consistency check across complementary tracks