Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization

📅 2025-05-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study presents the first systematic evaluation of large language models' (LLMs) capabilities and risks in cryptanalysis. Addressing three core challenges—decryption accuracy, semantic comprehension, and dual-use security concerns—the authors construct a comprehensive benchmark dataset comprising plaintext–ciphertext pairs spanning classical (e.g., Caesar, Vigenère) and modern (e.g., AES-simulated) ciphers, across diverse domains and stylistic variations. They propose the first LLM-specific cryptanalysis evaluation framework, integrating zero-shot and few-shot prompting, semantic consistency assessment, and adversarial jailbreaking tests. Results indicate that LLMs exhibit only limited, pattern-matching–based decryption capability against classical ciphers and remain ineffective against modern cryptographic primitives. Furthermore, the study reveals a critical generalization mismatch in side-channel communication scenarios and empirically confirms LLMs' susceptibility to jailbreaking attacks—thereby underscoring significant dual-use risks in AI-enabled cryptanalysis.
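The benchmark described above pairs plaintexts with ciphertexts produced by classical ciphers such as Caesar and Vigenère. A minimal sketch of how such pairs could be generated is shown below; the helper functions and the example key are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch of building plaintext-ciphertext pairs with the
# classical ciphers named in the summary (Caesar, Vigenère). Function
# names and the sample key are hypothetical, not from the paper's code.
import string

ALPHA = string.ascii_uppercase

def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Shift each letter by a fixed amount; non-letters pass through."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHA:
            out.append(ALPHA[(ALPHA.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift each letter by the matching key letter, cycling the key."""
    out, k = [], 0
    for ch in plaintext.upper():
        if ch in ALPHA:
            shift = ALPHA.index(key[k % len(key)].upper())
            out.append(ALPHA[(ALPHA.index(ch) + shift) % 26])
            k += 1
        else:
            out.append(ch)
    return "".join(out)

# One benchmark-style record: a plaintext with two encrypted versions.
pair = {
    "plaintext": "ATTACK AT DAWN",
    "caesar_3": caesar_encrypt("ATTACK AT DAWN", 3),
    "vigenere_LEMON": vigenere_encrypt("ATTACK AT DAWN", "LEMON"),
}
```

In an evaluation of the kind described, each record's ciphertext would be placed into a zero-shot or few-shot prompt and the model's output compared against the stored plaintext.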

📝 Abstract
Recent advancements in Large Language Models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis, a critical area for data security and encryption, has not yet been thoroughly explored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on encrypted texts generated using a range of cryptographic algorithms. We introduce a novel benchmark dataset comprising diverse plaintexts, spanning various domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings, we assess multiple LLMs for decryption accuracy and semantic comprehension across different encryption schemes. Our findings reveal key insights into the strengths and limitations of LLMs in side-channel communication while raising concerns about their susceptibility to jailbreaking attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' cryptanalytic potential on encrypted texts
Assessing decryption accuracy and semantic comprehension in LLMs
Exploring LLMs' dual-use nature in security contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates LLMs on diverse cryptographic algorithms
Introduces novel benchmark dataset for cryptanalysis
Assesses decryption accuracy in zero-shot settings
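The decryption-accuracy assessment mentioned above could be scored as sketched below. Exact match and character-level accuracy are common choices for this kind of evaluation; the paper's precise metrics may differ, and both function names here are hypothetical.

```python
# Hedged sketch of scoring a model's decryption attempt against the
# reference plaintext. These are assumed metrics, not necessarily the
# ones used in the paper.
def char_accuracy(reference: str, prediction: str) -> float:
    """Fraction of reference positions matched by the prediction.

    Positions beyond the prediction's length count as misses, so a
    truncated output is penalized.
    """
    if not reference:
        return 0.0
    matches = sum(r == p for r, p in zip(reference, prediction))
    return matches / len(reference)

def exact_match(reference: str, prediction: str) -> bool:
    """Case- and whitespace-insensitive full-string match."""
    return reference.strip().upper() == prediction.strip().upper()
```

Character-level accuracy is useful here because partial, pattern-matching-based decryptions (the behavior the study reports for classical ciphers) still receive a graded score instead of a binary zero.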