🤖 AI Summary
This study presents the first systematic evaluation of large language models' (LLMs) capabilities and risks in cryptanalysis. Addressing three core challenges (decryption accuracy, semantic comprehension, and dual-use security concerns), the authors construct a comprehensive benchmark dataset of plaintext-ciphertext pairs spanning classical (e.g., Caesar, Vigenère) and modern (e.g., AES-simulated) ciphers, across diverse domains and stylistic variations. They propose the first LLM-specific cryptanalysis evaluation framework, integrating zero-shot and few-shot prompting, semantic consistency assessment, and adversarial jailbreaking tests. Results indicate that LLMs exhibit only limited, pattern-matching-based decryption capability against classical ciphers and are essentially ineffective against modern cryptographic primitives. The study further reveals a critical generalization mismatch in side-channel communication scenarios and empirically confirms LLMs' susceptibility to jailbreaking attacks, underscoring significant dual-use risks in AI-enabled cryptanalysis.
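For concreteness, here is a minimal sketch, assuming nothing about the authors' actual tooling, of how plaintext-ciphertext pairs for the classical ciphers named above (Caesar, Vigenère) could be generated; the sample text and key are illustrative only:

```python
# Minimal sketch (not the paper's released code) of generating
# classical-cipher plaintext-ciphertext pairs for such a benchmark.
import string

ALPHABET = string.ascii_uppercase

def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Shift each letter by a fixed offset; non-letters pass through."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift each letter by the matching key letter, cycling the key."""
    out, k = [], 0
    for ch in plaintext.upper():
        if ch in ALPHABET:
            shift = ALPHABET.index(key[k % len(key)].upper())
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            k += 1
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    sample = "MEET AT THE OLD BRIDGE AT DAWN"  # illustrative plaintext
    pairs = [
        ("caesar", caesar_encrypt(sample, shift=3)),
        ("vigenere", vigenere_encrypt(sample, key="LEMON")),
    ]
    for scheme, ciphertext in pairs:
        print(f"{scheme}: {sample} -> {ciphertext}")
```

Modern-cipher pairs (e.g., the AES-simulated setting) would presumably be produced analogously with a standard cryptographic library rather than hand-rolled routines.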
📄 Abstract
Recent advancements in Large Language Models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis, a critical area for data security and encryption, has not yet been thoroughly explored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on encrypted texts generated using a range of cryptographic algorithms. We introduce a novel benchmark dataset comprising diverse plaintexts spanning various domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings, we assess multiple LLMs for decryption accuracy and semantic comprehension across different encryption schemes. Our findings reveal key insights into the strengths and limitations of LLMs in side-channel communication while raising concerns about their susceptibility to jailbreaking attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
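To make the zero-shot and few-shot settings concrete, here is a hedged sketch of how the decryption prompts and an exact-match accuracy check might be built; the prompt wording and helper names are assumptions for illustration, not taken from the paper:

```python
# Illustrative prompt builders and scoring for a decryption benchmark;
# the exact phrasing is an assumption, not the paper's released prompts.

def zero_shot_prompt(ciphertext: str, scheme: str) -> str:
    """Ask the model to decrypt with no worked examples."""
    return (
        f"The following text was encrypted with a {scheme} cipher.\n"
        f"Ciphertext: {ciphertext}\n"
        "Recover the original plaintext. Respond with the plaintext only."
    )

def few_shot_prompt(examples: list[tuple[str, str]],
                    ciphertext: str, scheme: str) -> str:
    """Prepend (plaintext, ciphertext) demonstrations before the query."""
    demos = "\n\n".join(
        f"Ciphertext: {c}\nPlaintext: {p}" for p, c in examples
    )
    return (
        f"Each pair below shows a {scheme}-cipher ciphertext "
        f"and its plaintext.\n\n{demos}\n\n"
        f"Ciphertext: {ciphertext}\nPlaintext:"
    )

def exact_match(prediction: str, reference: str) -> bool:
    """Case- and whitespace-insensitive exact-match decryption accuracy."""
    normalize = lambda s: " ".join(s.upper().split())
    return normalize(prediction) == normalize(reference)
```

Semantic comprehension would then be scored separately from exact-match accuracy, for example by checking whether the model's output preserves the meaning of the reference plaintext even when characters differ.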