Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies, for the first time, a single-bit-flip vulnerability in large language models (LLMs) distributed in the `.gguf` quantized format: flipping a single bit in the weight file induces three semantic-level failures, namely factual errors ("Artificial Flawed Intelligence"), degraded logical reasoning ("Artificial Weak Intelligence"), and harmful content generation ("Artificial Bad Intelligence"). Method: an information-entropy-driven weight-sensitivity model and BitSifter, a probabilistic heuristic scanning framework, efficiently locate high-risk vulnerable bits in the attention mechanism and output layers. Combining hardware fault injection, `.gguf` format parsing, and a remote attack chain enables precise vulnerability localization. Contribution/Results: in a 10B-parameter LLM, vulnerabilities are highly concentrated in tensor data regions; a single-bit flip succeeds with 100% reliability within 31.7 seconds and reduces model accuracy from 73.5% to 0%. This establishes the first reproducible, fine-grained vulnerability-analysis paradigm for security assessment of quantized LLM deployments.

📝 Abstract
Recently, the Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware fault injection. With the widespread distillation and deployment of large language models (LLMs) into single-file .gguf formats, their weight spaces have become exposed to an unprecedented hardware attack surface. This paper is the first to systematically discover and validate the existence of single-bit vulnerabilities in LLM weight files: in mainstream open-source models (e.g., DeepSeek and QWEN) using .gguf quantized formats, flipping just a single bit can induce three types of targeted semantic-level failures: Artificial Flawed Intelligence (outputting factual errors), Artificial Weak Intelligence (degradation of logical reasoning capability), and Artificial Bad Intelligence (generating harmful content). By building an information-theoretic weight sensitivity entropy model and a probabilistic heuristic scanning framework called BitSifter, we achieved efficient localization of critical vulnerable bits in models with hundreds of millions of parameters. Experiments show that vulnerabilities are significantly concentrated in the tensor data region; areas related to the attention mechanism and output layers are the most sensitive. A negative correlation was observed between model size and robustness, with smaller models being more susceptible to attacks. Furthermore, a remote BFA chain was designed, enabling semantic-level attacks in real-world environments: at an attack frequency of 464.3 flips per second, a single bit can be flipped with 100% success in as little as 31.7 seconds. This causes LLM accuracy to plummet from 73.5% to 0%, without requiring high-cost equipment or complex prompt engineering.
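To make the core fault model concrete, below is a minimal sketch of what "flipping a single bit in a weight file" means at the byte level. This is our illustration only, not the paper's BitSifter tooling or its Rowhammer-style remote attack chain; the file name, offset, and bit index are hypothetical, and a real .gguf file would have its tensor data located via the format's header.

```python
# Illustrative single-bit flip in a binary file (hypothetical offsets; not
# the paper's method, which locates sensitive bits via entropy modeling).

def flip_bit(path: str, byte_offset: int, bit_index: int) -> None:
    """Flip one bit in place; bit_index 0 is the least significant bit."""
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        original = f.read(1)[0]
        f.seek(byte_offset)
        f.write(bytes([original ^ (1 << bit_index)]))

# Demo on a dummy 16-byte "weight file": flip bit 6 of byte 3
# (0x03 -> 0x43), leaving every other byte untouched.
with open("demo.gguf", "wb") as f:
    f.write(bytes(range(16)))

flip_bit("demo.gguf", byte_offset=3, bit_index=6)

with open("demo.gguf", "rb") as f:
    data = f.read()

assert data[3] == 0x43
```

In a quantized model, such a flip can land in an exponent or scale field of a packed weight block, which is one reason a single corrupted bit can have an outsized semantic effect.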
Problem

Research questions and friction points this paper is trying to address.

Identifies single-bit vulnerabilities in LLM weight files causing targeted failures
Develops method to locate critical vulnerable bits in large parameter models
Demonstrates remote bit-flip attacks degrading model accuracy to zero
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed weight sensitivity entropy model for vulnerability analysis
Created BitSifter framework to locate critical vulnerable bits
Designed remote bit-flip attack chain for real-world deployment
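The entropy-driven sensitivity idea above can be gestured at with a short sketch. This is a generic Shannon-entropy score over byte histograms, written by us as an assumption about the general approach, not the paper's actual BitSifter formulation; the function name and the use of per-block byte entropy as a sensitivity proxy are illustrative.

```python
# Hedged sketch: score weight-file blocks by Shannon entropy of their byte
# distribution, as a crude proxy for how structured (and potentially
# bit-flip-sensitive) a region is. Not the paper's exact model.
import math
from collections import Counter

def byte_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte over the block's byte histogram."""
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Extremes: a maximally varied block scores 8.0, a constant block 0.0.
assert abs(byte_entropy(bytes(range(256))) - 8.0) < 1e-9
assert byte_entropy(bytes(256)) == 0.0
```

A scanner could rank blocks by such a score and spend its fault-injection budget on the highest-ranked regions first, rather than flipping bits uniformly at random.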
Yu Yan
Henan Key Laboratory of Network Cryptography Technology, Information Engineering University
Siqi Lu
College of William and Mary
computer vision, machine learning, medical imaging
Yang Gao
Henan Key Laboratory of Network Cryptography Technology, Information Engineering University
Zhaoxuan Li
State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences
Ziming Zhao
Zhejiang University
Encrypted traffic analysis, Adversarial examples, Quantum computing
Qingjun Yuan
Henan Key Laboratory of Network Cryptography Technology, Information Engineering University
Yongjuan Wang
Henan Key Laboratory of Network Cryptography Technology, Information Engineering University