A Survey on Data Security in Large Language Models

📅 2025-08-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper systematically examines data-centric security risks in large language models (LLMs), including training data contamination, prompt injection, and data poisoning—threats that induce toxic outputs, hallucinations, and degraded reliability. Through a comprehensive literature review and taxonomic analysis, it presents the first holistic classification of LLM data security threats and corresponding defenses, covering techniques such as adversarial training, reinforcement learning from human feedback (RLHF), and data augmentation, while also systematically evaluating mainstream safety benchmark datasets. Innovatively, it proposes an “interpretability-driven” defense and governance framework, emphasizing auditable safety update mechanisms and transparent decision-making pathways. The study establishes the first end-to-end LLM data security risk–defense taxonomy, offering both theoretical foundations and practical guidelines for researchers, developers, and policymakers to advance trustworthy LLM development.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs), now a foundation in advancing natural language processing, power applications such as text generation, machine translation, and conversational systems. Despite their transformative potential, these models inherently rely on massive amounts of training data, often collected from diverse and uncurated sources, which exposes them to serious data security risks. Harmful or malicious data can compromise model behavior, leading to issues such as toxic output, hallucinations, and vulnerabilities to threats such as prompt injection or data poisoning. As LLMs continue to be integrated into critical real-world systems, understanding and addressing these data-centric security risks is imperative to safeguard user trust and system reliability. This survey offers a comprehensive overview of the main data security risks facing LLMs and reviews current defense strategies, including adversarial training, RLHF, and data augmentation. Additionally, we categorize and analyze relevant datasets used for assessing robustness and security across different domains, providing guidance for future research. Finally, we highlight key research directions that focus on secure model updates, explainability-driven defenses, and effective governance frameworks, aiming to promote the safe and responsible development of LLM technology. This work aims to inform researchers, practitioners, and policymakers, driving progress toward data security in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Identifying data security risks in Large Language Models
Reviewing defense strategies against harmful data in LLMs
Proposing future research directions for secure LLM development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial training for LLM security
RLHF to mitigate harmful outputs
Data augmentation for robustness enhancement
🔎 Similar Papers
No similar papers found.
K
Kang Chen
School of Computer Engineering, Jimei University, Xiamen, 361021, China; College of Science, Mathematics and Technology, Wenzhou-Kean University, Wenzhou, 325060, China
Xiuze Zhou
Xiuze Zhou
The Hong Kong University of Science and Technology (Guangzhou)
Machine LearningRecommendation SystemsLarge Language Models
Y
Yuanguo Lin
School of Computer Engineering, Jimei University, Xiamen, 361021, China
J
Jinhe Su
School of Computer Engineering, Jimei University, Xiamen, 361021, China
Y
Yuanhui Yu
School of Computer Engineering, Jimei University, Xiamen, 361021, China
L
Li Shen
School of Professional Studies, New York University, New York, 10003, United States
Fan Lin
Fan Lin
Software School,Xiamen University
Large Language ModelsInternet of ThingsRecommendation Systemreinforcement learning