AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

📅 2026-03-25

🤖 AI Summary

This survey addresses a gap in AI security research, which tends to examine threats in isolation and lacks a unified account of how risk propagates in both directions between data and models. It proposes a closed-loop threat taxonomy that organizes security threats along four directional axes: data→data, data→model, model→data, and model→model. Reviewing the literature through this taxonomy, it connects technical pathways that are usually studied separately (adversarial attacks, privacy inference, model stealing, and data poisoning) and draws out the relationships among them. The framework is intended as a theoretical foundation and a unified perspective for developing scalable, transferable, and cross-modal security strategies for foundation models.

📝 Abstract

As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML, data and models, are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a unified closed-loop threat taxonomy that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data→Data (D→D): including data decryption attacks and watermark removal attacks; (2) Data→Model (D→M): including poisoning, harmful fine-tuning attacks, and jailbreak attacks; (3) Model→Data (M→D): including model inversion, membership inference attacks, and training data extraction attacks; (4) Model→Model (M→M): including model extraction attacks. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.
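The four directional axes in the abstract can be read as a small lookup structure keyed by (source asset, target asset). The following is a minimal illustrative sketch, not code from the paper: the names `Asset`, `TAXONOMY`, `axis_label`, and `classify` are hypothetical, and the attack lists contain only the families named in the abstract.

```python
from enum import Enum


class Asset(Enum):
    """The two foundational ML assets named in the abstract."""
    DATA = "D"
    MODEL = "M"


# Hypothetical encoding: (source asset, target asset) -> attack families
# listed in the abstract for that directional axis.
TAXONOMY = {
    (Asset.DATA, Asset.DATA): ["data decryption", "watermark removal"],
    (Asset.DATA, Asset.MODEL): ["poisoning", "harmful fine-tuning", "jailbreak"],
    (Asset.MODEL, Asset.DATA): [
        "model inversion",
        "membership inference",
        "training data extraction",
    ],
    (Asset.MODEL, Asset.MODEL): ["model extraction"],
}


def axis_label(source, target):
    """Format an axis as 'D->M' (ASCII arrow standing in for the arrow symbol)."""
    return f"{source.value}->{target.value}"


def classify(attack):
    """Return the axis label for an attack family, or None if it is not listed."""
    for (source, target), attacks in TAXONOMY.items():
        if attack in attacks:
            return axis_label(source, target)
    return None


if __name__ == "__main__":
    # Print each axis with the attack families the abstract assigns to it.
    for (source, target), attacks in TAXONOMY.items():
        print(axis_label(source, target), "|", ", ".join(attacks))
```

Keying the table by an ordered pair keeps the direction explicit, so D->M and M->D stay distinct entries, which is the distinction the taxonomy relies on.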

Problem

Research questions and friction points this paper is trying to address.

AI security

foundation models

data-model interaction

threat taxonomy

machine learning security

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified threat taxonomy

foundation model security

data-model interaction