🤖 AI Summary
This work addresses a gap in ML security research, which often examines threats in isolation and lacks a unified account of how risks propagate in both directions between data and models. The paper proposes a closed-loop threat taxonomy that characterizes security threats along four directional interaction types: data→data, data→model, model→data, and model→model. Through a systematic literature review and taxonomic analysis, it connects diverse attack families, including data poisoning, jailbreaks, privacy inference (model inversion and membership inference), and model extraction, to uncover the relationships among seemingly disparate threats. The resulting framework offers a theoretical foundation and a unified perspective for developing scalable, transferable, and cross-modal security strategies for foundation models.
📝 Abstract
As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, without a coherent framework that exposes their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML, \textbf{data} and \textbf{models}, are no longer independent; vulnerabilities in one directly compromise the other. Without a holistic framework, it remains unclear how these bidirectional risks propagate across the ML pipeline. To address this gap, we propose a \emph{unified closed-loop threat taxonomy} that explicitly frames model-data interactions along four directional axes. The four resulting classes of security threats are distinct but interrelated: (1) Data$\rightarrow$Data (D$\rightarrow$D), including \emph{data decryption attacks and watermark removal attacks}; (2) Data$\rightarrow$Model (D$\rightarrow$M), including \emph{poisoning, harmful fine-tuning attacks, and jailbreak attacks}; (3) Model$\rightarrow$Data (M$\rightarrow$D), including \emph{model inversion, membership inference attacks, and training data extraction attacks}; and (4) Model$\rightarrow$Model (M$\rightarrow$M), including \emph{model extraction attacks}. Our framework offers a principled lens for analyzing and defending foundation models: it elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies.
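As an illustration only, the four directional axes can be written down as a small lookup structure keyed by (source asset, target asset). The sketch below is not taken from the paper: the names `Asset`, `TAXONOMY`, `label`, and `threats_targeting` are hypothetical, and the attack lists simply restate the examples given in the abstract.

```python
from enum import Enum


class Asset(Enum):
    """The two foundational ML assets named in the abstract."""
    DATA = "D"
    MODEL = "M"


# Hypothetical encoding (an assumption, not the paper's notation):
# (source asset, target asset) -> example attacks listed in the abstract.
TAXONOMY = {
    (Asset.DATA, Asset.DATA): ["data decryption", "watermark removal"],
    (Asset.DATA, Asset.MODEL): ["poisoning", "harmful fine-tuning", "jailbreak"],
    (Asset.MODEL, Asset.DATA): [
        "model inversion",
        "membership inference",
        "training data extraction",
    ],
    (Asset.MODEL, Asset.MODEL): ["model extraction"],
}


def label(source, target):
    """Render an axis as a short ASCII label such as 'D->M'."""
    return f"{source.value}->{target.value}"


def threats_targeting(asset):
    """Return {axis label: attacks} for every axis whose target is `asset`."""
    return {
        label(source, target): attacks
        for (source, target), attacks in TAXONOMY.items()
        if target is asset
    }
```

For example, `threats_targeting(Asset.MODEL)` collects the D$\rightarrow$M and M$\rightarrow$M entries, which is one way to see that a model can be compromised both through its data and through other models.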