Trustworthy AI on Safety, Bias, and Privacy: A Survey

📅 2025-02-11
🤖 AI Summary
This paper systematically analyzes three core challenges in trustworthy AI: (1) safety alignment failures in large language models (LLMs), (2) spurious biases embedded in models and their propagation mechanisms, and (3) privacy risks from membership inference attacks against deep neural networks. Combining adversarial testing, bias diagnostics, privacy risk modeling, and empirical evaluation, it uncovers shared underlying mechanisms across these failures and proposes a unified, cross-dimensional framework for assessing AI trustworthiness. The work introduces an actionable governance roadmap jointly addressing safety, fairness (bias mitigation), and privacy, identifying critical bottlenecks in each dimension and specifying implementable countermeasures. By bridging theoretical analysis with practical validation, the study provides foundational principles and operational guidelines for developing robust, fair, and regulation-compliant industrial AI systems.

📝 Abstract
The capabilities of artificial intelligence systems have been advancing rapidly, yet these systems still suffer from failure modes, vulnerabilities, and biases. In this paper, we survey the current state of the field and present insights and perspectives on concerns that challenge the trustworthiness of AI models. In particular, we investigate issues along three thrusts that undermine models' trustworthiness: safety, privacy, and bias. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks against deep neural networks. The discussions in this paper reflect our own experiments and observations.
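The abstract's privacy thrust concerns membership inference attacks, which test whether a given sample was part of a model's training set. As a hedged illustration (not code from the paper), the sketch below implements the classic confidence-thresholding variant on synthetic softmax confidences: overfit models tend to be more confident on training members, so a simple threshold on top-class confidence already beats random guessing.

```python
import numpy as np

rng = np.random.default_rng(0)

def confidence_mia(member_conf, nonmember_conf, threshold=0.9):
    """Predict 'member' when the model's top softmax confidence
    exceeds the threshold; return attack accuracy on the
    balanced member/non-member set."""
    pred_member = member_conf > threshold        # ideally True
    pred_nonmember = nonmember_conf > threshold  # ideally False
    correct = pred_member.sum() + (~pred_nonmember).sum()
    return correct / (len(member_conf) + len(nonmember_conf))

# Synthetic confidences standing in for a real model's outputs:
# members cluster near 1.0, non-members are lower and noisier.
member_conf = np.clip(rng.normal(0.97, 0.02, 1000), 0.0, 1.0)
nonmember_conf = np.clip(rng.normal(0.75, 0.10, 1000), 0.0, 1.0)

acc = confidence_mia(member_conf, nonmember_conf)
print(f"attack accuracy: {acc:.2f}")  # well above the 0.50 random baseline
```

The confidence gap here is synthetic; in practice it is produced by overfitting, which is why regularization and differential privacy reduce attack success.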
Problem

Research questions and friction points this paper is trying to address.

AI safety in large language models
Bias prevention in neural networks
Privacy risks from membership inference attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Safety alignment for large language models
Addressing spurious biases in neural networks
Preventing membership inference attacks
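The bias thrust centers on spurious biases, where a model latches onto a shortcut feature that correlates with the label during training but fails under distribution shift. As a toy illustration (an assumption-laden sketch, not the paper's experiments), the snippet below trains a least-squares linear classifier on data with a weak true signal and a near-perfect spurious shortcut, then reverses the shortcut's correlation at test time.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, corr):
    """Binary labels; feature 0 is a weak true signal, feature 1
    is a spurious shortcut that matches the label with
    probability `corr`."""
    y = rng.integers(0, 2, n)
    signal = y + rng.normal(0.0, 1.0, n)                  # noisy signal
    shortcut = np.where(rng.random(n) < corr, y, 1 - y)   # spurious cue
    return np.column_stack([signal, shortcut]).astype(float), y

def fit_linear(X, y):
    # Least-squares linear classifier with a bias term.
    Xb = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def accuracy(w, X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(((Xb @ w > 0.5) == y).mean())

# Train: shortcut agrees with the label 95% of the time.
Xtr, ytr = make_data(5000, corr=0.95)
w = fit_linear(Xtr, ytr)

# Test: the correlation is reversed, so a shortcut-reliant
# model fails badly.
Xte, yte = make_data(5000, corr=0.05)
train_acc = accuracy(w, Xtr, ytr)
shifted_acc = accuracy(w, Xte, yte)
print(f"train acc: {train_acc:.2f}")
print(f"shifted test acc: {shifted_acc:.2f}")
```

The model's shifted-set accuracy collapses below chance because its weight on the shortcut dominates the weak true signal, which is the failure mode that spurious-bias mitigation methods target.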
Xingli Fang
Computer Science, North Carolina State University
Jianwei Li
Computer Science, North Carolina State University
Jung-Eun Kim
Assistant Professor, Computer Science, North Carolina State University
Trustworthy AI, Interpretable AI, Efficient AI, AI Safety