Privately Learning from Graphs with Applications in Fine-tuning Large Language Models

📅 2024-10-10

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Conventional differential privacy (DP) methods—such as DP-SGD—fail on graph-structured data due to inherent structural dependencies among nodes, undermining privacy guarantees during fine-tuning of large language models (LLMs) on text-attributed graphs. Method: This paper proposes the first DP-compliant fine-tuning framework tailored to graph structural dependencies. It introduces a graph-relational decoupling sampling mechanism to relax the i.i.d. sample assumption of DP-SGD; designs a privacy-preserving fine-tuning paradigm integrating GNN-based graph embedding with LLM adaptation (compatible with BERT and Llama2); and establishes a tripartite trade-off analysis framework balancing privacy (ε), utility, and efficiency. Results: Evaluated on four real-world text-attributed graphs, the framework achieves an average 12.7% improvement in F1 score while strictly satisfying ε ≤ 4 DP guarantees. The implementation is open-sourced for full reproducibility.

Technology Category

Application Category

📝 Abstract

Graphs offer unique insights into relationships and interactions between entities, complementing data modalities like text, images, and videos. By incorporating relational information from graph data, AI models can extend their capabilities beyond traditional tasks. However, relational data in sensitive domains such as finance and healthcare often contain private information, making privacy preservation crucial. Existing privacy-preserving methods, such as DP-SGD, which rely on gradient decoupling assumptions, are not well-suited for relational learning due to the inherent dependencies between coupled training samples. To address this challenge, we propose a privacy-preserving relational learning pipeline that decouples dependencies in sampled relations during training, ensuring differential privacy through a tailored application of DP-SGD. We apply this method to fine-tune large language models (LLMs) on sensitive graph data, and tackle the associated computational complexities. Our approach is evaluated on LLMs of varying sizes (e.g., BERT, Llama2) using real-world relational data from four text-attributed graphs. The results demonstrate significant improvements in relational learning tasks, all while maintaining robust privacy guarantees during training. Additionally, we explore the trade-offs between privacy, utility, and computational efficiency, offering insights into the practical deployment of our approach. Code is available at https://github.com/Graph-COM/PvGaLM.

Problem

Research questions and friction points this paper is trying to address.

Learning from sensitive graph data with privacy

Incompatibility of DP-SGD with relational learning

Fine-tuning LLMs on private graph-structured information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples dependencies in sampled relations

Applies tailored DP-SGD for differential privacy

Enables LLM fine-tuning on sensitive graph data

🔎 Similar Papers

No similar papers found.

Authors to Follow