LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two weaknesses of black-box adversarial attacks on knowledge graph embedding (KGE) models: they cannot explain their choices, and they generalize poorly. It proposes LLMAtKGE, an LLM-based attack framework for KGE that selects attack targets and explains them in natural language. The method combines semantic and topological information through structured prompts, uses semantics-based and centrality-based filters with precomputed high-order adjacency to narrow the candidate attack triples, and fine-tunes the LLM on a triple classification task to improve filtering. Evaluated on FB15k-237 and WN18RR, LLMAtKGE outperforms the strongest black-box baselines and is competitive with white-box methods, while producing human-readable explanations through the LLM's reasoning.

📝 Abstract

Adversarial attacks on knowledge graph embeddings (KGE) aim to disrupt a model's link prediction ability by removing or inserting triples. A recent black-box method has attempted to incorporate textual and structural information to enhance attack performance. However, it cannot generate human-readable explanations and generalizes poorly. In the past few years, large language models (LLMs) have demonstrated powerful capabilities in text comprehension, generation, and reasoning. In this paper, we propose LLMAtKGE, a novel LLM-based framework that selects attack targets and generates human-readable explanations. To provide the LLM with sufficient factual context under limited input constraints, we design a structured prompting scheme that explicitly formulates the attack as multiple-choice questions while incorporating KG factual evidence. To address the context-window limitation and hesitation issues, we introduce semantics-based and centrality-based filters, which compress the candidate set while preserving high recall of attack-relevant information. Furthermore, to efficiently integrate both semantic and structural information into the filter, we precompute high-order adjacency and fine-tune the LLM on a triple classification task to enhance filtering performance. Experiments on two widely used knowledge graph datasets demonstrate that our attack outperforms the strongest black-box baselines, provides explanations via reasoning, and performs competitively with white-box methods. Comprehensive ablation and case studies further validate its capability to generate explanations.
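The structured prompting idea in the abstract, casting target selection as a multiple-choice question backed by KG facts, can be illustrated with a minimal sketch. The function name `build_attack_prompt`, the prompt wording, and the letter-labelled option format are all hypothetical choices made for illustration; the paper's actual prompt template is not given here.

```python
from string import ascii_uppercase
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def fmt(triple: Triple) -> str:
    """Render a triple as readable text."""
    head, relation, tail = triple
    return f"({head}, {relation}, {tail})"


def build_attack_prompt(
    target: Triple, candidates: List[Triple], evidence: List[Triple]
) -> str:
    """Format a deletion-attack decision as a multiple-choice question.

    Hypothetical template: the wording below is an assumption, not the
    prompt used by LLMAtKGE.
    """
    if not candidates:
        raise ValueError("at least one candidate triple is required")
    if len(candidates) > len(ascii_uppercase):
        raise ValueError("too many candidates for single-letter options")

    lines = [f"Target triple: {fmt(target)}", "Known facts:"]
    lines += [f"- {fmt(t)}" for t in evidence]
    lines.append(
        "Which candidate, if removed, would most weaken the target prediction?"
    )
    # Label each candidate with a letter so the answer is easy to parse.
    lines += [f"{letter}. {fmt(t)}" for letter, t in zip(ascii_uppercase, candidates)]
    lines.append("Answer with one letter, then explain your reasoning.")
    return "\n".join(lines)
```

Capping the options at 26 keeps every answer a single letter, which also reflects why the candidate set has to be filtered before prompting.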

Problem

Research questions and friction points this paper is trying to address.

Generating human-readable explanations for adversarial attacks on knowledge graphs

Enhancing attack generalizability by integrating semantic and structural information

Addressing LLM context limitations through structured prompting and filtering techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate human-readable attack explanations

Structured prompting formulates attacks as multiple-choice questions

Semantic and centrality filters compress candidate sets efficiently
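As a rough illustration of the filtering step, the sketch below precomputes k-hop neighborhoods and then ranks nearby triples by a centrality score. Several details are assumptions: degree centrality stands in for whatever centrality measure the paper uses, higher scores are ranked first, the graph is treated as undirected, and the helper names (`build_adjacency`, `precompute_k_hop`, `filter_candidates`) are hypothetical. The semantic filter and the fine-tuned classifier are not modeled.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Undirected entity adjacency built from (head, relation, tail) triples."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def k_hop_neighbors(adj, source, k):
    """Entities within k hops of source (source excluded), found by BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(source)
    return seen


def precompute_k_hop(adj, k):
    """Precompute the k-hop neighborhood of every entity once, up front."""
    return {entity: k_hop_neighbors(adj, entity, k) for entity in list(adj)}


def filter_candidates(triples, target, k=2, top_n=5):
    """Keep the top_n triples near the target, ranked by endpoint degree.

    Assumption: degree centrality, highest first. The paper's actual
    scoring rule may differ.
    """
    adj = build_adjacency(triples)
    hops = precompute_k_hop(adj, k)
    head, _, tail = target
    region = hops.get(head, set()) | hops.get(tail, set()) | {head, tail}
    degree = {entity: len(neighbors) for entity, neighbors in adj.items()}
    local = [
        t for t in triples
        if t != target and t[0] in region and t[2] in region
    ]
    local.sort(key=lambda t: degree[t[0]] + degree[t[2]], reverse=True)
    return local[:top_n]
```

The output of `filter_candidates` is small enough to list as answer options in a single prompt, which is the point of compressing the candidate set before querying the LLM.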

🔎 Similar Papers

From Latent to Lucid: Transforming Knowledge Graph Embeddings into Interpretable Structures