Resolving Lexical Bias in Model Editing

📅 2024-08-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing adapter-based large language model (LLM) editing methods suffer from significant lexical bias, leading to erroneous activation on irrelevant prompts containing overlapping tokens. To address this, we propose Projector Editor Networks for Model Editing (PENME), a decoupled representation editing framework that modifies neither the original LLM's weights nor its architecture. Our key contributions are threefold: (i) we systematically identify and formalize the lexical bias mechanism inherent in editing adapters; (ii) we design a decoupled representation learning objective that explicitly pulls semantically related prompts closer in latent space while pushing unrelated ones apart; and (iii) we introduce a semantic similarity gating mechanism to enable precise, context-aware adapter activation. Evaluated across multiple editing benchmarks, PENME achieves state-of-the-art editing accuracy with reduced inference overhead and demonstrates strong compatibility across diverse LLM architectures.
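The decoupled representation objective described above — pulling paraphrases toward their edit prompt while pushing lexically overlapping but unrelated prompts away — resembles a standard triplet-style contrastive loss. A minimal sketch follows; the vectors, the Euclidean distance, and the margin value are illustrative assumptions, not details taken from the paper:

```python
# Triplet-style objective sketch: keep a paraphrase close to its edit
# prompt in the projected space while keeping a token-overlapping but
# unrelated prompt at least `margin` farther away.
# All values below are hypothetical, for illustration only.

def euclidean(u, v):
    # Plain Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, paraphrase, unrelated, margin=1.0):
    # max(0, d(anchor, paraphrase) - d(anchor, unrelated) + margin):
    # zero once the unrelated prompt is `margin` farther than the paraphrase.
    return max(0.0, euclidean(anchor, paraphrase)
                    - euclidean(anchor, unrelated) + margin)

# Toy projected representations (assumed, not from the paper):
edit_prompt = [1.0, 0.0]
paraphrase  = [0.9, 0.1]   # semantically related -> should stay close
distractor  = [0.2, 2.0]   # shares tokens but unrelated -> pushed apart

loss = triplet_loss(edit_prompt, paraphrase, distractor)  # -> 0.0
```

With the distractor already far from the edit prompt, the loss is zero; if the distractor's representation drifted near the edit prompt, the loss would become positive and training would push it back out.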

📝 Abstract
Model editing aims to modify the outputs of large language models after they are trained. Previous approaches have often involved direct alterations to model weights, which can result in model degradation. Recent techniques avoid making modifications to the model's weights by using an adapter that applies edits to the model when triggered by semantic similarity in the representation space. We demonstrate that current adapter methods are critically vulnerable to strong lexical biases, leading to issues such as applying edits to irrelevant prompts with overlapping words. This paper presents a principled approach to learning a disentangled representation space that facilitates precise localization of edits by maintaining distance between irrelevant prompts while preserving proximity among paraphrases. In our empirical study, we show that our method (Projector Editor Networks for Model Editing - PENME) achieves state-of-the-art model editing results while being more computationally efficient during inference than previous methods and adaptable across different architectures.
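The abstract's "triggered by semantic similarity in the representation space" step can be pictured as a simple gated lookup: an edit fires only when the incoming prompt's representation is close enough to a stored edit key, otherwise the query falls through to the base model. The cosine measure, the threshold, and the toy vectors below are assumptions for illustration, not the paper's actual configuration:

```python
# Similarity-gated routing sketch: match an incoming prompt vector
# against stored edit keys; return the best edit above a similarity
# threshold, or None to fall back to the unedited base LLM.
# Threshold and vectors are illustrative assumptions.

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def route(prompt_vec, edit_keys, threshold=0.8):
    # Return the index of the best-matching edit key at or above the
    # threshold, or None if no edit should be applied.
    best_i, best_s = None, threshold
    for i, key in enumerate(edit_keys):
        s = cosine(prompt_vec, key)
        if s >= best_s:
            best_i, best_s = i, s
    return best_i

# Two stored edit keys (hypothetical projected representations):
edit_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

route([0.95, 0.05, 0.0], edit_keys)  # -> 0 (matches the first edit)
route([0.5, 0.5, 0.7], edit_keys)    # -> None (falls back to base model)
```

In a well-separated projection space, paraphrases of an edited prompt land above the threshold while lexically similar but unrelated prompts fall below it, which is exactly the failure mode lexical bias causes in an entangled space.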
Problem

Research questions and friction points this paper is trying to address.

Resolving lexical bias in model editing techniques
Preventing edits from firing on irrelevant prompts that merely share tokens
Learning disentangled representation space for precise edit localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies edits via an adapter without modifying model weights
Learns a disentangled representation space for precise edit localization
Achieves efficient inference and adapts across architectures