🤖 AI Summary
Large language models (LLMs) risk unintentionally reproducing copyright-protected content, raising legal and ethical concerns. Existing unlearning methods, such as differential privacy or neuron editing, typically require full retraining or direct access to model weights and often degrade performance. This paper proposes TokenSwap, a lightweight, training-free, weight-agnostic post-processing technique: it employs a small auxiliary model (e.g., DistilGPT-2) to identify grammar-related tokens and probabilistically reweight them, dynamically substituting high-risk token distributions at inference time. This post-hoc collaborative-inference mechanism achieves strong unlearning without any access to the large model's weights while preserving its original functionality. Experiments on Pythia-6.9B and LLaMA-3-8B show up to a 10× reduction in canonical memorized outputs, with downstream task accuracy changing by less than 0.3%, i.e., negligible performance degradation.
📝 Abstract
Large language models (LLMs) demonstrate impressive capabilities across many tasks yet risk reproducing copyrighted content verbatim, raising legal and ethical concerns. Although methods like differential privacy or neuron editing can reduce memorization, they typically require costly retraining or direct access to model weights and may degrade performance. To address these challenges, we propose TokenSwap, a lightweight, post-hoc approach that replaces the probabilities of grammar-related tokens with those from a small auxiliary model (e.g., DistilGPT-2). We run extensive experiments on commercial-grade models such as Pythia-6.9B and LLaMA-3-8B and demonstrate that our method reduces well-known cases of memorized generation by up to 10× with little to no impact on downstream tasks. Our approach offers a uniquely accessible and effective solution for users of real-world systems.
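The core mechanism described above, swapping the large model's probabilities for grammar-related tokens with an auxiliary model's, can be sketched for a single decoding step. This is a conceptual illustration only: the paper's exact criterion for identifying grammar-related tokens is not reproduced here, and the function and variable names are hypothetical.

```python
import numpy as np

def tokenswap_step(p_large, p_small, grammar_token_ids):
    """One decoding step of the swap idea (sketch, not the paper's code):
    replace the large model's probabilities at grammar-related token ids
    with the small auxiliary model's, then renormalize to a distribution."""
    p_out = p_large.copy()
    p_out[grammar_token_ids] = p_small[grammar_token_ids]
    return p_out / p_out.sum()

# Toy next-token distributions over a 6-token vocabulary.
p_large = np.array([0.50, 0.20, 0.10, 0.10, 0.05, 0.05])  # large model, memorization-prone
p_small = np.array([0.10, 0.30, 0.25, 0.15, 0.10, 0.10])  # auxiliary model (e.g., DistilGPT-2)
grammar_ids = [0, 1]  # hypothetical ids of grammar-related tokens

p_mixed = tokenswap_step(p_large, p_small, grammar_ids)
```

Because only grammar-related positions are overwritten, content-bearing tokens keep the large model's relative probabilities, which is consistent with the claim of little downstream impact.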