ExpertSteer: Intervening in LLMs through Expert Knowledge

📅 2025-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing activation steering methods rely on steering vectors generated by the target model itself, which ties them to that model and rules out guidance from stronger external experts. This paper proposes ExpertSteer, a framework that generates steering vectors from an external expert model and uses them to guide an arbitrary target LLM during inference. ExpertSteer has three core components: autoencoder-based alignment of representation dimensions, mutual-information-driven matching of expert and target layers, and steering-vector generation with Recursive Feature Machines (RFMs). The resulting vectors are applied to the matched layers at inference time, with no fine-tuning of the target LLM's parameters. Evaluated with three LLMs on 15 benchmarks across four domains, ExpertSteer outperforms established baselines at low computational cost.
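The alignment and layer-matching components named above can be illustrated with a small NumPy sketch. It is not the paper's implementation: it assumes a closed-form linear autoencoder (PCA) in place of a trained auto-encoder, and a simple histogram estimate of mutual information computed on the first latent coordinate of each layer. The function names `pca_encode`, `histogram_mi`, and `match_layers` are hypothetical.

```python
import numpy as np

def pca_encode(acts, latent_dim):
    """Closed-form linear autoencoder (assumption): project centered
    activations of shape (n_samples, hidden_dim) onto the top
    `latent_dim` principal directions."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:latent_dim].T

def histogram_mi(x, y, bins=8):
    """Histogram estimate of I(X; Y) in nats for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells (0 * log 0 = 0)
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def match_layers(expert_layers, target_layers):
    """Score every (expert layer, target layer) pair by the mutual
    information of their first latent coordinates and return the
    best-scoring pair together with the full score matrix."""
    scores = np.zeros((len(expert_layers), len(target_layers)))
    for i, e in enumerate(expert_layers):
        for j, t in enumerate(target_layers):
            scores[i, j] = histogram_mi(pca_encode(e, 1)[:, 0],
                                        pca_encode(t, 1)[:, 0])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j), scores
```

Each element of `expert_layers` and `target_layers` is assumed to be an array of activations collected on the same prompts, so rows are paired across models even when the hidden sizes differ. The histogram estimator is biased upward on small samples, so the scores are only meaningful relative to each other.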

📝 Abstract

Large Language Models (LLMs) exhibit remarkable capabilities across various tasks, yet guiding them to follow desired behaviours during inference remains a significant challenge. Activation steering offers a promising method to control the generation process of LLMs by modifying their internal activations. However, existing methods commonly intervene in the model's behaviour using steering vectors generated by the model itself, which constrains their effectiveness to that specific model and excludes the possibility of leveraging powerful external expert models for steering. To address these limitations, we propose ExpertSteer, a novel approach that leverages arbitrary specialized expert models to generate steering vectors, enabling intervention in any LLM. ExpertSteer transfers the knowledge from an expert model to a target LLM through a cohesive four-step process: first aligning representation dimensions with auto-encoders to enable cross-model transfer, then identifying intervention layer pairs based on mutual information analysis, next generating steering vectors from the expert model using Recursive Feature Machines, and finally applying these vectors to the identified layers during inference to selectively guide the target LLM without updating model parameters. We conduct comprehensive experiments using three LLMs on 15 popular benchmarks across four distinct domains. Experiments demonstrate that ExpertSteer significantly outperforms established baselines across diverse tasks at minimal cost.
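The final step, applying a steering vector at a chosen layer without touching model parameters, can be sketched on a toy network. The abstract does not state the exact intervention rule, so the additive form h' = h + αv used here is an assumption borrowed from common activation-steering practice; the toy `tanh` layer stack and the names `apply_steering` and `forward_with_steering` are hypothetical stand-ins for a real LLM forward pass.

```python
import numpy as np

def apply_steering(hidden, steering_vector, strength=1.0):
    """Add a unit-normalized steering vector to every token's hidden state.
    `hidden` has shape (n_tokens, hidden_dim); a new array is returned."""
    direction = steering_vector / np.linalg.norm(steering_vector)
    return hidden + strength * direction

def forward_with_steering(x, weights, layer_idx, steering_vector, strength):
    """Toy forward pass: a stack of tanh layers with the steering vector
    injected after layer `layer_idx`. The weight matrices are only read,
    never modified."""
    h = x
    for k, W in enumerate(weights):
        h = np.tanh(h @ W)
        if k == layer_idx:
            h = apply_steering(h, steering_vector, strength)
    return h
```

Setting `strength` to zero recovers the unsteered forward pass, which makes it easy to compare steered and unsteered outputs on the same input.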

Problem

Research questions and friction points this paper is trying to address.

Guiding LLMs to follow desired behaviors during inference

Leveraging external expert models for steering LLMs

Transferring expert knowledge to target LLMs without parameter updates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns representation dimensions using auto-encoders

Identifies intervention layers via mutual information

Generates steering vectors with Recursive Feature Machines
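The last item above can be illustrated with a simplified Recursive Feature Machine. The sketch alternates kernel ridge regression with an update of a feature matrix M to the average gradient outer product (AGOP) of the fitted predictor, then reads off the top eigenvector of M as a candidate steering direction. Several choices are assumptions rather than details from the paper: a Gaussian kernel is used for its simple gradient, M is rescaled by its trace after each update, and taking the top eigenvector as the steering vector is one plausible reading. The names `gaussian_kernel` and `rfm_direction` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Z, M, bandwidth):
    """K[j, i] = exp(-(x_j - z_i)^T M (x_j - z_i) / (2 * bandwidth^2))."""
    XM = X @ M
    sq = ((XM * X).sum(axis=1)[:, None]
          - 2.0 * XM @ Z.T
          + ((Z @ M) * Z).sum(axis=1)[None, :])
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def rfm_direction(X, y, iters=3, ridge=1e-3, bandwidth=None):
    """Simplified RFM: returns the unit top eigenvector of the learned
    feature matrix. X is (n_samples, dim) activations, y is (n_samples,)
    labels or scores for the behaviour of interest."""
    n, d = X.shape
    bandwidth = np.sqrt(d) if bandwidth is None else bandwidth
    M = np.eye(d)
    for _ in range(iters):
        # Kernel ridge regression with the current feature matrix.
        K = gaussian_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)
        # Gradient of the predictor at each training point:
        # grad f(x_j) = -(1 / bw^2) * M * sum_i alpha_i K[j, i] (x_j - x_i)
        W = K * alpha[None, :]
        diffs = W.sum(axis=1)[:, None] * X - W @ X
        grads = -(diffs @ M) / bandwidth**2
        # AGOP update, rescaled so that trace(M) = d (assumption).
        M = grads.T @ grads / n
        M *= d / np.trace(M)
    _, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, -1]
```

The sign of an eigenvector is arbitrary, so in practice the direction would need to be oriented, for example by checking that it correlates positively with the labels.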
