Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak personalization adaptability and low data efficiency of large language models (LLMs) under dynamic user preferences and high data sparsity, this paper proposes a fine-grained, instance-level inference-time steering framework. The method hooks internal attention and MLP layer activations, then employs an input-aware signal aggregation mechanism to dynamically generate sample-specific, non-parametric steering vectors that are injected into the forward pass, enabling efficient, context-sensitive personalization. Its core contributions are: (1) fine-grained modeling of per-layer activations; (2) input-driven adaptive aggregation; and (3) orthogonal compatibility with existing methods, enabling plug-and-play integration. Experiments across diverse tasks, including short- and long-text generation and web function calling, demonstrate significant improvements in personalization performance. Notably, the framework maintains robustness and generalization under rapid user distribution shifts and heterogeneous interaction patterns, even with limited user feedback.

📝 Abstract
The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Beyond non-parametric methods that exploit the in-context learning ability of LLMs, recent parametric adaptation methods have emerged, including personalized parameter-efficient fine-tuning and reward modeling. However, these methods struggle with dynamic user patterns and high-data-sparsity scenarios due to low adaptability and data efficiency. To address these challenges, we propose a fine-grained and instance-tailored steering framework that dynamically generates sample-level steering vectors from user data and injects them into the model's forward pass for personalized adaptation. Our approach introduces two key technical innovations: a fine-grained steering component that captures nuanced signals by hooking activations from attention and MLP layers, and an input-aware aggregation module that synthesizes these signals into contextually relevant enhancements. The method demonstrates high flexibility and data efficiency, excelling in fast-changing distribution and high data sparsity scenarios. In addition, the proposed method is orthogonal to existing methods and operates as a plug-in component compatible with different personalization techniques. Extensive experiments across diverse scenarios, including short-to-long text generation and web function calling, validate the effectiveness and compatibility of our approach. Results show that our method significantly enhances personalization performance in fast-shifting environments while maintaining robustness across varying interaction modes and context lengths. Implementation is available at https://github.com/KounianhuaDu/Fints.
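The core mechanism the abstract describes, hooking activations inside a transformer block and adding a steering vector back into the forward pass, can be sketched framework-agnostically. This is a minimal illustrative toy, not the paper's implementation: `TinyBlock` stands in for one transformer block, and the hook/injection points are assumptions about where such a method would operate.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyBlock:
    """Toy stand-in for one transformer block (attention + MLP),
    used only to show where activations are hooked and vectors injected."""
    def __init__(self, dim):
        self.w_attn = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_mlp = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.hooked = {}   # captured activations (the "hook")
        self.steer = None  # optional non-parametric steering vector

    def forward(self, x):
        a = np.tanh(x @ self.w_attn)    # "attention" sub-layer output
        self.hooked["attn"] = a.copy()  # hook: record attention activation
        h = np.tanh(a @ self.w_mlp)     # "MLP" sub-layer output
        self.hooked["mlp"] = h.copy()   # hook: record MLP activation
        if self.steer is not None:      # inject steering vector into the
            h = h + self.steer          # forward pass (no weight updates)
        return h

dim = 8
block = TinyBlock(dim)
x = rng.standard_normal(dim)

plain = block.forward(x)                # unsteered pass

# Derive a toy "user preference" steering vector from the hooked
# activations of a preference example, then inject it on the next pass.
pref = rng.standard_normal(dim)
block.forward(pref)
block.steer = 0.1 * block.hooked["mlp"]
steered = block.forward(x)

print(np.allclose(plain, steered))      # False: the output was steered
```

In a real LLM the same pattern would typically be realized with forward hooks on the attention and MLP modules, but the principle, read activations out and add a per-instance vector back in without touching the weights, is what the toy shows.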
Problem

Research questions and friction points this paper is trying to address.

Dynamic personalization for LLMs under fast-changing user patterns
Addressing data sparsity in personalized adaptation of language models
Enhancing instance-level customization without extensive retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained steering captures nuanced activation signals
Input-aware aggregation synthesizes contextually relevant enhancements
Dynamic steering vectors injected during the forward pass