Multi-Personality Generation of LLMs at Decoding-time

📅 2025-10-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of efficiently generating multi-persona outputs with large language models (LLMs) without fine-tuning or auxiliary models. We propose a decoding-time multi-persona fusion framework that leverages the implicit density ratios already present in single-dimensional models as transferable, "free" resources. Crucially, we reinterpret these ratios as controllable policy components, enabling flexible and robust composition across multiple attributes (e.g., MBTI dimensions, role-specific stylistic traits). To improve efficiency without compromising output quality, we further introduce Speculative Chunk-level Rejection sampling (SCR), a decoding strategy that generates responses in chunks and validates them in parallel against thresholds estimated within a sliding window. Experiments show improvements of up to 16–18% over baselines on MBTI personality and role-playing benchmarks, substantially outperforming existing heuristic or multi-model approaches. Code and data are publicly released.

📝 Abstract
Multi-personality generation for LLMs, enabling simultaneous embodiment of multiple personalization attributes, is a fundamental challenge. Existing retraining-based approaches are costly and scale poorly, while decoding-time methods often rely on external models or heuristics, limiting flexibility and robustness. In this paper, we propose a novel Multi-Personality Generation (MPG) framework under the decoding-time combination paradigm. It flexibly controls multi-personality without relying on scarce multi-dimensional models or extra training, leveraging implicit density ratios in single-dimensional models as a "free lunch" to reformulate the task as sampling from a target strategy aggregating these ratios. To implement MPG efficiently, we design Speculative Chunk-level based Rejection sampling (SCR), which generates responses in chunks and validates them in parallel via estimated thresholds within a sliding window. This significantly reduces computational overhead while maintaining high-quality generation. Experiments on MBTI personality and Role-Playing demonstrate the effectiveness of MPG, showing improvements of up to 16–18%. Code and data are available at https://github.com/Libra117/MPG.
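The abstract's core idea, aggregating implicit density ratios from single-dimensional models into a target strategy, can be sketched at the level of next-token logits. Below is a minimal, hypothetical illustration (not the paper's released code): each single-persona model contributes the log-ratio between its distribution and the base model's, and the fused distribution weights and sums these ratios. The function names and the per-attribute weights are illustrative assumptions.

```python
import math

def fuse_logits(base_logits, persona_logits_list, weights):
    """Hypothetical sketch of decoding-time multi-persona fusion.

    Each persona model i contributes an implicit density ratio
    log p_i(token) - log p_base(token); the fused next-token
    distribution adds the weighted ratios to the base logits and
    renormalizes with a softmax.
    """
    fused = {}
    for tok, base in base_logits.items():
        ratio_sum = sum(
            w * (persona[tok] - base)  # log-density ratio for attribute i
            for persona, w in zip(persona_logits_list, weights)
        )
        fused[tok] = base + ratio_sum
    # softmax over the fused logits to obtain a proper distribution
    m = max(fused.values())
    exps = {t: math.exp(v - m) for t, v in fused.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# usage: one persona model that prefers token "a" shifts the fused
# distribution toward "a" relative to the neutral base model
base = {"a": 0.0, "b": 0.0}
persona = {"a": 2.0, "b": 0.0}
probs = fuse_logits(base, [persona], weights=[1.0])
```

The per-attribute weights play the role of the controllable policy components: raising a weight strengthens that personality dimension without retraining any model.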
Problem

Research questions and friction points this paper is trying to address.

Enabling simultaneous multi-personality generation in LLMs
Eliminating costly retraining and external model dependencies
Achieving flexible personality control through decoding-time sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoding-time framework controls multi-personality without retraining
Leverages implicit density ratios from single-dimensional models
Uses speculative chunk-level rejection sampling for efficiency
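The chunk-level rejection idea in the last bullet can be sketched as follows. This is a hypothetical toy version, not the paper's SCR implementation: candidate chunks come from a draft sampler, each chunk is scored under the fused target strategy, and the acceptance threshold is estimated from a sliding window of recent scores so that rejection adapts as decoding proceeds. The chunk iterator, scoring function, and window size are all illustrative assumptions.

```python
import random
from collections import deque

def scr_generate(draft_chunks, target_score, window_size=4, seed=0):
    """Toy sketch of speculative chunk-level rejection sampling.

    draft_chunks : iterable of candidate chunks from a draft policy
    target_score : maps a chunk to a score in [0, 1] under the
                   fused target strategy (assumed, not the paper's API)
    """
    rng = random.Random(seed)
    window = deque(maxlen=window_size)  # sliding window of recent scores
    accepted = []
    for chunk in draft_chunks:
        s = target_score(chunk)
        # estimated threshold: mean of the recent scores in the window
        thresh = sum(window) / len(window) if window else 0.0
        window.append(s)
        if s >= thresh or rng.random() < s:
            accepted.append(chunk)  # accept the whole chunk at once
        # in a full system, a rejected chunk would be resampled
    return accepted

# usage: high-scoring chunks pass the adaptive threshold; a low-scoring
# chunk is likely rejected once the window has seen better ones
scores = {"intro": 0.9, "offtopic": 0.1, "reply": 0.95}
out = scr_generate(["intro", "offtopic", "reply"],
                   lambda c: scores[c], window_size=2)
```

Validating a whole chunk at once, rather than token by token, is what reduces the number of target-model evaluations and hence the decoding overhead.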