BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation

📅 2025-10-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-LLM systems enhance creativity but incur high computational overhead and substantial inference latency. To address this, we propose BILLY—a training-free, single-model multi-agent simulation framework. BILLY extracts and linearly fuses multiple interpretable persona vectors from the LLM's hidden-layer activation space, internalizing diverse expert perspectives and capabilities within a unified model. Steering generation with the merged vector enables fine-grained generative control and behavior interpretability. On multiple creative generation benchmarks—including story continuation and metaphor generation—BILLY significantly outperforms strong single-model baselines and conventional multi-LLM collaboration methods. It achieves a 3.2× inference speedup and reduces GPU memory consumption by 58%, marking the first approach to realize collective-intelligence-like generation that is low-cost, low-latency, and highly controllable.

📝 Abstract
Multi-LLM systems enhance the creativity of large language models by simulating human collective intelligence but suffer from significant drawbacks, such as high computational costs and inference latency. To address these limitations, we propose BILLY (BlendIng persona vectors for Large Language model creativitY), a training-free framework that captures the benefits of multi-LLM collaboration, i.e., inducing diverse perspectives and specialized expertise, within a single model. BILLY operates by extracting and blending multiple distinct persona vectors directly in the model's activation space. We steer the model's generation process with this merged vector during inference, enabling multi-perspective output without explicit multi-LLM communication. Our experiments across creativity-oriented benchmarks demonstrate that BILLY surpasses single-model prompting and traditional multi-LLM approaches, while substantially reducing inference time and computational costs. Our analyses further reveal that distinct persona vectors can be blended to achieve both effective control over complementary aspects of generation and greater interpretability.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs and latency in multi-LLM creative generation systems
Enabling multi-perspective outputs within a single model without multi-LLM communication
Achieving effective control and interpretability through persona vector blending
Innovation

Methods, ideas, or system contributions that make the work stand out.

Merges persona vectors in activation space
Enables multi-perspective generation in single model
Achieves training-free control with blended vectors
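The blend-and-steer mechanism the bullets describe can be sketched with PyTorch forward hooks. This is a minimal toy illustration, not the paper's implementation: the helper names (`blend_persona_vectors`, `add_steering_hook`), the blending weights, and the single linear layer standing in for a transformer block's residual stream are all assumptions. In practice the persona vectors would be extracted from contrastive prompt activations of a real LLM.

```python
import torch
import torch.nn as nn

def blend_persona_vectors(vectors, weights):
    """Linearly combine persona vectors (hypothetical helper; the paper's
    exact blending scheme is not specified here)."""
    w = torch.tensor(weights, dtype=torch.float32)
    w = w / w.sum()  # normalize so the blend stays on a comparable scale
    return sum(wi * v for wi, v in zip(w, vectors))

def add_steering_hook(layer, steering_vector, alpha=1.0):
    """Register a forward hook that adds the steering vector to the
    layer's output activations on every forward pass."""
    def hook(module, inputs, output):
        return output + alpha * steering_vector
    return layer.register_forward_hook(hook)

# Toy demonstration: a single linear layer stands in for one
# transformer block (hidden size 8).
torch.manual_seed(0)
hidden = 8
layer = nn.Linear(hidden, hidden)

# Two toy "persona" directions (random here; extracted from a real
# model's activation space in the actual method).
persona_a = torch.randn(hidden)
persona_b = torch.randn(hidden)
steer = blend_persona_vectors([persona_a, persona_b], [0.7, 0.3])

x = torch.randn(1, hidden)
baseline = layer(x)            # unsteered activations
handle = add_steering_hook(layer, steer, alpha=1.0)
steered = layer(x)             # activations shifted by the blended vector
handle.remove()                # detach the hook; model is unchanged
```

Because the hook only adds a vector to existing activations, the approach is training-free: no weights are updated, and removing the hook restores the original model, which matches the single-model, low-overhead framing above.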
Tsung-Min Pai
Department of Electrical Engineering, National Taiwan University
Jui-I Wang
Department of Computer Science & Information Engineering, National Taiwan University
Li-Chun Lu
Graduate Institute of Communication Engineering, National Taiwan University
Shao-Hua Sun
Assistant Professor at National Taiwan University
Machine Learning · Robot Learning · Reinforcement Learning · Program Synthesis
Hung-Yi Lee
Department of Electrical Engineering, National Taiwan University
Kai-Wei Chang
CSAIL, Massachusetts Institute of Technology