PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference

πŸ“… 2026-03-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses two challenges in serving on-demand personalized diffusion models: ambiguous user requests that lead to incorrect model selection, and the degradation of personalized concept representations under standard quantization. To this end, we propose PersonalQ, a unified framework that jointly optimizes model selection and quantization through a shared trigger-word signal. Specifically, PersonalQ employs intent-aligned hybrid retrieval combined with large-language-model-based reranking for accurate model selection, and introduces Trigger-word-Aware Quantization (TAQ), a mixed-precision quantization strategy that preserves critical personalized features during model compression. Experimental results demonstrate that PersonalQ significantly improves intent-alignment accuracy, while TAQ achieves a superior trade-off between compression ratio and generation quality compared to existing post-training quantization methods, enabling efficient, high-fidelity deployment of personalized diffusion models.
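The selection half of the pipeline can be illustrated with a minimal sketch: blend a dense-embedding score with a lexical score, take the top candidates for the LLM reranker, and rewrite the prompt with the winning checkpoint's trigger word. All function names, weights, and the toy checkpoint records below are illustrative assumptions, not the paper's published API.

```python
# Hypothetical sketch of intent-aligned hybrid retrieval for checkpoint
# selection. Scores are a weighted sum of dense (cosine) and lexical
# (token-overlap) similarity; the LLM reranking stage is elided.
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical_score(query_tokens, doc_tokens):
    # Token-overlap proxy for a BM25-style lexical score.
    return len(set(query_tokens) & set(doc_tokens)) / max(len(set(query_tokens)), 1)

def hybrid_select(query, checkpoints, alpha=0.5, top_k=2):
    """Rank checkpoints by alpha * dense + (1 - alpha) * lexical score."""
    q_tokens = query["text"].lower().split()
    scored = []
    for ckpt in checkpoints:
        dense = cosine(query["embedding"], ckpt["embedding"])
        lex = lexical_score(q_tokens, ckpt["description"].lower().split())
        scored.append((alpha * dense + (1 - alpha) * lex, ckpt))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:top_k]]  # candidates for the LLM reranker

# Toy repository of two personalized checkpoints (made-up data).
checkpoints = [
    {"trigger": "<sks-dog>", "description": "my corgi dog pet",
     "embedding": [0.9, 0.1, 0.0]},
    {"trigger": "<sks-cat>", "description": "tabby cat portrait",
     "embedding": [0.1, 0.9, 0.0]},
]
query = {"text": "a photo of my corgi dog", "embedding": [0.85, 0.2, 0.1]}
best = hybrid_select(query, checkpoints, top_k=1)[0]
# Rewrite the prompt with the selected checkpoint's canonical trigger.
prompt = query["text"].replace("my corgi dog", best["trigger"])
```

In the full system the top-k list would go through LLM-based reranking over checkpoint context, with a clarification question only when multiple intents remain plausible.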

πŸ“ Abstract
Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar checkpoints, and standard post-training quantization can distort the fragile representations that encode personalized concepts. We present PersonalQ, a unified framework that connects checkpoint selection and quantization through a shared signal -- the checkpoint's trigger token. PersonalQ performs intent-aligned selection by combining intent-aware hybrid retrieval with LLM-based reranking over checkpoint context and asks a brief clarification question only when multiple intents remain plausible; it then rewrites the prompt by inserting the selected checkpoint's canonical trigger. Complementing this, Trigger-Aware Quantization (TAQ) applies trigger-aware mixed precision in cross-attention, preserving trigger-conditioned key/value rows (and their attention weights) while aggressively quantizing the remaining pathways for memory-efficient inference. Experiments show that PersonalQ improves intent alignment over retrieval and reranking baselines, while TAQ consistently offers a stronger compression-quality trade-off than prior diffusion PTQ methods, enabling scalable serving of personalized checkpoints without sacrificing fidelity.
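The core idea behind trigger-aware mixed precision can be sketched as follows: key/value rows of cross-attention that correspond to the trigger token keep higher precision, while all other rows are quantized aggressively. The bit-widths, row granularity, and helper names below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of trigger-aware mixed-precision quantization.
# Rows indexed by trigger token positions are kept at high_bits;
# all remaining rows are quantized to low_bits.

def quantize_row(row, bits):
    """Uniform symmetric quantize-dequantize of one row of values."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in row) / qmax or 1.0
    return [round(v / scale) * scale for v in row]

def taq_quantize(kv_rows, trigger_token_ids, low_bits=4, high_bits=8):
    """Mixed precision over key/value rows: trigger-conditioned rows
    stay at high_bits, everything else drops to low_bits."""
    out = []
    for i, row in enumerate(kv_rows):
        bits = high_bits if i in trigger_token_ids else low_bits
        out.append(quantize_row(row, bits))
    return out

# Two identical rows; only row 0 is trigger-conditioned, so it should
# be reconstructed more faithfully after quantize-dequantize.
rows = [[0.1, -0.27, 0.5], [0.1, -0.27, 0.5]]
dequantized = taq_quantize(rows, trigger_token_ids={0})
```

A real implementation would operate on the projected K/V tensors inside each cross-attention layer (and, per the abstract, also preserve the associated attention weights); this toy version only shows the per-row precision split.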
Problem

Research questions and friction points this paper is trying to address.

personalized diffusion models
efficient inference
checkpoint selection
post-training quantization
intent ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Personalized Diffusion Models
Trigger-Aware Quantization
Intent-Aligned Retrieval
Mixed-Precision Quantization
Efficient Inference
πŸ”Ž Similar Papers
No similar papers found.
Qirui Wang
School of Software Engineering, Xi’an Jiaotong University, China
Qi Guo
Assistant Professor of Electrical and Computer Engineering, Purdue University
visual sensing, computational optics
Yiding Sun
Renmin University of China
Large Language Models, Explainable Recommendation
Junkai Yang
School of Software Engineering, Xi’an Jiaotong University, China
Dongxu Zhang
Optum AI, PhD from UMass Amherst
LLMs, natural language processing, representation learning, machine learning
Shanmin Pang
School of Software Engineering, Xi’an Jiaotong University, China
Qing Guo
Nankai University, China