AI Summary
This work addresses two challenges in on-demand personalized diffusion models: ambiguous user requests that lead to incorrect model selection, and the degradation of personalized concept representations under standard quantization. To this end, we propose PersonalQ, a unified framework that jointly optimizes model selection and quantization through a shared trigger-word signal. Specifically, PersonalQ employs intent-aligned hybrid retrieval combined with large language model-based reranking for accurate model selection, and introduces Trigger-Aware Quantization (TAQ), a mixed-precision quantization strategy that preserves critical personalized features during model compression. Experimental results demonstrate that PersonalQ significantly improves intent-alignment accuracy, while TAQ achieves a superior trade-off between compression ratio and generation quality compared to existing post-training quantization methods, enabling efficient and high-fidelity deployment of personalized diffusion models.
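The selection pipeline described above, hybrid retrieval followed by reranking and trigger-word prompt rewriting, can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the checkpoint fields, the score-fusion weight `alpha`, and the `rerank_top_k` stub (standing in for the LLM reranker) are all assumptions.

```python
# Toy sketch of intent-aligned hybrid checkpoint selection.
# All names and weights here are illustrative assumptions.

def hybrid_scores(query_terms, checkpoints, alpha=0.5):
    """Fuse a sparse keyword-overlap score with a stand-in dense score."""
    scored = []
    for ckpt in checkpoints:
        terms = set(ckpt["description"].lower().split())
        sparse = len(set(query_terms) & terms) / max(len(query_terms), 1)
        dense = ckpt.get("dense_sim", 0.0)  # placeholder for embedding cosine similarity
        scored.append((alpha * sparse + (1 - alpha) * dense, ckpt))
    return sorted(scored, key=lambda t: t[0], reverse=True)

def rerank_top_k(ranked, k=3):
    """Stub for the LLM reranking stage: in the full system, an LLM would
    re-score these top-k candidates against the user's stated intent."""
    return [ckpt for _, ckpt in ranked[:k]]

checkpoints = [
    {"name": "sks_dog", "trigger": "sks", "description": "a fluffy corgi dog", "dense_sim": 0.82},
    {"name": "zwx_cat", "trigger": "zwx", "description": "a tabby cat", "dense_sim": 0.40},
]
top = rerank_top_k(hybrid_scores("corgi dog photo".split(), checkpoints))
# Rewrite the prompt by inserting the selected checkpoint's canonical trigger.
prompt = f"a photo of {top[0]['trigger']} dog at the beach"
```

Fusing sparse and dense signals hedges against each retriever's blind spots: keyword overlap catches exact concept names, while dense similarity catches paraphrased intents.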
Abstract
Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar checkpoints, and standard post-training quantization can distort the fragile representations that encode personalized concepts. We present PersonalQ, a unified framework that connects checkpoint selection and quantization through a shared signal: the checkpoint's trigger token. PersonalQ performs intent-aligned selection by combining intent-aware hybrid retrieval with LLM-based reranking over checkpoint context, asking a brief clarification question only when multiple intents remain plausible; it then rewrites the prompt by inserting the selected checkpoint's canonical trigger. Complementing this, Trigger-Aware Quantization (TAQ) applies trigger-aware mixed precision in cross-attention, preserving trigger-conditioned key/value rows (and their attention weights) while aggressively quantizing the remaining pathways for memory-efficient inference. Experiments show that PersonalQ improves intent alignment over retrieval and reranking baselines, while TAQ consistently offers a stronger compression-quality trade-off than prior diffusion PTQ methods, enabling scalable serving of personalized checkpoints without sacrificing fidelity.
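The core idea behind TAQ, keep rows conditioned on trigger tokens at higher precision while quantizing everything else aggressively, can be illustrated with a minimal sketch. This is a simplification under stated assumptions: the `fake_quant` symmetric uniform scheme, the 8-bit/4-bit split, and the row-wise treatment of a toy key/value matrix are illustrative choices, not the paper's exact method.

```python
# Minimal sketch of trigger-aware mixed-precision quantization.
# The bit widths and quantization scheme are illustrative assumptions.
import numpy as np

def fake_quant(x, bits):
    """Symmetric uniform quantize-dequantize to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    peak = np.abs(x).max()
    scale = peak / qmax if peak > 0 else 1.0
    return np.round(x / scale).clip(-qmax, qmax) * scale

def taq_rows(kv, trigger_rows, hi_bits=8, lo_bits=4):
    """Keep rows conditioned on trigger tokens at higher precision;
    quantize all remaining rows aggressively."""
    out = np.empty_like(kv)
    for i in range(kv.shape[0]):
        out[i] = fake_quant(kv[i], hi_bits if i in trigger_rows else lo_bits)
    return out

rng = np.random.default_rng(0)
kv = rng.normal(size=(6, 8))            # toy key/value matrix: rows = token positions
mixed = taq_rows(kv, trigger_rows={2})  # suppose position 2 holds the trigger token
err_trigger = np.abs(mixed[2] - kv[2]).max()  # small: 8-bit row
err_other = np.abs(mixed[0] - kv[0]).max()    # larger: 4-bit row
```

The trigger-conditioned row incurs far less quantization error than the 4-bit rows, which is the trade-off TAQ exploits: a few high-precision rows protect the personalized concept while the bulk of the tensor compresses aggressively.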