🤖 AI Summary
Language modeling is inherently ill-posed: a single context often admits multiple semantically valid continuations. To address this, LoRA-MCL combines Multiple Choice Learning (MCL), trained with a Winner-Takes-All (WTA) loss, with Low-Rank Adaptation (LoRA). Unlike prior approaches, it explicitly models the output as a mixture distribution at inference time, enabling efficient, controllable, and diverse generation without post-hoc filtering or sampling heuristics. Theoretically, the method grounds diversity modeling in a mixture-of-Markov-chains assumption, yielding interpretable probabilistic semantics; methodologically, it optimizes the model's output distribution end-to-end. On vision and audio captioning tasks, LoRA-MCL achieves substantial gains in lexical diversity (+23.6% Dist-2) and semantic fidelity (+15.4% BLEU-4) while preserving inference efficiency.
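The WTA objective at the heart of MCL can be sketched as follows: each of K hypothesis heads scores the target sequence, and only the lowest-loss head per sample receives gradient, encouraging heads to specialize on distinct plausible continuations. The PyTorch snippet below is a minimal illustration under assumed tensor shapes; the function name `wta_loss` and the head-dimension layout are our own, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def wta_loss(logits_per_head: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Winner-Takes-All loss over K hypothesis heads (illustrative sketch).

    logits_per_head: (K, B, T, V) -- next-token logits from each head
    targets:         (B, T)       -- ground-truth token ids
    Only the best head per sequence contributes to the loss, so gradients
    flow solely through the "winner" (argmin indices carry no gradient).
    """
    K, B, T, V = logits_per_head.shape
    # Per-head, per-sequence mean negative log-likelihood, shape (K, B)
    nll = torch.stack([
        F.cross_entropy(
            logits_per_head[k].reshape(B * T, V),
            targets.reshape(B * T),
            reduction="none",
        ).reshape(B, T).mean(dim=1)
        for k in range(K)
    ])
    winner = nll.argmin(dim=0)  # (B,) index of best head for each sample
    return nll[winner, torch.arange(B)].mean()
```

Because the minimum over heads is always at most any single head's loss, training with this objective never penalizes a head for a sample that another head already explains well.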
📝 Abstract
We propose LoRA-MCL, a training scheme that extends next-token prediction in language models to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the Winner-Takes-All (WTA) loss to handle this ambiguity, using Low-Rank Adaptation (LoRA) for efficiency. We provide a theoretical interpretation of applying Multiple Choice Learning to language modeling under the assumption that the data is generated from a mixture of distributions, and illustrate the proposed approach on data sampled from mixtures of Markov chains. Extensive experiments on real-world visual and audio captioning tasks then demonstrate that our method achieves high diversity and relevance in generated outputs.
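To make the mixture-of-Markov-chains data assumption concrete: a sequence is generated by first drawing a latent chain index from the mixture weights, then rolling out that chain's transition matrix. The NumPy sketch below assumes a uniform initial-state distribution; the function name `sample_mixture_markov` is our own and the paper's exact setup may differ.

```python
import numpy as np

def sample_mixture_markov(transition_mats, mixture_weights, length, rng):
    """Sample one sequence from a mixture of Markov chains (illustrative sketch).

    transition_mats: list of (S, S) row-stochastic matrices, one per component
    mixture_weights: probabilities over components, summing to 1
    Returns the latent component index and the sampled state sequence.
    """
    z = rng.choice(len(transition_mats), p=mixture_weights)  # latent chain
    P = transition_mats[z]
    n_states = P.shape[0]
    seq = [int(rng.integers(n_states))]  # uniform initial state (assumption)
    for _ in range(length - 1):
        seq.append(int(rng.choice(n_states, p=P[seq[-1]])))
    return z, seq
```

A language model trained on such sequences with a single maximum-likelihood head would average over the components, whereas a multi-hypothesis model can dedicate one head per underlying chain.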