🤖 AI Summary
This study investigates enhancing the instruction-following capabilities of encoder-decoder models (e.g., T5, FLAN-T5) by adapting DoLa (Decoding by Contrasting Layers), a contrastive decoding method, to this architecture for the first time. Methodologically, DoLa contrasts the logits of the final layer against those of an earlier ("premature") layer during decoding to guide token selection, enabling inference-time improvements to generation without fine-tuning. Key contributions include: (1) the first integration of DoLa into an encoder-decoder framework; (2) a systematic evaluation suite designed specifically for instruction-following performance; and (3) mechanistic insight into DoLa's operation via layer-wise logit trajectory analysis, revealing how it recalibrates the output probability distribution. Experiments show that DoLa significantly improves factual consistency and instruction adherence on tasks such as fact-based QA and instruction paraphrasing, yet degrades performance on logical reasoning, highlighting its task dependency and practical limits.
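The inter-layer contrast described above can be sketched as follows. This is a minimal NumPy illustration of DoLa-style scoring, not the paper's implementation: the function names, the plausibility threshold `alpha`, and the toy logits are all illustrative assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def dola_contrast(final_logits, premature_logits, alpha=0.1):
    """DoLa-style contrastive scoring (sketch): subtract an early layer's
    log-probabilities from the final layer's, keeping only tokens whose
    final-layer probability is within a factor `alpha` of the top token
    (an adaptive plausibility constraint)."""
    logp_final = log_softmax(final_logits)
    logp_early = log_softmax(premature_logits)
    mask = logp_final >= np.log(alpha) + logp_final.max()
    return np.where(mask, logp_final - logp_early, -np.inf)

# Toy 5-token vocabulary: the early layer favors token 0, the final
# layer favors token 1; the contrast amplifies where the layers disagree.
final = np.array([2.0, 3.0, 0.5, -1.0, -3.0])
early = np.array([3.0, 1.0, 0.5, -1.0, -3.0])
next_token = int(np.argmax(dola_contrast(final, early)))
```

Greedy selection over the contrasted scores then picks token 1, the token whose probability grew most between the early and final layers, rather than the token the early layer already favored.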
📄 Abstract
Contrastive decoding is a lightweight and effective inference-time method for improving the quality of text generation in large language models. However, algorithms such as DoLa (Decoding by Contrasting Layers) have so far been implemented only in decoder-only architectures and studied only for their impact on factuality. This work adapts DoLa to the T5 and FLAN-T5 model families and evaluates its impact on the models' instruction-following capabilities; to our knowledge, this is the first implementation of a contrastive decoding strategy in an encoder-decoder architecture. Our results show that DoLa improves the faithfulness of text generation for certain categories of tasks while harming others. To understand these results, we present a layer-by-layer analysis of logit evolution in a FLAN-T5 model, quantifying DoLa's impact on output token probabilities.