Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLa Adaptations for T5

📅 2025-12-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study investigates enhancing instruction-following capabilities in encoder-decoder models (e.g., T5, FLAN-T5) by adapting DoLa (Decoding by Contrasting Layers), a contrastive decoding method, to this architecture for the first time. Methodologically, DoLa leverages inter-layer logit differences during decoding to guide token selection, enabling inference-time generation optimization without fine-tuning. Key contributions include: (1) the first integration of DoLa into an encoder-decoder framework; (2) a systematic evaluation suite designed specifically for instruction-following performance; and (3) mechanistic insight into DoLa's operation via layer-wise logit trajectory analysis, revealing its calibrating effect on the output probability distribution. Experiments show DoLa significantly improves factual consistency and instruction adherence on tasks such as fact-based QA and instruction paraphrasing, yet degrades performance on logical reasoning, highlighting its task dependency and practical limits.
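The contrastive scoring step described above can be sketched in a few lines. The sketch below follows DoLa's standard formulation (log-probability difference between the mature and a premature layer, gated by an adaptive plausibility constraint); the toy logits and the `alpha` value are illustrative, not taken from the paper:

```python
import numpy as np

def dola_contrast(final_logits, premature_logits, alpha=0.1):
    """Score next-token candidates by contrasting the final (mature) layer's
    logits with a premature layer's logits, as in DoLa. Tokens whose
    final-layer probability falls below alpha * max probability are masked
    out (adaptive plausibility constraint)."""
    def log_softmax(x):
        x = x - x.max()
        return x - np.log(np.exp(x).sum())

    lp_final = log_softmax(final_logits)
    lp_premature = log_softmax(premature_logits)
    # keep only plausible tokens under the final-layer distribution
    plausible = lp_final >= np.log(alpha) + lp_final.max()
    # contrastive score: how much a token's log-probability grew across layers
    return np.where(plausible, lp_final - lp_premature, -np.inf)

# toy vocabulary of 4 tokens
final = np.array([3.0, 2.5, 0.1, -1.0])
premature = np.array([3.0, 0.5, 0.2, -1.0])
scores = dola_contrast(final, premature)
print(int(scores.argmax()))  # prints 1: the token whose probability grew most
```

In this toy example token 1 wins even though token 0 has the highest final-layer probability, illustrating how the contrast rewards tokens that emerge in later layers rather than those already dominant early on.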

๐Ÿ“ Abstract
Contrastive decoding is a lightweight and effective inference-time method that improves the quality of text generation in Large Language Models. However, algorithms such as DoLa (Decoding by Contrasting Layers) have only been implemented in decoder-only architectures and studied for their impact on improving factuality. This work adapts DoLa for the T5 and FLAN-T5 model families and evaluates its impact on the models' instruction-following capabilities, which to our knowledge is the first implementation of a contrastive decoding strategy in an encoder-decoder architecture. Our results show that DoLa improves the faithfulness of text generation for certain categories of tasks and harms it for others. To understand these results, we present a layer-by-layer analysis of logit evolution in a FLAN-T5 model, quantifying DoLa's impact on token output probabilities.
Problem

Research questions and friction points this paper is trying to address.

Adapt DoLa to encoder-decoder T5 models
Evaluate DoLa's effect on instruction-following ability
Analyze layer-wise logit changes in FLAN-T5
Innovation

Methods, ideas, or system contributions that make the work stand out.

First adaptation of DoLa to encoder-decoder models (T5, FLAN-T5)
Systematic evaluation of DoLa's effect on instruction-following performance
Layer-by-layer analysis of logit evolution quantifying DoLa's effect on token probabilities
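The layer-wise logit analysis can be approximated with a "logit lens"-style probe: project each decoder layer's hidden state through the model's shared output head and track how the token distribution evolves across depth. The sketch below uses synthetic hidden states; the dimensions, layer count, and random projection are stand-ins for a real FLAN-T5 checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, vocab = 8, 5
# stand-in for the model's shared LM head (tied output projection)
lm_head = rng.normal(size=(hidden_dim, vocab))

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# stand-in hidden states for one decoding step across 6 decoder layers
hidden_states = [rng.normal(size=hidden_dim) for _ in range(6)]

# trajectory[l, v] = probability the head assigns to token v at layer l
trajectory = np.stack([softmax(h @ lm_head) for h in hidden_states])
print(trajectory.shape)  # prints (6, 5): one distribution per layer
```

Plotting a single token's column of `trajectory` against layer depth shows whether its probability emerges early or late, which is the kind of signal DoLa's layer contrast exploits.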
Huey Sun, University College London
Anabel Yong, National University of Singapore (sampling, probabilistic machine learning, AI4Science)
Lorenzo Gilly, University College London
Felipe Jin, University College London