🤖 AI Summary
This study investigates enhancing the instruction-following capabilities of encoder-decoder models (e.g., T5, FLAN-T5) by adapting DoLa (Decoding by Contrasting Layers), a contrastive decoding method, to this architecture for the first time. Methodologically, DoLa contrasts the logits of the final layer against those of an earlier ("premature") layer during decoding to guide token selection, enabling inference-time improvements to generation without fine-tuning. Key contributions include: (1) the first integration of DoLa into an encoder-decoder framework; (2) a systematic evaluation suite designed specifically for instruction-following performance; and (3) mechanistic insight into DoLa's operation via layer-wise logit trajectory analysis, revealing how it recalibrates the output probability distribution. Experiments show that DoLa significantly improves factual consistency and instruction adherence on tasks such as fact-based QA and instruction paraphrasing, yet degrades performance on logical reasoning, highlighting its task dependency and practical limits.
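The inter-layer contrast described above can be sketched as follows. This is a minimal NumPy illustration of DoLa-style scoring, not the paper's implementation: the function names, the plausibility threshold `alpha`, and the toy logits are all illustrative assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def dola_contrast(final_logits, premature_logits, alpha=0.1):
    """DoLa-style contrastive scoring (sketch): subtract an early layer's
    log-probabilities from the final layer's, keeping only tokens whose
    final-layer probability is within a factor `alpha` of the top token
    (an adaptive plausibility constraint)."""
    logp_final = log_softmax(final_logits)
    logp_early = log_softmax(premature_logits)
    mask = logp_final >= np.log(alpha) + logp_final.max()
    return np.where(mask, logp_final - logp_early, -np.inf)

# Toy 5-token vocabulary: the early layer favors token 0, the final
# layer favors token 1; the contrast amplifies where the layers disagree.
final = np.array([2.0, 3.0, 0.5, -1.0, -3.0])
early = np.array([3.0, 1.0, 0.5, -1.0, -3.0])
next_token = int(np.argmax(dola_contrast(final, early)))
```

Greedy selection over the contrasted scores then picks token 1, the token whose probability grew most between the early and final layers, rather than the token the early layer already favored.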
📄 Abstract
Contrastive decoding is a lightweight and effective inference-time method for improving the quality of text generation in large language models. However, algorithms such as DoLa (Decoding by Contrasting Layers) have so far been implemented only in decoder-only architectures and studied only for their impact on factuality. This work adapts DoLa to the T5 and FLAN-T5 model families and evaluates its impact on the models' instruction-following capabilities; to our knowledge, this is the first implementation of a contrastive decoding strategy in an encoder-decoder architecture. Our results show that DoLa improves the faithfulness of text generation for certain categories of tasks while harming others. To understand these results, we present a layer-by-layer analysis of logit evolution in a FLAN-T5 model, quantifying DoLa's impact on output token probabilities.