MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

📅 2025-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models are typically trained for unidirectional generation and bidirectional representation learning separately, hindering joint optimization of generation quality and semantic representation capability. This paper proposes a multi-objective self-supervised framework that unifies causal and bidirectional attention within a single decoder-only architecture. It jointly optimizes text generation, context-aware infilling, and deep semantic modeling through three complementary objectives: masked span prediction, contrastive representation learning, and standard autoregressive language modeling. Crucially, the method requires no architectural modification: the capabilities are decoupled and co-trained solely via attention masking and loss design. Experiments demonstrate that the adapted model surpasses strong encoder-based baselines (e.g., BERT) on representation tasks, enables high-quality, repetition-free zero-shot generation, supports context-sensitive infilling, and preserves the knowledge acquired during pretraining.
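The key mechanical idea in the summary, combining causal and bidirectional attention purely through masking, can be illustrated with a minimal sketch. This is an assumption-laden toy (the function name and the prefix-bidirectional scheme are illustrative choices, not the paper's exact mask design): tokens in a designated prefix attend to each other bidirectionally, while the remaining tokens attend causally.

```python
import numpy as np

def mixed_attention_mask(seq_len: int, bidir_len: int) -> np.ndarray:
    """Boolean attention mask (True = position may attend).

    Illustrative only: the first `bidir_len` positions attend
    bidirectionally among themselves; all later positions attend
    causally (to themselves and every earlier position).
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Open full bidirectional attention inside the prefix block.
    mask[:bidir_len, :bidir_len] = True
    return mask

mask = mixed_attention_mask(seq_len=6, bidir_len=3)
```

Such a mask can be passed to an off-the-shelf attention layer, which is consistent with the summary's claim that no architectural change is needed, only the mask and the training losses differ across objectives.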

📝 Abstract
While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, unidirectional and bidirectional models are typically trained separately with distinct objectives (generation and representation learning, respectively). This separation overlooks the opportunity to develop a more versatile language model and to let these objectives complement each other. In this work, we introduce MAGNET, an adaptation of decoder-only LLMs that enhances their ability to generate robust representations and infill missing text spans, while preserving their knowledge and text generation capabilities. MAGNET employs three self-supervised training objectives and introduces an attention mechanism that combines bidirectional and causal attention, enabling unified training across all objectives. Our results demonstrate that LLMs adapted with MAGNET (1) surpass strong text encoders on token-level and sentence-level representation learning tasks, (2) generate contextually appropriate text infills by leveraging future context, (3) retain the ability for open-ended text generation without exhibiting the repetition problem, and (4) preserve the knowledge gained by the LLM during pretraining.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Unidirectional Generation
Bidirectional Modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

MAGNET
Self-supervised Training
Unified Attention Mechanism