🤖 AI Summary
While generative large language models (LLMs) excel at open-ended text generation, they significantly underperform similarly sized encoder-only models on structured prediction tasks such as named entity recognition and relation extraction, primarily because their internal linguistic representations are misaligned with the output space used in supervised fine-tuning.
Method: We propose a dataset-agnostic, general-purpose structured prediction framework that systematically reformulates sequence-to-sequence modeling as a classification task, integrating loss calibration and structured decoding to unify generative language modeling with classification-based sequence modeling.
Contribution/Results: Our approach matches or comes close to the performance of task-specific fine-tuning across diverse structured prediction benchmarks, while substantially improving out-of-distribution generalization. It requires no dataset-specific architectural or training modifications, offering a viable, unified alternative to conventional custom fine-tuning.
📝 Abstract
Previous work on structured prediction (e.g., NER, information extraction) with a single model relies on explicit dataset information, which boosts in-distribution performance but is orthogonal to robust generalization in real-world settings. To overcome this limitation, we propose the Structured Language Generation Model (SLGM), a framework that reduces sequence-to-sequence problems to classification problems through loss calibration and decoding methods. Our experimental results show that SLGM maintains performance without explicit dataset information and can closely follow, and potentially replace, dataset-specific fine-tuning.
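
To make the idea of reducing a generation step to a classification step more concrete, the following is a minimal sketch and not SLGM's actual implementation. It assumes that each label maps to a single vocabulary token and that "classification" means renormalizing the model's scores over the label tokens only. The function names, the label-to-token mapping, and the loss restricted to the label set are all hypothetical illustrations; the abstract does not specify how loss calibration or decoding is done.

```python
import math


def classify_over_labels(vocab_logits, label_token_ids):
    """Restrict one decoding step to a fixed label set and renormalize.

    vocab_logits: list of floats, one score per vocabulary token.
    label_token_ids: dict mapping label name -> token id (hypothetical mapping).
    Returns (best_label, probs), where probs sums to 1 over the labels only.
    """
    # Keep only the scores of tokens that correspond to valid labels.
    scores = {label: vocab_logits[tid] for label, tid in label_token_ids.items()}
    # Numerically stable softmax over the label subset.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    best_label = max(probs, key=probs.get)
    return best_label, probs


def label_cross_entropy(vocab_logits, label_token_ids, gold_label):
    """Cross-entropy computed over the label set instead of the full vocabulary.

    This is an illustrative assumption, not the paper's loss calibration.
    """
    _, probs = classify_over_labels(vocab_logits, label_token_ids)
    return -math.log(probs[gold_label])


# Toy example: token 0 has the highest raw score but is not a valid label,
# so it can never be emitted once decoding is restricted to the label set.
toy_logits = [5.0, 1.0, 2.5, 0.3, 2.0]
toy_labels = {"PER": 1, "ORG": 2, "O": 4}
prediction, distribution = classify_over_labels(toy_logits, toy_labels)
print(prediction, distribution)
```

In this toy setting, unconstrained greedy decoding would pick token 0, whereas the restricted step can only choose among `PER`, `ORG`, and `O`. That restriction is one way a free-form generation step can be made to behave like a classifier.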