🤖 AI Summary
Large language models (LLMs) frequently generate unstructured outputs that deviate from predefined schemas in critical tasks such as agent reasoning and information extraction, hindering their reliable deployment. To address this, we propose SLOT, a model-agnostic post-processing framework that enables general-purpose structured conversion via lightweight fine-tuning of small language models (e.g., Mistral-7B, Llama-3.2-1B). SLOT integrates synthetic structured data generation, constraint-based decoding, and multi-dimensional evaluation, jointly optimizing schema compliance and semantic fidelity. This work introduces the first systematic benchmark explicitly designed to assess both structural correctness (schema accuracy) and content preservation (semantic similarity). Experiments show that Mistral-7B+SLOT achieves 99.5% schema accuracy and 94.0% semantic similarity, outperforming Claude-3.5-Sonnet by 25 and 20 percentage points, respectively. Moreover, Llama-3.2-1B+SLOT matches the structured output quality of significantly larger proprietary models, demonstrating strong cross-model and cross-schema adaptability.
📝 Abstract
Structured outputs are essential for large language models (LLMs) in critical applications like agents and information extraction. Despite their capabilities, LLMs often generate outputs that deviate from predefined schemas, significantly hampering reliable application development. We present SLOT (Structured LLM Output Transformer), a model-agnostic approach that transforms unstructured LLM outputs into precise structured formats. While existing solutions predominantly rely on constrained decoding techniques or are tightly coupled with specific models, SLOT employs a fine-tuned lightweight language model as a post-processing layer, achieving flexibility across various LLMs and schema specifications. We introduce a systematic pipeline for data curation and synthesis alongside a formal evaluation methodology that quantifies both schema accuracy and content fidelity. Our results demonstrate that a fine-tuned Mistral-7B model with constrained decoding achieves near-perfect schema accuracy (99.5%) and content similarity (94.0%), outperforming Claude-3.5-Sonnet by substantial margins (+25 and +20 percentage points, respectively). Notably, even compact models like Llama-3.2-1B can match or exceed the structured output capabilities of much larger proprietary models when equipped with SLOT, enabling reliable structured generation in resource-constrained environments.
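To make the schema-accuracy metric concrete, here is a minimal sketch of how one might score raw LLM outputs against a target schema. All names (`SCHEMA`, `extract_json`, `schema_accuracy`) and the toy schema are hypothetical illustrations, not SLOT's actual implementation; SLOT uses a fine-tuned small model with constrained decoding, while this sketch only shows the kind of compliance check applied to its outputs.

```python
import json
import re

# Hypothetical minimal schema: required keys and their expected Python types.
# A real setup would use a full JSON Schema validator instead.
SCHEMA = {"name": str, "age": int, "skills": list}

def extract_json(raw: str):
    """Pull the first {...} block out of free-form LLM text, if any."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

def conforms(obj, schema) -> bool:
    """Check that every required key exists with the expected type."""
    return isinstance(obj, dict) and all(
        key in obj and isinstance(obj[key], typ) for key, typ in schema.items()
    )

def schema_accuracy(outputs, schema) -> float:
    """Fraction of raw outputs that parse and satisfy the schema."""
    ok = sum(1 for raw in outputs if conforms(extract_json(raw), schema))
    return ok / len(outputs)

samples = [
    'Sure! Here it is: {"name": "Ada", "age": 36, "skills": ["math"]}',
    '{"name": "Bob", "skills": ["go"]}',  # missing "age": non-compliant
]
print(schema_accuracy(samples, SCHEMA))  # → 0.5
```

Content fidelity (the paper's semantic-similarity score) would be measured separately, e.g. by comparing the extracted field values against a reference, since a response can be schema-valid yet semantically wrong.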