Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

📅 2025-05-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses structured output generation—specifically, synthesizing low-code workflow definitions in JSON format—a task demanding strict syntactic validity, semantic fidelity, and logical consistency. Method: We systematically compare fine-tuning small language models (SLMs) against prompting large language models (LLMs), proposing an SLM optimization framework integrating supervised fine-tuning, structured prompt engineering, and JSON-constrained decoding. We further design a hybrid human-automated evaluation framework for rigorous assessment. Contribution/Results: Experiments demonstrate that fine-tuned SLMs outperform prompted LLMs by 10% on average in structural quality (accuracy, field completeness, logical consistency), achieve 3.2× faster inference, and reduce per-invocation cost by 76%. Fine-grained error analysis identifies persistent bottlenecks in nested structure generation and adherence to semantic constraints. This study establishes a reproducible, cost-effective technical pathway for high-quality structured generation under resource constraints.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per token costs are reduced, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications -- faster inference, lower costs -- may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.
Problem

Research questions and friction points this paper is trying to address.

Compare fine-tuning SLMs vs prompting LLMs for structured outputs
Evaluate quality of generating low-code workflows in JSON
Analyze model limitations through systematic error assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tune SLMs for structured outputs
Compare SLM fine-tuning vs LLM prompting
SLMs improve quality by 10%
🔎 Similar Papers
No similar papers found.