🤖 AI Summary
To address the limited accuracy of general-purpose large language models (LLMs) in domain-specific modeling, this paper proposes a lightweight, fine-tuning-free optimization framework tailored for Llama 3.1 to enhance its capability in automatically generating domain models—particularly medical data models—from natural language descriptions. Methodologically, the framework integrates search-based hyperparameter tuning with structured prompt engineering: it systematically optimizes inference parameters (e.g., temperature, top-p, max_tokens) and employs stepwise, domain-aware prompt templates to improve output controllability and semantic consistency. By avoiding parameter updates, the approach eliminates the computational overhead and catastrophic forgetting associated with fine-tuning. Experiments across ten heterogeneous domains demonstrate substantial improvements, especially in healthcare (+23.6% F1 over baselines), alongside robust cross-domain generalization—validating both effectiveness and practical applicability.
📝 Abstract
The introduction of large language models (LLMs) has enhanced automation in software engineering tasks, including in Model Driven Engineering (MDE). However, using general-purpose LLMs for domain modeling has its limitations. One approach is to adopt fine-tuned models, but this requires significant computational resources and can lead to issues like catastrophic forgetting.
This paper explores how hyperparameter tuning and prompt engineering can improve the accuracy of the Llama 3.1 model for generating domain models from textual descriptions. We use search-based methods to tune hyperparameters for a specific medical data model, resulting in a notable quality improvement over the baseline LLM. We then test the optimized hyperparameters across ten diverse application domains.
While the solutions were not universally applicable, we demonstrate that combining hyperparameter tuning with prompt engineering can enhance results across nearly all examined domain models.