🤖 AI Summary
This study investigates the applicability and boundaries of large language model (LLM) scaling laws in knowledge graph engineering (KGE) tasks. Using the LLM-KG-Bench benchmark, we systematically evaluate 26 open-source LLMs across three KGE task categories—knowledge graph understanding, generation, and querying—analyzing performance trends with respect to parameter count. We identify, for the first time, localized scaling law breakdowns in KGE: pronounced performance plateaus and intra-family inverse scaling (i.e., larger models underperform smaller ones within the same architecture family). Through cross-model horizontal evaluation and intra-family longitudinal analysis, we confirm that scaling laws hold broadly but non-monotonically across most tasks. Building on these findings, we propose a cost-aware model selection strategy that jointly optimizes performance and inference efficiency, identifying several high-performing, resource-efficient candidates among small- to medium-scale models (≤7B parameters).
📝 Abstract
When using Large Language Models (LLMs) to support Knowledge Graph Engineering (KGE), one of the first criteria considered when selecting an appropriate model is its size. According to the scaling laws, larger models typically show higher capabilities. However, in practice, resource costs are also an important factor, and thus it makes sense to consider the ratio between model performance and costs. The LLM-KG-Bench framework enables the comparison of LLMs in the context of KGE tasks and assesses their capabilities of understanding and producing KGs and KG queries. Based on a dataset created in an LLM-KG-Bench run covering 26 open state-of-the-art LLMs, we explore the model size scaling laws specific to KGE tasks. In our analyses, we assess how benchmark scores evolve between different model size categories. Additionally, we inspect how the score development of single models and families of models correlates with their size. Our analyses revealed that, with a few exceptions, the model size scaling laws generally also apply to the selected KGE tasks. However, in some cases, plateau or ceiling effects occurred, i.e., the task performance changed little between a model and the next larger model. In these cases, smaller models can be considered to achieve high cost-effectiveness. Within the same model family, larger models sometimes performed worse than smaller ones. Because these effects occurred only locally, it is advisable to additionally test the next-smaller and next-larger models of the same family.
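The cost-aware selection idea described above can be sketched as a simple score-to-cost ranking. The model names, scores, and cost figures below are purely illustrative assumptions, not values from the paper's LLM-KG-Bench results:

```python
# Hypothetical benchmark scores (0-1) and relative inference costs.
# All numbers are illustrative, not taken from LLM-KG-Bench.
models = {
    "small-3b":  {"score": 0.58, "cost": 1.0},
    "medium-7b": {"score": 0.71, "cost": 2.3},
    "large-34b": {"score": 0.73, "cost": 9.8},   # plateau: barely above 7B
    "huge-70b":  {"score": 0.80, "cost": 21.0},
}

def cost_effectiveness(entry):
    """Benchmark score per unit of inference cost."""
    return entry["score"] / entry["cost"]

# Rank models by score-to-cost ratio. A plateau between the 7B and 34B
# models makes the smaller one the more cost-effective choice.
ranked = sorted(models, key=lambda name: cost_effectiveness(models[name]),
                reverse=True)
print(ranked)
```

Under these toy numbers, the small- and medium-scale models dominate the ranking, mirroring the paper's observation that ≤7B models can be high-performing, resource-efficient candidates when a plateau effect is present.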