FRASE: Structured Representations for Generalizable SPARQL Query Generation

📅 2025-03-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing SPARQL generation datasets rely heavily on syntactic templates, causing models to learn only shallow, surface-level mappings between natural language questions and SPARQL queries—resulting in poor generalization to paraphrased or template-unseen inputs. To address this, we propose FRASE, a semantic enhancement framework that introduces Frame Semantic Role Labeling (FSRL) to SPARQL generation for the first time. We construct LC-QuAD 3.0, the first frame-augmented dataset, enabling deep semantic alignment between questions and queries via frame detection, argument mapping, and LLM fine-tuning. Our approach significantly improves robustness: SPARQL exact-match accuracy increases consistently by 12.7–18.3% on unseen templates and natural paraphrases. This demonstrates that structured semantic representations—grounded in linguistic frames—are critical for enhancing generalization in semantic parsing tasks.

Technology Category

Application Category

📝 Abstract
Translating natural language questions into SPARQL queries enables Knowledge Base querying for factual and up-to-date responses. However, existing datasets for this task are predominantly template-based, leading models to learn superficial mappings between question and query templates rather than developing true generalization capabilities. As a result, models struggle when encountering naturally phrased, template-free questions. This paper introduces FRASE (FRAme-based Semantic Enhancement), a novel approach that leverages Frame Semantic Role Labeling (FSRL) to address this limitation. We also present LC-QuAD 3.0, a new dataset derived from LC-QuAD 2.0, in which each question is enriched using FRASE through frame detection and the mapping of frame-elements to their argument. We evaluate the impact of this approach through extensive experiments on recent large language models (LLMs) under different fine-tuning configurations. Our results demonstrate that integrating frame-based structured representations consistently improves SPARQL generation performance, particularly in challenging generalization scenarios when test questions feature unseen templates (unknown template splits) and when they are all naturally phrased (reformulated questions).
Problem

Research questions and friction points this paper is trying to address.

Improves SPARQL query generation from natural language questions
Addresses generalization issues in template-based datasets
Enhances performance with frame-based structured representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Frame Semantic Role Labeling (FSRL)
Enhances dataset with frame-element mappings
Improves SPARQL generation via structured representations