RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation

📅 2026-05-21

🤖 AI Summary

This work addresses non-executable queries in Text2Cypher by proposing Reflection-Augmented Scaling (RAS), an inference-time method that uses the syntax errors returned by the database as feedback signals. Instead of memoryless independent resampling, RAS conditions each new generation on earlier failed queries and their error messages through in-context learning. Evaluated with five code-specialized large language models across three Neo4j datasets, RAS is consistently effective: with a sampling budget of n=5, it reduces the query execution error rate by 41–50%, compared with 32–38% for conventional independent scaling. This makes feedback-driven retries a more compute-efficient route to executable queries than drawing more independent samples.
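The error-rate figures can be made concrete with a small calculation. This is a minimal sketch assuming the Query Execution Error Rate (QEER) is the fraction of generated queries that fail to execute, and that the 41–50% and 32–38% figures are relative reductions from a baseline error rate (the summary does not state the baseline). The helper names are hypothetical, and the numbers in the usage lines are illustrative, not results from the paper.

```python
def query_execution_error_rate(executed_ok: list[bool]) -> float:
    """Fraction of generated queries that failed to execute (hypothetical helper)."""
    if not executed_ok:
        raise ValueError("need at least one query outcome")
    failures = sum(1 for ok in executed_ok if not ok)
    return failures / len(executed_ok)


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Relative drop in error rate; e.g. 0.20 -> 0.11 is a 45% reduction."""
    if baseline_rate <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Illustrative numbers only: 20 of 100 queries fail at baseline, 11 after scaling.
before = query_execution_error_rate([False] * 20 + [True] * 80)
after = query_execution_error_rate([False] * 11 + [True] * 89)
print(f"{relative_reduction(before, after):.0%}")  # prints 45%
```

Under this reading, a 41% reduction means roughly 59% of the baseline failures remain, so the gap between RAS and IS is measured against the same starting error rate.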

📝 Abstract

Inference-time scaling can reduce errors in structured query generation, but methods for allocating that compute in query code generation remain underexplored. We study Text2Cypher, where language models generate Cypher queries that execute against property graph databases. Non-executable queries are a syntactic failure distinct from semantic inaccuracy: a syntax error causes the database to return a system-generated error message. These error messages are typically discarded at inference time rather than leveraged through in-context learning (ICL). We compare two inference methods: Independent Scaling (IS), which performs memoryless resampling, and Reflection-Augmented Scaling (RAS), which conditions each new attempt on prior execution feedback via ICL. Across three Neo4j datasets and five code-specialized language models, RAS reduces the Query Execution Error Rate by 41–50% at n=5, outperforming IS at 32–38%. Execution errors are not merely failures to discard but actionable feedback, and structuring inference-time compute around them is a more efficient path to executability than scaling independent samples.
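The contrast between IS and RAS can be sketched as two retry loops. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` stands in for a language-model call, `execute` stands in for running a query against Neo4j and returning the error message (or `None` on success), the feedback prompt format is invented, both loops stop at the first executable query, and the budget `n` is assumed to count total generations. All function names are hypothetical, and the toy stubs replace a real model and database.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces: generate(prompt) -> Cypher text;
# execute(query) -> None on success, else the database error message.
Generate = Callable[[str], str]
Execute = Callable[[str], Optional[str]]


@dataclass
class Result:
    query: Optional[str]  # first executable query, or None if the budget ran out
    attempts: int         # number of generations consumed


def independent_scaling(question: str, generate: Generate, execute: Execute, n: int) -> Result:
    """IS: memoryless resampling; every attempt sees only the original question."""
    for i in range(1, n + 1):
        query = generate(question)
        if execute(query) is None:
            return Result(query, i)
    return Result(None, n)


def build_reflection_prompt(question: str, history: list[tuple[str, str]]) -> str:
    """Append earlier failed queries and their error messages as in-context feedback."""
    parts = [question]
    for failed_query, error in history:
        parts.append(f"Previous attempt:\n{failed_query}\nDatabase error:\n{error}")
    parts.append("Write a corrected Cypher query.")
    return "\n\n".join(parts)


def reflection_augmented_scaling(question: str, generate: Generate, execute: Execute, n: int) -> Result:
    """RAS: each new attempt is conditioned on all prior execution errors."""
    history: list[tuple[str, str]] = []
    for i in range(1, n + 1):
        prompt = build_reflection_prompt(question, history) if history else question
        query = generate(prompt)
        error = execute(query)
        if error is None:
            return Result(query, i)
        history.append((query, error))
    return Result(None, n)


def toy_generate(prompt: str) -> str:
    # Stub model: repeats a misspelled keyword unless error feedback is in the prompt.
    if "Database error" in prompt:
        return "MATCH (p:Person) RETURN p.name"
    return "MATCH (p:Person) RETRUN p.name"


def toy_execute(query: str) -> Optional[str]:
    # Stub executor: flags the misspelling instead of calling a real database.
    return "Invalid input 'RETRUN'" if "RETRUN" in query else None


print(independent_scaling("List all person names.", toy_generate, toy_execute, n=5))
# Result(query=None, attempts=5)
print(reflection_augmented_scaling("List all person names.", toy_generate, toy_execute, n=5))
# Result(query='MATCH (p:Person) RETURN p.name', attempts=2)
```

With these stubs, IS spends the whole budget repeating the same mistake, while RAS recovers on the second attempt because the error message is in its prompt. A real setup would swap `toy_execute` for a database driver call and `toy_generate` for a model call; the prompt layout is one of many reasonable choices.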

Problem

Research questions and friction points this paper is trying to address.

Text2Cypher

executable query generation

inference-time scaling

execution error

in-context learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reflection-Augmented Scaling

In-Context Learning

Text2Cypher