Enhancing Transformation from Natural Language to Signal Temporal Logic Using LLMs with Diverse External Knowledge

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data scarcity, semantic inaccuracy, and insufficient pattern coverage in natural language (NL) to Signal Temporal Logic (STL) translation, this paper introduces STL-DivEn, a high-diversity NL-STL dataset of 16,000 annotated samples, and proposes KGST, a knowledge-guided generate-then-refine framework. KGST integrates STL's formal semantics and external domain knowledge into both the generation and refinement stages. The contributions are: (i) a data construction pipeline that combines a manually written seed set, clustering-based exemplar selection, LLM-assisted generation, rule-based filtering, and human verification; and (ii) explicit use of STL semantics and external knowledge to improve semantic fidelity and logical completeness. Statistical analysis shows that STL-DivEn is more diverse than the existing NL-STL dataset. Both metric-based and human evaluations indicate that KGST outperforms baseline models in transformation accuracy on STL-DivEn and DeepSTL.

📝 Abstract
Temporal Logic (TL), especially Signal Temporal Logic (STL), enables precise formal specification, making it widely used in cyber-physical systems such as autonomous driving and robotics. Automatically transforming NL into STL is an attractive approach to overcome the limitations of manual transformation, which is time-consuming and error-prone. However, due to the lack of datasets, automatic transformation currently faces significant challenges and has not been fully explored. In this paper, we propose an NL-STL dataset named STL-Diversity-Enhanced (STL-DivEn), which comprises 16,000 samples enriched with diverse patterns. To develop the dataset, we first manually create a small-scale seed set of NL-STL pairs. Next, representative examples are identified through clustering and used to guide large language models (LLMs) in generating additional NL-STL pairs. Finally, diversity and accuracy are ensured through rigorous rule-based filters and human validation. Furthermore, we introduce the Knowledge-Guided STL Transformation (KGST) framework, a novel approach for transforming natural language into STL, involving a generate-then-refine process based on external knowledge. Statistical analysis shows that the STL-DivEn dataset exhibits more diversity than the existing NL-STL dataset. Moreover, both metric-based and human evaluations indicate that our KGST approach outperforms baseline models in transformation accuracy on STL-DivEn and DeepSTL datasets.
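The rule-based filtering step mentioned in the abstract can be pictured with a minimal sketch. The abstract does not list the actual rules, so everything below is an assumption made for illustration: the surface syntax (`G`, `F` with `[a,b]` intervals), the three checks (balanced parentheses, well-formed time intervals, duplicate removal), and the helper names `balanced`, `intervals_ok`, and `filter_pairs` are all hypothetical.

```python
import re

# Assumed surface syntax: temporal operators may carry an interval "[a,b]".
INTERVAL = re.compile(r"\[\s*([^\[\],]*?)\s*,\s*([^\[\],]*?)\s*\]")


def balanced(formula: str) -> bool:
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def intervals_ok(formula: str) -> bool:
    """Return True if every interval [a,b] is numeric with 0 <= a <= b."""
    for lo, hi in INTERVAL.findall(formula):
        try:
            a, b = float(lo), float(hi)
        except ValueError:
            return False
        if a < 0 or a > b:
            return False
    return True


def filter_pairs(pairs):
    """Keep NL-STL pairs that pass the checks, dropping near-verbatim duplicates."""
    seen = set()
    kept = []
    for nl, stl in pairs:
        # Normalise case/whitespace so trivially reformatted copies collide.
        key = (" ".join(nl.lower().split()), re.sub(r"\s+", "", stl))
        if not key[0] or not key[1] or key in seen:
            continue
        if not (balanced(stl) and intervals_ok(stl)):
            continue
        seen.add(key)
        kept.append((nl, stl))
    return kept


candidates = [
    ("The speed always stays below 60.", "G (v < 60)"),
    ("The speed always stays below 60.", "G (v<60)"),        # duplicate
    ("Within 5 to 2 seconds x exceeds 1.", "F[5,2] (x > 1)"),  # bad interval
    ("Unbalanced example.", "G ((x > 1)"),                    # bad parentheses
    ("Within 10 seconds the brake engages.", "F[0,10] (b > 0.5)"),
]
kept = filter_pairs(candidates)
```

Checks of this kind only catch syntactic problems; whether a formula actually matches its sentence is what the human validation step in the abstract is for.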

Problem

Research questions and friction points this paper is trying to address.

Automating NL-to-STL conversion to reduce manual effort and errors
Addressing dataset scarcity for NL-STL transformation research
Improving transformation accuracy using knowledge-guided LLM frameworks
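
As an illustration of the conversion task, consider the requirement "whenever the speed exceeds 60, the brake signal must rise above 0.5 within 5 seconds". This example is hand-written for this summary and is not taken from the paper or from STL-DivEn; the signal names v and b and the thresholds are assumptions. One STL rendering would be:

```latex
\[
\mathbf{G}\,\bigl(\, v > 60 \;\rightarrow\; \mathbf{F}_{[0,5]}\,( b > 0.5 ) \,\bigr)
\]
```

Here G ("globally") requires the implication to hold at every time point, and F with the interval [0,5] ("eventually within 0 to 5 time units") bounds how long the brake response may take. Getting the operator nesting and the interval right is where manual translation tends to go wrong.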

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate NL-STL pairs with clustering guidance
Rule-based filters ensure dataset diversity and accuracy
Knowledge-guided framework refines NL to STL transformation
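
The generate-then-refine idea can be sketched in a few lines. This is not the authors' implementation: the abstract only says that refinement is "based on external knowledge", so the retrieval of similar NL-STL pairs by word overlap, the function names, and the toy `generate` and `refine` stand-ins (which a real system would replace with LLM calls) are all assumptions.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (natural-language sentence, STL formula)


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(query: str, knowledge: List[Pair], k: int = 2) -> List[Pair]:
    """Return the k stored pairs whose sentence is most similar to the query."""
    ranked = sorted(knowledge, key=lambda p: token_overlap(query, p[0]), reverse=True)
    return ranked[:k]


def generate_then_refine(
    nl: str,
    knowledge: List[Pair],
    generate: Callable[[str], str],
    refine: Callable[[str, str, List[Pair]], str],
    k: int = 2,
) -> str:
    """Draft a formula, fetch similar examples, then revise the draft."""
    draft = generate(nl)
    examples = retrieve(nl, knowledge, k)
    return refine(nl, draft, examples)


# Toy stand-ins for the two model calls (hypothetical, for demonstration only).
def toy_generate(nl: str) -> str:
    return "F (v < 80)"  # deliberately wrong temporal operator


def toy_refine(nl: str, draft: str, examples: List[Pair]) -> str:
    # Copy the leading operator from the most similar retrieved example.
    return examples[0][1][0] + draft[1:]


knowledge_base = [
    ("the speed is always below 60", "G (v < 60)"),
    ("eventually the door opens", "F (door > 0)"),
    ("the temperature is always below 100", "G (T < 100)"),
]
result = generate_then_refine(
    "the speed is always below 80", knowledge_base, toy_generate, toy_refine
)
```

Separating `generate` and `refine` as injected callables keeps the control flow testable without a model; the retrieved examples are the only channel through which external knowledge reaches the refinement step in this sketch.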

Authors
Yue Fang
Peking University, Beijing, China
Zhi Jin
Sun Yat-Sen University, Associate Professor
Jie An
Institute of Software, Chinese Academy of Sciences, Beijing, China
Hongshen Chen
JD.com, Beijing, China
Xiaohong Chen
East China Normal University, Shanghai, China
Naijun Zhan
School of Computer Science, Peking University
Formal Methods; Real-time, embedded and hybrid systems; Program Verification