RESTL: Reinforcement Learning Guided by Multi-Aspect Rewards for Signal Temporal Logic Transformation

📅 2025-11-11
🤖 AI Summary
This work addresses three key challenges in automated natural-language-to-Signal-Temporal-Logic (STL) translation: atomic proposition errors, semantic distortion, and formula redundancy. The authors propose a multi-dimensional reward-guided reinforcement learning framework built on a large language model (LLM) backbone and optimized end-to-end with the Proximal Policy Optimization (PPO) algorithm. Four specialized reward models quantitatively assess atomic proposition consistency, semantic alignment, formula conciseness, and symbolic matching accuracy, and a curriculum learning strategy further improves the quality of the reward signal. Experimental results demonstrate that the method significantly outperforms existing state-of-the-art approaches on both automatic metrics and human evaluation, yielding substantial improvements in STL formula accuracy, semantic fidelity, and readability.
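The curriculum strategy mentioned above can be illustrated with a minimal sketch: reward-model training examples are presented from simple to complex. Here, parenthesis-nesting depth of the STL formula serves as the difficulty proxy; this proxy and the helper names are assumptions for illustration, not the paper's exact criterion.

```python
# Hypothetical curriculum schedule for reward-model training:
# order examples from shallow (easy) to deeply nested (hard) formulas.
# Nesting depth as a difficulty proxy is an illustrative assumption.

def nesting_depth(formula: str) -> int:
    # Maximum depth of parenthesis nesting: a crude complexity measure.
    depth = best = 0
    for ch in formula:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
    return best

def curriculum_order(examples):
    # Present easy formulas first; sorted() is stable, so ties keep
    # their original order.
    return sorted(examples, key=nesting_depth)

batch = ["G((a) -> (F(b)))", "F(a)", "G(a)"]
print(curriculum_order(batch))  # → ['F(a)', 'G(a)', 'G((a) -> (F(b)))']
```

In the paper, each of the four reward models would be trained on such a schedule so that feedback accuracy on simple formulas is established before harder ones are introduced.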

📝 Abstract
Signal Temporal Logic (STL) is a powerful formal language for specifying real-time specifications of Cyber-Physical Systems (CPS). Transforming specifications written in natural language into STL formulas automatically has attracted increasing attention. Existing rule-based methods depend heavily on rigid pattern matching and domain-specific knowledge, limiting their generalizability and scalability. Recently, Supervised Fine-Tuning (SFT) of large language models (LLMs) has been successfully applied to transform natural language into STL. However, the lack of fine-grained supervision on atomic proposition correctness, semantic fidelity, and formula readability often leads SFT-based methods to produce formulas misaligned with the intended meaning. To address these issues, we propose RESTL, a reinforcement learning (RL)-based framework for the transformation from natural language to STL. RESTL introduces multiple independently trained reward models that provide fine-grained, multi-faceted feedback from four perspectives, i.e., atomic proposition consistency, semantic alignment, formula succinctness, and symbol matching. These reward models are trained with a curriculum learning strategy to improve their feedback accuracy, and their outputs are aggregated into a unified signal that guides the optimization of the STL generator via Proximal Policy Optimization (PPO). Experimental results demonstrate that RESTL significantly outperforms state-of-the-art methods in both automatic metrics and human evaluations.
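To make the transformation task concrete, here is a hypothetical specification and its STL counterpart (illustrative only, not drawn from the paper's dataset):

```latex
% "The temperature must stay below 90 degrees during the first 60 seconds,
%  and whenever the alarm fires, the valve must open within 5 seconds."
\Box_{[0,60]}\,(\mathit{temp} < 90) \;\land\;
\Box_{[0,\infty)}\bigl(\mathit{alarm} \rightarrow \Diamond_{[0,5]}\,\mathit{valve\_open}\bigr)
```

Here $\Box$ ("always") and $\Diamond$ ("eventually") carry bounded time intervals, and the atomic propositions $\mathit{temp} < 90$, $\mathit{alarm}$, and $\mathit{valve\_open}$ are exactly the fragments whose consistency the first reward model is meant to check.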
Problem

Research questions and friction points this paper is trying to address.

Automating natural language to Signal Temporal Logic conversion
Addressing limitations of rule-based and supervised methods
Improving formula correctness, semantic fidelity and readability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning framework with multi-aspect reward models
Curriculum learning strategy for improving reward accuracy
Proximal Policy Optimization for STL generator training
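The innovations above can be sketched as follows: scores from the four aspect-specific reward models are aggregated into one scalar that PPO maximizes. The scoring functions and weights here are illustrative stand-ins (the paper trains neural reward models and does not publish these formulas).

```python
# Hypothetical sketch of multi-aspect reward aggregation for PPO.
# Real reward models in the paper are learned; these rule-based
# stand-ins and the weights are assumptions for illustration.

def proposition_reward(pred_props, gold_props):
    # Fraction of gold atomic propositions recovered in the prediction.
    if not gold_props:
        return 1.0
    return len(set(pred_props) & set(gold_props)) / len(gold_props)

def succinctness_reward(formula, reference):
    # Penalize formulas longer than the reference (redundancy proxy).
    return min(1.0, len(reference) / max(len(formula), 1))

def aggregate_reward(scores, weights=(0.3, 0.3, 0.2, 0.2)):
    # Weighted sum over (proposition, semantic, succinctness, symbol)
    # scores; this scalar is what PPO would optimize.
    return sum(w * s for w, s in zip(weights, scores))

prop = proposition_reward(["speed>50", "brake"], ["speed>50", "brake"])
succ = succinctness_reward("G[0,10](speed>50 -> F[0,5] brake)",
                           "G[0,10](speed>50 -> F[0,5] brake)")
reward = aggregate_reward((prop, 0.9, succ, 1.0))  # 0.9, 1.0: stand-in scores
print(round(reward, 3))  # → 0.97
```

A weighted sum is the simplest aggregation choice; the paper only states that the outputs are "aggregated into a unified signal", so other schemes (e.g., learned weighting) are equally consistent with the description.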
Authors
Yue Fang
School of Computer Science, Peking University, Beijing, China; Key Laboratory of High Confidence Software Technologies (PKU), MOE, China
Zhi Jin
Sun Yat-Sen University, Associate Professor
Jie An
National Key Laboratory of Space Integrated Information System; Institute of Software, Chinese Academy of Sciences, Beijing, China
Hongshen Chen
JD.com, Beijing, China
Xiaohong Chen
East China Normal University, Shanghai, China
Naijun Zhan
School of Computer Science, Peking University
Formal Methods · Real-time, embedded and hybrid systems · Program Verification