🤖 AI Summary
This work addresses key challenges in automated translation from natural language to Signal Temporal Logic (STL): atomic proposition errors, semantic distortion, and formula redundancy. We propose a multi-dimensional reward-guided reinforcement learning framework built upon a large language model (LLM) backbone and optimized end-to-end via the Proximal Policy Optimization (PPO) algorithm. Four specialized reward models are introduced to quantitatively assess atomic proposition consistency, semantic alignment, formula conciseness, and symbolic matching accuracy; a curriculum learning strategy further enhances reward signal quality. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art approaches on both automatic metrics and human evaluation, yielding substantial improvements in STL formula accuracy, semantic fidelity, and readability.
📝 Abstract
Signal Temporal Logic (STL) is a powerful formal language for specifying real-time properties of Cyber-Physical Systems (CPS). Automatically transforming specifications written in natural language into STL formulas has attracted increasing attention. Existing rule-based methods depend heavily on rigid pattern matching and domain-specific knowledge, limiting their generalizability and scalability. Recently, Supervised Fine-Tuning (SFT) of large language models (LLMs) has been successfully applied to transform natural language into STL. However, the lack of fine-grained supervision on atomic proposition correctness, semantic fidelity, and formula readability often leads SFT-based methods to produce formulas misaligned with the intended meaning. To address these issues, we propose RESTL, a reinforcement learning (RL)-based framework for the transformation from natural language to STL. RESTL introduces multiple independently trained reward models that provide fine-grained, multi-faceted feedback from four perspectives, namely atomic proposition consistency, semantic alignment, formula succinctness, and symbol matching. These reward models are trained with a curriculum learning strategy to improve their feedback accuracy, and their outputs are aggregated into a unified signal that guides the optimization of the STL generator via Proximal Policy Optimization (PPO). Experimental results demonstrate that RESTL significantly outperforms state-of-the-art methods in both automatic metrics and human evaluations.
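To make the aggregation idea concrete, the sketch below combines per-aspect reward scores into one scalar training signal, mirroring how RESTL's reward-model outputs are unified before guiding PPO. Everything here is an illustrative assumption, not the paper's implementation: the reward functions, the weights, and the example formula `G[0,10](temp>30 -> F[0,5](fan_on))` (an STL rendering of "whenever temp exceeds 30, the fan turns on within 5 seconds") are hypothetical stand-ins for the paper's learned reward models.

```python
# Hypothetical sketch of multi-reward aggregation for an NL-to-STL generator.
# The paper's four reward models (atomic proposition consistency, semantic
# alignment, succinctness, symbol matching) are learned; these heuristics
# are simple stand-ins to show the aggregation step only.

def atomic_prop_reward(formula: str, props: list[str]) -> float:
    """Hypothetical: fraction of expected atomic propositions present."""
    if not props:
        return 1.0
    return sum(p in formula for p in props) / len(props)

def succinctness_reward(formula: str, max_len: int = 40) -> float:
    """Hypothetical: shorter formulas score closer to 1."""
    return max(0.0, 1.0 - len(formula) / max_len)

def aggregate_reward(scores: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Weighted sum of per-aspect rewards (weights are assumptions)."""
    return sum(weights[k] * scores[k] for k in scores)

# Example: "whenever temp exceeds 30, the fan turns on within 5 seconds"
formula = "G[0,10](temp>30 -> F[0,5](fan_on))"
scores = {
    "atomic": atomic_prop_reward(formula, ["temp>30", "fan_on"]),
    "succinct": succinctness_reward(formula),
}
weights = {"atomic": 0.6, "succinct": 0.4}
reward = aggregate_reward(scores, weights)  # scalar signal fed to PPO
```

In an actual PPO loop, this scalar would replace a single monolithic reward, letting the generator receive credit (or penalty) along each dimension separately weighted.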