AI Summary
This work addresses the challenge of automatically and reliably translating natural language (NL) instructions into Linear Temporal Logic (LTL) formulas, thereby lowering the expertise barrier for manually encoding LTL specifications in robotic task planning. We propose the first iterative NL2LTL framework integrating conformal prediction (CP) with large language models (LLMs), enabling users to specify a target translation success rate (e.g., 90%) and dynamically balance full automation against human-in-the-loop verification. Leveraging open-vocabulary question-answering and distribution-free uncertainty quantification, our method provides theoretically grounded, confidence-controllable correctness guarantees. Empirical evaluation demonstrates significantly lower human assistance request rates compared to baselines, while maintaining strong generalization across unseen domains and instruction patterns. The core contribution is the novel application of conformal prediction to NL2LTL, enabling verifiable, confidence-calibrated semantic translation with adjustable reliability.
Abstract
Linear Temporal Logic (LTL) has become a prevalent specification language for robotic tasks. To mitigate the significant manual effort and expertise required to define LTL-encoded tasks, several methods have been proposed for translating Natural Language (NL) instructions into LTL formulas; these methods, however, lack correctness guarantees. To address this, we introduce a new NL-to-LTL translation method, called ConformalNL2LTL, that can achieve user-defined translation success rates over unseen NL commands. Our method constructs LTL formulas iteratively by solving a sequence of open-vocabulary Question-Answering (QA) problems with large language models (LLMs). To enable uncertainty-aware translation, we leverage conformal prediction (CP), a distribution-free uncertainty quantification tool for black-box models. CP enables our method to assess the uncertainty in LLM-generated answers, allowing it to proceed with translation when sufficiently confident and to request help otherwise. We provide both theoretical and empirical results demonstrating that ConformalNL2LTL achieves user-specified translation accuracy while minimizing help rates.
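The "proceed when confident, request help otherwise" rule can be illustrated with generic split conformal prediction applied to a single QA step. This is a minimal sketch, not the ConformalNL2LTL implementation. The function names, the nonconformity score `1 - confidence`, the calibration values, and the candidate answers (the LTL operators `F`, `G`, `U`) are all assumptions made for illustration. The abstract does not specify how the method scores answers or how it extends the guarantee to a whole formula.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of the calibration scores.

    cal_scores: nonconformity scores of the correct answers on held-out
    calibration examples (assumed here to be 1 - model confidence).
    alpha: allowed error rate, e.g. 0.1 for a 90% target.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every candidate.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(answer_conf, q_hat):
    """Keep each candidate answer whose score 1 - confidence is <= q_hat."""
    return {a for a, c in answer_conf.items() if 1.0 - c <= q_hat}


def decide(answer_conf, q_hat):
    """Proceed on a singleton prediction set; otherwise ask for help.

    An empty set is also treated as a help request in this sketch.
    """
    candidates = prediction_set(answer_conf, q_hat)
    if len(candidates) == 1:
        return ("proceed", next(iter(candidates)))
    return ("ask_for_help", sorted(candidates))


# Hypothetical calibration scores and a target error rate of 20%.
cal_scores = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
q_hat = conformal_threshold(cal_scores, alpha=0.2)

# A confident step: only one candidate survives the threshold.
confident = decide({"F": 0.9, "G": 0.07, "U": 0.03}, q_hat)

# An ambiguous step: two candidates survive, so a human is consulted.
ambiguous = decide({"F": 0.5, "G": 0.45, "U": 0.05}, q_hat)
```

Under the usual exchangeability assumption, a set built this way contains the correct answer with probability at least `1 - alpha` for a single step. A lower `alpha` raises the threshold, which enlarges the sets and makes help requests more frequent. This is the trade-off between automation and human verification that the user-specified success rate controls.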