SafeRun: Enabling Determinism in LLM Planning for Running

📅 2026-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the safety risks that the probabilistic nature of large language models (LLMs) poses in safety-critical applications such as running plan generation, where rule violations can lead to hazardous outcomes. The authors propose SafeRun, a framework that decouples soft semantic understanding from hard safety constraints: an LLM interprets user instructions, while a deterministic solver enforces physiological and safety constraints. Evaluated across five LLMs on a newly built running planning benchmark, SafeRun achieves a 100% safety score, compared with averages of 79.1% for prompt engineering (PE) and 97.6% for CodeAct, while maintaining competitive instruction-following scores.
📝 Abstract
Large Language Models enable flexible natural-language planning but remain unreliable in determinism-critical domains due to their probabilistic nature. This limitation is especially problematic in running planning, where violating safety rules can lead to safety risks. We propose SafeRun, a framework for deterministic LLM-based planning via a decoupled architecture. SafeRun separates soft interpretation by an LLM from hard constraint enforcement by a deterministic solver, ensuring strict safety constraints while preserving natural-language flexibility. To validate SafeRun, we build a comprehensive benchmark for running planning under realistic physiological and safety constraints. Experiments across five LLMs show that SafeRun achieves 100% safety score (vs. 79.1% PE average and 97.6% CodeAct average) while maintaining competitive instruction-following scores. The SafeRun benchmark is publicly available on Hugging Face at https://huggingface.co/datasets/zzp-seeker/SafeRun-RunPlanning-Benchmark.
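The decoupled architecture described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `PlanRequest` fields, the `solve_plan` function, and both numeric limits (a 10% cap on week-over-week distance growth and at most six running days per week) are hypothetical, chosen only to show the pattern. The LLM step is represented by a hand-built `PlanRequest`, standing in for whatever structured output a model would extract from the user's text.

```python
from dataclasses import dataclass

# Hypothetical hard limits for illustration only; not taken from the paper.
MAX_WEEKLY_INCREASE = 0.10  # assumed cap on week-over-week distance growth
MAX_RUN_DAYS = 6            # assumed: at least one rest day per week


@dataclass(frozen=True)
class PlanRequest:
    """Structured intent, as an LLM might extract it from free text."""
    current_weekly_km: float
    target_weekly_km: float
    weeks: int
    run_days_per_week: int


def solve_plan(req: PlanRequest) -> list[dict]:
    """Deterministic step: the same request always yields the same plan,
    and the hard limits hold by construction, whatever the LLM asked for."""
    # Clamp the requested running days into the allowed range.
    run_days = max(1, min(req.run_days_per_week, MAX_RUN_DAYS))
    plan = []
    weekly = req.current_weekly_km
    for week in range(1, req.weeks + 1):
        # Move toward the target, but never grow faster than the cap.
        weekly = min(req.target_weekly_km, weekly * (1 + MAX_WEEKLY_INCREASE))
        plan.append({
            "week": week,
            "weekly_km": weekly,
            "run_days": run_days,
            "km_per_run": weekly / run_days,
        })
    return plan


# Stand-in for the LLM's output: the user asked for something unsafe
# (doubling distance in four weeks, running every day).
request = PlanRequest(current_weekly_km=20.0, target_weekly_km=40.0,
                      weeks=4, run_days_per_week=7)
for row in solve_plan(request):
    print(row)
```

In this sketch the solver does not reject an unsafe request. It returns the closest plan that stays within the limits, so a misreading by the LLM can change what the plan aims for but cannot push it past a hard constraint.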
Problem

Research questions and friction points this paper is trying to address.

determinism
LLM planning
safety constraints
running planning
reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

deterministic planning
LLM-based planning
safety constraints
decoupled architecture
running planning