PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the labor-intensive construction of interactive simulation environments for robotic planning, a bottleneck for the safe and reliable deployment of large language model (LLM) planners. The authors propose PerceptTwin, a fully automatic pipeline that builds interactive simulations directly from the semantic scene representations produced by a robot's perception stack, combining open-vocabulary object maps with 3D asset generation, affordance prediction, and commonsense condition checking. An LLM judge, borrowed from the AI alignment literature, verifies plan correctness and alignment with human preferences. In the authors' task suite, PerceptTwin feedback improves plan success by approximately 39% on average for GPT5, GPT5Mini, and GPT5Nano planners, and improves human plan verification by up to 18% on average for plans that fail due to unfilled skill preconditions.

📝 Abstract

Simulation environments are useful for both robot policy learning and planning verification and validation. Traditionally, however, creating a simulation has been onerous, and building a bespoke simulation for each environment a robot operates in has been infeasible. In this work, we introduce PerceptTwin, a fully automatic pipeline that constructs interactive simulations directly from semantic scene representations produced by a robot's perception stack. PerceptTwin combines open-vocabulary object maps with 3D asset generation, affordance prediction, and commonsense condition checking. These interactive simulations can be used to validate and refine plans before they are executed on the robot hardware. Borrowing from the AI alignment literature, we also introduce an LLM judge that verifies plan correctness and alignment with human preferences. Experiments show that PerceptTwin feedback allows LLM planners to refine plans, enhance safety, and resist harmful black-box prompting attacks. In our suite of tasks, PerceptTwin improves plan success by an average of approximately 39% for GPT5, GPT5Mini, and GPT5Nano planners. PerceptTwin also improves human plan verification by up to 18% on average for plans that fail due to unfilled skill preconditions. Our results demonstrate the potential of open-vocabulary scene simulation from robot perception as a foundation for safer, more reliable robot planning.
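
The idea of validating a plan against skill preconditions before execution can be illustrated with a toy symbolic simulator. The sketch below is not the PerceptTwin implementation: the `Skill` class, the `simulate_plan` function, and the fridge example are all hypothetical, and the real system works on simulations built from perception output rather than hand-written facts. It only shows how a failed precondition check can become textual feedback that a planner might use to refine its plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill model: facts required before, facts changed after."""
    name: str
    preconditions: frozenset
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()


def simulate_plan(initial_state, plan, skills):
    """Step through a plan symbolically.

    Returns (success, feedback). The feedback string names the first step
    whose preconditions are not satisfied in the current state.
    """
    state = set(initial_state)
    for step, skill_name in enumerate(plan):
        skill = skills.get(skill_name)
        if skill is None:
            return False, f"step {step}: unknown skill '{skill_name}'"
        missing = skill.preconditions - state
        if missing:
            return False, (
                f"step {step}: '{skill_name}' is missing "
                f"preconditions {sorted(missing)}"
            )
        # Apply the skill's effects to the simulated state.
        state -= skill.del_effects
        state |= skill.add_effects
    return True, "plan executed without precondition violations"


# Hypothetical scene: a robot standing at a closed fridge with an empty gripper.
SKILLS = {
    "open_fridge": Skill(
        "open_fridge",
        preconditions=frozenset({"robot_at_fridge"}),
        add_effects=frozenset({"fridge_open"}),
    ),
    "pick_milk": Skill(
        "pick_milk",
        preconditions=frozenset({"fridge_open", "gripper_empty"}),
        add_effects=frozenset({"holding_milk"}),
        del_effects=frozenset({"gripper_empty"}),
    ),
}
INITIAL_STATE = {"robot_at_fridge", "gripper_empty"}

# A plan that skips opening the fridge fails with actionable feedback.
bad_ok, bad_feedback = simulate_plan(INITIAL_STATE, ["pick_milk"], SKILLS)

# The refined plan satisfies every precondition.
good_ok, good_feedback = simulate_plan(
    INITIAL_STATE, ["open_fridge", "pick_milk"], SKILLS
)
```

In a refinement loop of the kind the abstract describes, a message like `bad_feedback` would be passed back to the LLM planner so that it can propose a corrected plan before anything runs on hardware.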

Problem

Research questions and friction points this paper is trying to address.

semantic scene reconstruction

interactive simulation

LLM planning

plan verification

robot perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

PerceptTwin

semantic scene reconstruction

interactive simulation