Guided Reality: Generating Visually-Enriched AR Task Guidance with LLMs and Vision Models

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-driven AR task guidance systems offer weak visual augmentation, making it difficult to anchor natural-language instructions in physical space. To address this, the paper proposes Guided Reality, a fully automated, end-to-end framework that combines a large language model for multi-step instruction parsing, a vision model for localizing key interaction points, and spatially aware rendering to register and adaptively display dynamic visual cues (five categories, including highlighting, arrows, and labels) in real-world scenes. A step-aware visual cue identification strategy maps each instruction step to an appropriate spatial augmentation without manual authoring. A user study (N=16) showed improvements in task completion accuracy (+28.6%) and operational efficiency, and four expert instructors discussed how the framework could be integrated into their training workflows.
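
Below is a minimal sketch of how the step-aware cue identification could be prompted, assuming an OpenAI-style chat API. The model name, prompt wording, and the cue names beyond highlighting, arrows, and labels are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical step-aware cue identification: ask an LLM which of the
# five cue types best fits the current instruction step.
from openai import OpenAI

client = OpenAI()

# Only "highlight", "arrow", and "label" are named in the summary; the
# remaining two names are placeholders.
CUE_TYPES = ["highlight", "arrow", "label", "path", "overlay"]

def identify_cue_type(step_text: str) -> str:
    """Return the cue type the LLM judges most appropriate for this step."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "Pick exactly one visual cue type from: "
                        + ", ".join(CUE_TYPES)
                        + ". Answer with the cue name only."},
            {"role": "user", "content": step_text},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer if answer in CUE_TYPES else "label"  # conservative fallback

print(identify_cue_type("Turn the temperature knob to 180 degrees."))
```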

📝 Abstract
Large language models (LLMs) have enabled the automatic generation of step-by-step augmented reality (AR) instructions for a wide range of physical tasks. However, existing LLM-based AR guidance often lacks rich visual augmentations that embed instructions into spatial context for better user understanding. We present Guided Reality, a fully automated AR system that generates embedded and dynamic visual guidance based on step-by-step instructions. Our system integrates LLMs and vision models to: 1) generate multi-step instructions from user queries, 2) identify appropriate types of visual guidance, 3) extract spatial information about key interaction points in the real world, and 4) embed visual guidance in physical space to support task execution. Drawing from a corpus of user manuals, we define five categories of visual guidance and propose an identification strategy based on the current step. We evaluate the system through a user study (N=16) in which participants completed real-world tasks and explored the system in the wild. Additionally, four instructors shared insights on how Guided Reality could be integrated into their training workflows.
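
To make the four stages concrete, here is a hedged Python sketch of the pipeline's control flow. The stage bodies are canned placeholders, and every function and field name is illustrative rather than taken from the paper.

```python
# Minimal sketch of the four-stage pipeline described in the abstract.
from dataclasses import dataclass

@dataclass
class GuidanceStep:
    text: str                       # natural-language instruction for this step
    cue_type: str                   # one of the five visual guidance categories
    anchor_2d: tuple[float, float]  # pixel of the key interaction point

def generate_steps(user_query: str) -> list[str]:
    """Stage 1: an LLM expands a task query into ordered instruction steps."""
    return ["Open the filter compartment.", "Insert a new filter."]

def identify_cue(step_text: str) -> str:
    """Stage 2: step-aware selection of a visual cue type (see sketch above)."""
    return "arrow" if "open" in step_text.lower() else "highlight"

def locate_interaction_point(step_text: str, frame) -> tuple[float, float]:
    """Stage 3: a vision model localizes the step's key interaction point."""
    return (640.0, 360.0)  # placeholder pixel coordinate

def render_in_ar(step: GuidanceStep, frame) -> None:
    """Stage 4: register the cue at its anchor and draw it in the scene."""
    print(f"{step.cue_type} at {step.anchor_2d}: {step.text}")

def run_pipeline(user_query: str, frame=None) -> None:
    for text in generate_steps(user_query):
        step = GuidanceStep(text, identify_cue(text),
                            locate_interaction_point(text, frame))
        render_in_ar(step, frame)

run_pipeline("How do I replace the vacuum filter?")
```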
Problem

Research questions and friction points this paper is trying to address.

Existing LLM-based AR guidance lacks rich visual enrichment in spatial context
Identifying the visual guidance type appropriate for each instruction step
Embedding dynamic visual aids into the physical space where tasks are executed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs and vision models to generate AR guidance end to end
Automatically identifies and embeds five categories of visual guidance
Extracts spatial information about key real-world interaction points (see the sketch after this list)
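
As a concrete illustration of the spatial-extraction step, the sketch below lifts a 2D interaction point into 3D via standard pinhole unprojection so a cue can be registered in world space. The intrinsics and pose values are placeholders, and the paper does not specify this exact procedure.

```python
# Pinhole unprojection: pixel (u, v) plus metric depth -> 3D camera-space point.
import numpy as np

def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Convert a pixel with known depth into a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Example: anchor a cue at pixel (640, 360) observed 0.75 m from the camera.
point_cam = unproject(640, 360, 0.75, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)

# Transform into world space with the headset's camera-to-world pose
# (a 4x4 matrix supplied by the AR runtime; identity here as a stand-in).
T_world_cam = np.eye(4)
point_world = (T_world_cam @ np.append(point_cam, 1.0))[:3]
print(point_world)
```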