Revisiting Put-That-There: Context-Aware Window Interactions via LLMs

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unnatural, explicit window-management interactions required by current XR headsets. We propose a novel “Put-That-There” paradigm that integrates large language models (LLMs) with multimodal XR sensing. Our method jointly leverages semantic-segmentation-driven 3D environment reconstruction, real-time application metadata, speech commands, pointing gestures, and eye-tracking data; an LLM performs goal-directed intent understanding and one-to-many operation mapping to dynamically infer application invocation, window placement, and cross-tool spatial layout relationships, and outputs structured JSON control instructions. Contributions include: (1) the first deep integration of LLMs into the spatial interaction feedback loop, enabling end-to-end reasoning from high-level semantic goals (e.g., “place the email window on the desk directly in front of me”) to physical-space actions; and (2) significant improvements in naturalness, intent consistency, and cross-application coordination efficiency for window management within panoramic workspaces.
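The summary mentions structured JSON control instructions but gives no schema, so the sketch below is purely illustrative: every field name (`operation`, `application`, `anchor`, `reference_frame`) is an assumption about what such an instruction might contain, not the paper's actual format.

```python
import json

# Hypothetical shape of one LLM-emitted control instruction for the goal
# "place the email window on the desk directly in front of me".
# All field names and values here are illustrative assumptions.
action = {
    "operation": "place_window",        # e.g. open_app / place_window / arrange_layout
    "application": "email_client",      # app inferred from the user's goal
    "anchor": {
        "type": "surface",              # a semantically segmented surface
        "label": "desk",                # from the 3D scene reconstruction
        "relation": "on",               # spatial relation to the anchor
    },
    "reference_frame": "user_forward",  # "directly in front of me"
}

# An XR runtime would parse and dispatch instructions of this kind.
print(json.dumps(action, indent=2))
```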

📝 Abstract
We revisit Bolt's classic "Put-That-There" concept for modern head-mounted displays by pairing Large Language Models (LLMs) with the XR sensing stack. The agent fuses (i) a semantically segmented 3-D environment, (ii) live application metadata, and (iii) users' verbal, pointing, and head-gaze cues to issue JSON window-placement actions. As a result, users can manage a panoramic workspace through: (1) explicit commands ("Place Google Maps on the coffee table"), (2) deictic speech plus gestures ("Put that there"), or (3) high-level goals ("I need to send a message"). Unlike traditional explicit interfaces, our system supports one-to-many action mappings and goal-centric reasoning, allowing the LLM to dynamically infer relevant applications and layout decisions, including interrelationships across tools. This enables seamless, intent-driven interaction without manual window juggling in immersive XR environments.
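As a rough illustration of mode (2), the sketch below grounds the deictic words "that" and "there" by pairing each one with the pointer raycast hit closest to it in time, so the LLM receives concrete referents instead of bare pronouns. The pairing heuristic, data shapes, and names (`RayHit`, `resolve_deictics`) are assumptions; the abstract does not specify how the fusion is implemented or how gaze is used as a fallback.

```python
from dataclasses import dataclass

@dataclass
class RayHit:
    timestamp: float   # seconds, aligned with the speech transcript
    target: str        # semantic label of the object/surface hit

def resolve_deictics(transcript_words, pointer_hits):
    """Pair each deictic word ("that", "there") with the pointer hit
    nearest in time. A minimal sketch under assumed data shapes, not
    the paper's actual fusion logic."""
    resolved = {}
    for word, t in transcript_words:
        if word in ("that", "there"):
            nearest = min(pointer_hits, key=lambda h: abs(h.timestamp - t))
            resolved[word] = nearest.target
    return resolved

# "Put that there" while pointing first at a maps window, then at a table.
words = [("put", 0.0), ("that", 0.4), ("there", 1.1)]
hits = [RayHit(0.45, "window:google_maps"), RayHit(1.05, "surface:coffee_table")]
print(resolve_deictics(words, hits))
# {'that': 'window:google_maps', 'there': 'surface:coffee_table'}
```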
Problem

Research questions and friction points this paper is trying to address.

Enabling natural window management in XR using verbal and gestural commands
Replacing manual window manipulation with intent-driven LLM inference
Supporting goal-oriented interactions without traditional explicit interfaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs fused with XR sensors for window placement
Semantic 3D environment and user cues drive JSON actions
Goal-centric reasoning enables dynamic application inference (see the prompt sketch below)
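To make the goal-centric, one-to-many mapping concrete, here is a hypothetical sketch of how scene and application context might be assembled into an LLM prompt for a high-level goal. The prompt wording, context format, and the name `build_goal_prompt` are all assumptions; the paper's actual prompts are not reproduced here.

```python
import json

def build_goal_prompt(goal, scene_surfaces, open_apps):
    """Assemble the context an LLM would need to map one high-level goal
    to one or more window actions. Illustrative only; prompt format is
    an assumption."""
    return (
        "You control windows in an XR workspace.\n"
        f"Available surfaces: {json.dumps(scene_surfaces)}\n"
        f"Available applications: {json.dumps(open_apps)}\n"
        f"User goal: {goal!r}\n"
        "Respond with a JSON array of actions, each with fields "
        "'operation', 'application', and 'anchor'."
    )

prompt = build_goal_prompt(
    "I need to send a message",
    ["desk", "wall_left", "coffee_table"],
    ["email_client", "browser", "calendar"],
)
# A single goal may legitimately map to several actions, e.g. placing the
# email client on the desk and a contacts window beside it.
print(prompt)
```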