RealitySummary: Exploring On-Demand Mixed Reality Text Summarization and Question Answering using Large Language Models

📅 2024-05-28
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the frequent interface switching, and the resulting disruption of reading immersion, that occurs when readers consult LLM tools alongside physical or on-screen documents. RealitySummary is a mixed reality (MR) reading assistant that integrates large language models (LLMs) with always-on camera access, OCR-based text extraction, and spatially anchored visual responses to provide on-demand text summarization and question answering. The system was developed iteratively across three versions, each shaped by user feedback: a preliminary user study of perceptions (N=12), an in-the-wild deployment (N=11), and a diary study in real-world work contexts (N=5). Findings highlight an always-on implicit assistant, minimal context switching, and spatial affordances as distinctive advantages of combining AI and MR, pointing toward LLM-MR interfaces that move beyond traditional screen-based interactions.

📝 Abstract
Large Language Models (LLMs) are gaining popularity as tools for reading and summarization aids. However, little is known about their potential benefits when integrated with mixed reality (MR) interfaces to support everyday reading assistants. We developed RealitySummary, an MR reading assistant that seamlessly integrates LLMs with always-on camera access, OCR-based text extraction, and augmented spatial and visual responses in MR interfaces. Developed iteratively, RealitySummary evolved across three versions, each shaped by user feedback and reflective analysis: 1) a preliminary user study to understand user perceptions (N=12), 2) an in-the-wild deployment to explore real-world usage (N=11), and 3) a diary study to capture insights from real-world work contexts (N=5). Our findings highlight the unique advantages of combining AI and MR, including an always-on implicit assistant, minimal context switching, and spatial affordances, demonstrating significant potential for future LLM-MR interfaces beyond traditional screen-based interactions.
Problem

Research questions and friction points this paper is trying to address.

Exploring LLM integration with mixed reality for reading assistance
Developing on-demand text summarization and question answering in MR
Investigating AI-MR interfaces beyond traditional screen-based interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs with always-on camera access
Uses OCR-based text extraction for processing
Provides augmented spatial and visual responses
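The pipeline listed above (camera capture, OCR extraction, LLM query, spatially anchored response) can be sketched as a single on-demand loop. This is a minimal illustration, not the paper's implementation: `extract_text` and `summarize_with_llm` are hypothetical stand-ins for a real OCR engine (e.g. Tesseract) and a hosted LLM call, and the `anchor` field merely gestures at the MR placement step.

```python
from typing import Optional

def extract_text(frame: str) -> str:
    """Stand-in for OCR over a camera frame (a real system might call
    pytesseract.image_to_string on the image)."""
    return frame.strip()

def summarize_with_llm(text: str, question: Optional[str] = None) -> str:
    """Stand-in for the LLM request; here it just echoes the prompt it
    would send, since no model is attached in this sketch."""
    if question:
        return f"Answer about: {text}\nQ: {question}"
    return f"Summarize: {text}"

def on_demand_response(frame: str, question: Optional[str] = None) -> dict:
    """One pass of the assistant: OCR the visible text, query the LLM
    (summarization by default, QA if a question is given), and return a
    response tagged with where it should be anchored in the MR scene."""
    text = extract_text(frame)
    answer = summarize_with_llm(text, question)
    return {"source_text": text, "response": answer, "anchor": "beside_document"}
```

Keeping the response next to the source document, rather than in a separate window, is what the abstract credits for minimizing context switching.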