🤖 AI Summary
Non-speech captions (NSCs) are critical for deaf and hard-of-hearing (DHH) audiences, yet existing approaches struggle to reconcile diverse viewer preferences with authorial intent. This paper introduces CapTune, an *anchored generative modeling framework* that defines a safe transformation space via author-provided exemplars, enabling customizable NSC generation across four dimensions: level of detail, expressiveness, sound representation method, and genre alignment. Integrating generative modeling with human-centered interaction design, the framework allows caption authors to specify transformation boundaries while empowering viewers to adjust stylistic parameters within those constraints. An evaluation with seven professional caption authors and twelve DHH participants showed enhanced emotional engagement without compromising authorial consistency, demonstrating that personalization can serve viewer experience while preserving creator intent. The study also surfaces trade-offs between information richness and cognitive load, tensions between interpretive and descriptive representations of sound, and the context-dependent nature of caption preferences.
📝 Abstract
Non-speech captions are essential to the video experience of deaf and hard of hearing (DHH) viewers, yet conventional approaches often overlook the diversity of their preferences. We present CapTune, a system that enables customization of non-speech captions based on DHH viewers' needs while preserving creator intent. CapTune allows caption authors to define safe transformation spaces using concrete examples and empowers viewers to personalize captions across four dimensions: level of detail, expressiveness, sound representation method, and genre alignment. Evaluations with seven caption creators and twelve DHH participants showed that CapTune supported creators' creative control while enhancing viewers' emotional engagement with content. Our findings also reveal trade-offs between information richness and cognitive load, tensions between interpretive and descriptive representations of sound, and the context-dependent nature of caption preferences.
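To make the "safe transformation space" idea concrete, below is a minimal, hypothetical Python sketch, not CapTune's actual implementation. It models an author bounding each of the four customization dimensions and viewer preferences being clamped into those bounds before caption generation. All names (`AuthorBounds`, `clamp_preferences`) and the scalar 0.0-1.0 encoding are illustrative assumptions; the paper describes anchoring the space with concrete caption examples rather than numeric ranges.

```python
# Hypothetical sketch (not the authors' implementation): an author-defined
# "safe transformation space" into which viewer preferences are clamped
# before any caption transformation is applied.
from __future__ import annotations

from dataclasses import dataclass

# The four customization dimensions named in the abstract.
DIMENSIONS = ("detail", "expressiveness", "representation", "genre_alignment")


@dataclass(frozen=True)
class AuthorBounds:
    """Per-dimension range (0.0-1.0) the caption author permits."""
    low: float
    high: float


def clamp_preferences(bounds: dict[str, AuthorBounds],
                      prefs: dict[str, float]) -> dict[str, float]:
    """Clamp each viewer preference into the author's allowed range,
    so personalization never leaves the author-sanctioned space."""
    return {
        dim: min(max(prefs.get(dim, bounds[dim].low), bounds[dim].low),
                 bounds[dim].high)
        for dim in DIMENSIONS
    }


if __name__ == "__main__":
    # Author permits moderate-to-high detail but caps expressiveness.
    bounds = {
        "detail": AuthorBounds(0.4, 0.9),
        "expressiveness": AuthorBounds(0.0, 0.5),
        "representation": AuthorBounds(0.2, 0.8),
        "genre_alignment": AuthorBounds(0.5, 1.0),
    }
    # Viewer requests maximum expressiveness; it is clamped to 0.5.
    viewer = {"detail": 0.7, "expressiveness": 1.0,
              "representation": 0.3, "genre_alignment": 0.6}
    print(clamp_preferences(bounds, viewer))
```

In a system like the one described, the boundary check would constrain a generative model's output rather than a scalar, but the clamping step captures the core contract: viewers personalize freely, yet never outside the space the author has sanctioned.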