The DeepSpeak-Agentic Dataset

📅 2026-06-02

🤖 AI Summary
This study addresses the lack of high-quality multimodal datasets for detecting embodied AI agents and analyzing how they interact with humans. The authors introduce DeepSpeak-Agentic, a dataset of over 37 hours of audiovisual recordings of semi-structured human-agent conversations, intended as a multimodal benchmark for embodied AI agents. The work also presents a scalable data-capture system that creates AI agents, pairs them with crowd workers, records the conversations automatically, and identifies and separates the human and the agent in the combined stream. DeepSpeak-Agentic provides a common evaluation foundation for research on AI-generated voices and faces, human-agent interaction, and the automatic forensic identification of AI agents.
📝 Abstract
We present DeepSpeak-Agentic, a dataset of videos comprising over 37 hours of semi-structured conversations between a human and an embodied AI agent. We use this dataset to evaluate the automatic forensic identification (audio, video, or text) of AI agents, study the nature of human-agent interactions, and provide a benchmark for future advances in the large language models and AI-generated voices and faces that power embodied AI agents. We also contribute a scalable data-capture system that creates agents, automatically pairs them with human crowd workers, records audiovisual conversations across specified scenarios, and identifies and separates the human and agent in the combined stream.
Problem

Research questions and friction points this paper is trying to address.

embodied AI agents
automatic forensic identification
human-agent interaction
AI-generated voices
AI-generated faces
Innovation

Methods, ideas, or system contributions that make the work stand out.

embodied AI agent
automatic forensic identification
scalable data-capture system
AI-generated voices and faces
human-agent interaction