🤖 AI Summary
This study investigates whether foundation models can learn implicit, individual life traits end to end from long-duration first-person video. We collected 54 hours of real-world egocentric footage via wearable cameras and constructed hierarchical summaries, spanning minute-, hour-, and day-level granularities, to supervise fine-tuning of GPT-4o and GPT-4o-mini for personalized modeling. To our knowledge, this is the first work to combine long-term first-person video with hierarchical summarization for supervised fine-tuning to infer latent attributes, including geographic location, occupation, handedness, and pet ownership. Experiments show that GPT-4o accurately inferred the author's city, status as a Carnegie Mellon University PhD student, right-handedness, and cat ownership; however, both models exhibited name hallucination, exposing reliability bottlenecks in fine-grained social-information modeling. Our work delineates the capability boundaries and hallucination modes of foundation models learning personalized information from egocentric visual data.
📝 Abstract
Motivated by recent improvements in generative AI and wearable camera devices (e.g., smart glasses and AI-enabled pins), I investigate the ability of foundation models to learn about a wearer's personal life from first-person camera data. To test this, I wore a camera headset for 54 hours over the course of a week, generated summaries of the footage at several timescales (minute-long, hour-long, and day-long), and fine-tuned both GPT-4o and GPT-4o-mini on the resulting summary hierarchy. By querying the fine-tuned models, I can probe what they learned about me. The results are mixed: both models learned basic information about me (e.g., approximate age and gender), and GPT-4o correctly deduced that I live in Pittsburgh, am a PhD student at CMU, am right-handed, and have a pet cat. However, both models also hallucinated, inventing names for the individuals who appear in the video footage of my life.
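The summary hierarchy and fine-tuning setup described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the grouping ratios, prompt wording, and the helper names `build_hierarchy` and `to_finetune_examples` are all assumptions, and real hour/day summaries would be produced by a summarization model rather than by concatenating the lower level.

```python
import json

def build_hierarchy(minute_summaries, minutes_per_hour=60, hours_per_day=24):
    """Group minute-level summaries into hour- and day-level buckets.

    Here each higher level is formed by joining the level below; in
    practice, a model would re-summarize each bucket (an assumption).
    """
    hour_buckets = [minute_summaries[i:i + minutes_per_hour]
                    for i in range(0, len(minute_summaries), minutes_per_hour)]
    hour_summaries = [" ".join(bucket) for bucket in hour_buckets]
    day_buckets = [hour_summaries[i:i + hours_per_day]
                   for i in range(0, len(hour_summaries), hours_per_day)]
    day_summaries = [" ".join(bucket) for bucket in day_buckets]
    return hour_summaries, day_summaries

def to_finetune_examples(summaries, level):
    """Format summaries as chat-style fine-tuning records (one JSON row each).

    The prompt text is a hypothetical stand-in for whatever instruction
    the summaries were actually paired with during fine-tuning.
    """
    return [json.dumps({"messages": [
        {"role": "user", "content": f"Describe one {level} of the wearer's life."},
        {"role": "assistant", "content": s},
    ]}) for s in summaries]
```

For example, 120 minute-level summaries would collapse into 2 hour-level summaries and 1 (partial) day-level summary, with each level contributing rows to the fine-tuning dataset.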