AI-Generated Fall Data: Assessing LLMs and Diffusion Model for Wearable Fall Detection

📅 2025-05-07
🤖 AI Summary
To address the scarcity of real-world acceleration data for elderly fall detection, this paper systematically evaluates synthetic fall data generated by large language models (GPT-4o, GPT-4, Gemini) and by a diffusion model. In a wearable sensing context, it compares three generative paradigms, text-to-motion (T2M, SATO, ParCo), text-to-text, and diffusion-based generation, by how much each improves LSTM-based fall classification. The experiments identify sensor sampling rate, sensor placement, and fall representation as the critical factors governing how useful synthetic data is. LLM-generated data performs best at low sampling rates (e.g., 20 Hz) but is unstable at high rates (e.g., 200 Hz). Diffusion-synthesized data approximates real acceleration distributions most closely but does not consistently improve the downstream task. Overall, the value of synthetic data depends more on how well it matches the target sensing modality than on generation fidelity alone.
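Because sampling rate is singled out as a deciding factor, recordings from different sources are typically brought to the target dataset's rate before they are mixed. The sketch below resamples a uniformly sampled signal by linear interpolation, for example from 200 Hz to 20 Hz. It is an illustration under assumptions, not the paper's preprocessing: the name `resample_linear` is hypothetical, and a production pipeline would low-pass filter before downsampling to avoid aliasing (for example with `scipy.signal.decimate`).

```python
from typing import List, Sequence


def resample_linear(signal: Sequence[float], src_hz: float, dst_hz: float) -> List[float]:
    """Resample a uniformly sampled 1-D signal by linear interpolation.

    Input sample i is assumed to be taken at time i / src_hz; output sample j
    sits at time j / dst_hz and is interpolated from its two nearest input
    neighbours. No anti-aliasing filter is applied (hypothetical helper).
    """
    if src_hz <= 0 or dst_hz <= 0:
        raise ValueError("sampling rates must be positive")
    if len(signal) < 2:
        return list(signal)
    # Number of output samples that fit in the input's time span
    # (the small epsilon guards against float round-off such as 20.999999).
    n_out = int((len(signal) - 1) * dst_hz / src_hz + 1e-9) + 1
    out = []
    for j in range(n_out):
        pos = j * src_hz / dst_hz                # fractional index into the input
        i = min(int(pos), len(signal) - 2)       # left neighbour, clamped at the end
        frac = pos - i                           # weight of the right neighbour
        out.append(signal[i] * (1.0 - frac) + signal[i + 1] * frac)
    return out
```

For a 1 s ramp sampled at 200 Hz (201 samples), `resample_linear(ramp, 200, 20)` returns 21 samples spaced 0.05 s apart; the same function also upsamples when `dst_hz` exceeds `src_hz`.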

📝 Abstract
Training fall detection systems is challenging due to the scarcity of real-world fall data, particularly from elderly individuals. To address this, we explore the potential of Large Language Models (LLMs) for generating synthetic fall data. This study evaluates text-to-motion (T2M, SATO, ParCo) and text-to-text models (GPT-4o, GPT-4, Gemini) in simulating realistic fall scenarios. We generate synthetic datasets and integrate them with four real-world baseline datasets to assess their impact on fall detection performance using a Long Short-Term Memory (LSTM) model. Additionally, we compare LLM-generated synthetic data with a diffusion-based method to evaluate their alignment with real accelerometer distributions. Results indicate that dataset characteristics significantly influence the effectiveness of synthetic data, with LLM-generated data performing best in low-frequency settings (e.g., 20 Hz) while showing instability in high-frequency datasets (e.g., 200 Hz). While text-to-motion models produce more realistic biomechanical data than text-to-text models, their impact on fall detection varies. Diffusion-based synthetic data demonstrates the closest alignment to real data but does not consistently enhance model performance. An ablation study further confirms that the effectiveness of synthetic data depends on sensor placement and fall representation. These findings provide insights into optimizing synthetic data generation for fall detection models.
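The abstract compares how closely each generator's output aligns with real accelerometer distributions, but the metric is not named here. One common choice, assumed for this sketch, is the 1-D Wasserstein (earth mover's) distance between acceleration magnitudes, computed as the area between the two empirical CDFs; `scipy.stats.wasserstein_distance` computes the same quantity. The helper names `magnitude` and `wasserstein_1d` are hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def magnitude(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Euclidean norm of each tri-axial accelerometer sample (orientation-free)."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Equals the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v are the
    empirical CDFs. Both CDFs are constant between consecutive pooled sample
    values, so the integral reduces to a finite sum over those intervals.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    us, vs = sorted(u), sorted(v)
    points = sorted(set(us) | set(vs))           # breakpoints of either CDF
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = bisect.bisect_right(us, left) / len(us)   # F_u on [left, right)
        cdf_v = bisect.bisect_right(vs, left) / len(vs)   # F_v on [left, right)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total
```

In use, one would pass `magnitude(real_samples)` and `magnitude(synthetic_samples)`; a smaller value means the synthetic magnitudes are distributed more like the real ones. As the abstract notes, a closer match does not by itself guarantee better detection.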
Problem

Scarcity of real-world fall data for training detection systems
Evaluating LLMs and diffusion models for synthetic fall data generation
Assessing synthetic data impact on fall detection model performance
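Measuring the impact of synthetic data is only meaningful if evaluation stays on real recordings. A minimal sketch of that protocol, assuming each sample is a (window, label) pair: hold out part of the real data for testing and add synthetic windows to the training side only. The name `augment_with_synthetic` and the 20% default test fraction are hypothetical choices, not values from the paper.

```python
import random
from typing import List, Sequence, Tuple

# One labelled window: (sensor values, label) with 1 = fall, 0 = non-fall activity.
Sample = Tuple[List[float], int]


def augment_with_synthetic(
    real: Sequence[Sample],
    synthetic: Sequence[Sample],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Split real data into train/test, then add synthetic data to train only."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    rng = random.Random(seed)                    # seeded for a reproducible split
    shuffled = list(real)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test = shuffled[:n_test]                     # evaluation stays on real windows
    train = shuffled[n_test:] + list(synthetic)  # synthetic windows never reach the test set
    rng.shuffle(train)
    return train, test
```

Calling it once with `synthetic=[]` and once with the generated windows, using the same seed, yields the same real test set in both runs, so a baseline LSTM and an augmented one can be compared on identical data. A random split like this can place windows from the same person in both sets; fall detection studies commonly split by subject instead, which this sketch omits for brevity.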
Innovation

LLMs generate synthetic fall data for training
Compare text-to-motion and text-to-text models
Diffusion-based data aligns most closely with real accelerometer distributions
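Text-to-motion models such as T2M, SATO, and ParCo output skeletal motion rather than sensor readings, so a wearable-style signal has to be derived from a joint trajectory. One way to do this, assumed here rather than taken from the paper, is to differentiate the joint's position twice with central differences and add the gravity term an accelerometer measures (it reports specific force, so a device at rest reads +g along the up axis). The sketch assumes positions in meters, a fixed frame rate, and a y-up world frame; `joint_to_acceleration` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def joint_to_acceleration(
    positions: Sequence[Vec3],
    fps: float,
    g: float = 9.81,
    up_axis: int = 1,
) -> List[Tuple[float, ...]]:
    """Approximate accelerometer readings for one joint from its trajectory.

    Uses the central second difference a[t] = (p[t+1] - 2 p[t] + p[t-1]) * fps**2
    on interior frames, so the output has len(positions) - 2 entries (or none
    if fewer than 3 frames). An accelerometer measures specific force
    (acceleration minus gravity), so +g is added on the up axis: a joint at
    rest reads (0, g, 0) in a y-up frame. Values stay in the world frame;
    sensor orientation is ignored.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    inv_dt2 = fps * fps                          # 1 / dt**2 with dt = 1 / fps
    out = []
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        acc = [(n - 2.0 * c + p) * inv_dt2 for p, c, n in zip(prev, cur, nxt)]
        acc[up_axis] += g                        # add the gravity reaction term
        out.append(tuple(acc))
    return out
```

A joint in free fall reads close to zero on every axis, which is the brief near-0 g phase that threshold-based fall detectors often look for. Emulating a real wrist or waist sensor would also require rotating these world-frame values into the sensor's own frame.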
Sana Alamgeer
PhD
Computer Vision, Deep Learning, Time Series Analysis, Prompt Engineering, GenAI
Yasine Souissi
University of North Carolina, Charlotte, USA
Anne H. H. Ngu
Texas State University, San Marcos, USA