WHAR Datasets: An Open Source Library for Wearable Human Activity Recognition

📅 2025-08-12

🤖 AI Summary

WHAR research has long suffered from fragmented dataset formats, which hurt reproducibility, cross-study comparability, and preprocessing efficiency. To address this, we introduce a standardized, open-source data processing library for WHAR, featuring a unified data interface and a configuration-driven architecture that is compatible with PyTorch and TensorFlow and integrates nine widely used WHAR datasets. Declarative configuration automates data loading, segmentation, augmentation, and batching, and preprocessing can run across multiple processes. Experiments show preprocessing speedups of up to 3.8× with multiprocessing. We train TinyHar and MLP-HAR on the included datasets and approximately reproduce the published results, supporting the library's value for reproducible, extensible, and computationally efficient experimentation. This work aims to provide standardized infrastructure for WHAR research.

📝 Abstract

The lack of standardization across Wearable Human Activity Recognition (WHAR) datasets limits reproducibility, comparability, and research efficiency. We introduce WHAR datasets, an open-source library designed to simplify WHAR data handling through a standardized data format and a configuration-driven design, enabling reproducible and computationally efficient workflows with minimal manual intervention. The library currently supports 9 widely used datasets, integrates with PyTorch and TensorFlow, and is easily extensible to new datasets. To demonstrate its utility, we trained two state-of-the-art models, TinyHar and MLP-HAR, on the included datasets, approximately reproducing published results and validating the library's effectiveness for experimentation and benchmarking. Additionally, we evaluated preprocessing performance and observed speedups of up to 3.8x using multiprocessing. We hope this library contributes to more efficient, reproducible, and comparable WHAR research.
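
To make the multiprocessing claim concrete, here is a minimal sketch of how per-recording preprocessing might be spread across worker processes. It is not the library's implementation: the function names `sliding_windows` and `segment_all`, their parameters, and the choice of sliding-window segmentation as the preprocessing step are all assumptions made for illustration, and recordings are represented as plain Python lists.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def sliding_windows(samples, window, stride):
    """Split one recording (a list of samples) into fixed-length windows.

    Hypothetical helper: trailing samples that do not fill a window are dropped.
    """
    last_start = len(samples) - window
    return [samples[i:i + window] for i in range(0, last_start + 1, stride)]


def segment_all(recordings, window, stride, workers=1):
    """Segment every recording, optionally in parallel (illustrative only)."""
    segment = partial(sliding_windows, window=window, stride=stride)
    if workers <= 1:
        # Sequential path: the baseline a parallel run would be compared with.
        return [segment(recording) for recording in recordings]
    # Recordings are independent, so each one can go to a separate process.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(segment, recordings))
```

Because each recording is processed independently, the result is the same for any `workers` value and only wall-clock time changes. On platforms that spawn rather than fork worker processes, the parallel call should be made from under an `if __name__ == "__main__":` guard.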

Problem

Research questions and friction points this paper is trying to address.

Lack of standardized formats for wearable activity datasets

Need for reproducible and efficient data handling workflows

Difficulty in comparing results across different HAR datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized data format for wearable activity recognition

Configuration-driven design enabling reproducible workflows

Integration with PyTorch and TensorFlow frameworks
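
One way to picture a configuration-driven design is a small, immutable configuration object from which every preprocessing parameter is derived. The sketch below is an assumption about the general idea, not the library's actual schema: the class `PreprocessConfig`, its field names, and the `cache_key` method are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreprocessConfig:
    # All field names are hypothetical, not the library's real configuration keys.
    dataset_id: str
    sampling_rate_hz: int
    window_seconds: float
    overlap: float  # fraction of each window shared with the next, in [0, 1)

    @property
    def window_length(self) -> int:
        """Window size in samples, derived from rate and duration."""
        return round(self.sampling_rate_hz * self.window_seconds)

    @property
    def stride(self) -> int:
        """Step between window starts in samples, at least one sample."""
        return max(1, round(self.window_length * (1 - self.overlap)))

    def cache_key(self) -> str:
        """Stable hash of the settings, usable to name cached preprocessed data."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Two runs that share a configuration derive the same window length, stride, and cache key, which is the property that makes a declarative setup reproducible; changing any field yields a different key, so stale preprocessed data is not reused by accident.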
