🤖 AI Summary
Wearable Human Activity Recognition (WHAR) research has long suffered from fragmented dataset formats, leading to poor reproducibility, low cross-study comparability, and inefficient preprocessing. To address this, the authors introduce a standardized, open-source data processing library for WHAR, featuring a unified data format and a configuration-driven architecture that integrates with PyTorch and TensorFlow and supports nine widely used WHAR datasets. The library uses declarative configuration to automate data loading, segmentation, augmentation, and batching, and includes a multi-process parallel preprocessing engine. Experiments show preprocessing speedups of up to 3.8× when multiprocessing is enabled. The authors train TinyHar and MLP-HAR on the included datasets and approximately reproduce the published results, supporting the library's value for reproducibility, extensibility, and computational efficiency. The work aims to provide standardized infrastructure for WHAR research.
📝 Abstract
The lack of standardization across Wearable Human Activity Recognition (WHAR) datasets limits reproducibility, comparability, and research efficiency. We introduce WHAR datasets, an open-source library designed to simplify WHAR data handling through a standardized data format and a configuration-driven design, enabling reproducible and computationally efficient workflows with minimal manual intervention. The library currently supports nine widely used datasets, integrates with PyTorch and TensorFlow, and is easily extensible to new datasets. To demonstrate its utility, we trained two state-of-the-art models, TinyHar and MLP-HAR, on the included datasets, approximately reproducing published results and validating the library's effectiveness for experimentation and benchmarking. Additionally, we evaluated preprocessing performance and observed speedups of up to 3.8× using multiprocessing. We hope this library contributes to more efficient, reproducible, and comparable WHAR research.
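To make the idea of configuration-driven, multiprocessing-based preprocessing concrete, here is a minimal Python sketch. It is not the library's API: the names `WindowConfig`, `segment`, and `preprocess`, and the choice of sliding-window segmentation as the preprocessing step, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from multiprocessing import Pool


@dataclass(frozen=True)
class WindowConfig:
    """Hypothetical preprocessing settings; not the library's actual schema."""
    window_size: int = 128  # samples per window
    stride: int = 64        # step between consecutive window starts
    num_workers: int = 1    # processes used for preprocessing


def segment(session, window_size, stride):
    """Cut one recording (a sequence of samples) into overlapping windows."""
    last_start = len(session) - window_size
    # The range is empty when the recording is shorter than one window.
    return [session[s:s + window_size] for s in range(0, last_start + 1, stride)]


def preprocess(sessions, cfg):
    """Segment every recording, serially or across worker processes."""
    args = [(s, cfg.window_size, cfg.stride) for s in sessions]
    if cfg.num_workers <= 1:
        return [segment(*a) for a in args]
    # Each recording is independent, so the work parallelizes per recording.
    with Pool(cfg.num_workers) as pool:
        return pool.starmap(segment, args)
```

All behavior here is driven by the config object, so switching from serial to parallel preprocessing only means changing `num_workers`. On platforms that start worker processes with the spawn method, calls with `num_workers > 1` should sit under an `if __name__ == "__main__":` guard.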