🤖 AI Summary
Existing subband encoders suffer from fixed input-length constraints and lack frequency-position awareness, which limits their ability to model variable-length industrial signals (e.g., acoustic or vibration data). To address this, we propose the first time-frequency foundation model supporting arbitrary-length inputs: (1) a band-splitting architecture enables adaptive subband decomposition; (2) relative frequency position encoding explicitly captures spectral structure; and (3) a hierarchical band encoding and feature aggregation mechanism eliminates the need for segmentation or padding. The model is pre-trained at scale via contrastive learning and evaluated on the SIREN benchmark. It achieves state-of-the-art performance on industrial anomaly detection and fault classification tasks, with significant improvements in cross-sampling-rate generalization and cross-domain robustness.
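To make items (1) and (2) concrete, here is a minimal NumPy sketch of band splitting with a relative frequency position code. It is not ECHO's implementation. The STFT settings (`n_fft=512`, `hop=256`), the fixed band width of 32 bins, the 16-dimensional sinusoidal code, and the helper names `stft_magnitude`, `split_bands`, and `relative_freq_encoding` are all hypothetical. The sketch assumes that each band's position is expressed relative to the spectrum extent (0 = DC, 1 = Nyquist) rather than as an absolute bin index.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=256):
    """Magnitude spectrogram, shape (n_fft // 2 + 1, n_frames).
    Hypothetical settings; only full frames are used (no padding). Needs len(x) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

def split_bands(spec, band_width=32):
    """Split frequency bins into contiguous bands (assumed fixed width; last band may be narrower)."""
    n_bins = spec.shape[0]
    return [spec[start : start + band_width] for start in range(0, n_bins, band_width)]

def relative_freq_encoding(n_bins, band_width=32, dim=16):
    """Sinusoidal code of each band's center, normalized to [0, 1] of the spectrum (assumption)."""
    starts = np.arange(0, n_bins, band_width)
    ends = np.minimum(starts + band_width, n_bins)
    centers = (starts + ends - 1) / 2.0 / (n_bins - 1)  # relative position: 0 = DC, 1 = Nyquist
    freqs = 2.0 ** np.arange(dim // 2)                  # geometric ladder of encoding frequencies
    angles = np.pi * centers[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

rng = np.random.default_rng(0)
spec = stft_magnitude(rng.standard_normal(16000))  # e.g. 1 s of noise at a hypothetical 16 kHz
bands = split_bands(spec)
pos = relative_freq_encoding(spec.shape[0])
print(spec.shape, len(bands), pos.shape)  # (257, 61) 9 (9, 16)
```

Because positions are normalized to [0, 1], a band centered halfway to Nyquist receives the same code whatever the FFT size or sampling rate. This is one plausible reading of "relative frequency position encoding", assumed here for illustration only.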
📝 Abstract
Pre-trained foundation models have demonstrated remarkable success in vision and language, yet their potential for general machine signal modeling (covering acoustic, vibration, and other industrial sensor data) remains under-explored. Existing approaches using sub-band-based encoders have achieved competitive results but are limited by fixed input lengths and the absence of explicit frequency positional encoding. In this work, we propose a novel foundation model that integrates an advanced band-split architecture with relative frequency positional embeddings, enabling precise spectral localization across arbitrary sampling configurations. The model supports inputs of arbitrary length without padding or segmentation, producing a concise embedding that retains both temporal and spectral fidelity. We evaluate our method on SIREN (https://github.com/yucongzh/SIREN), a newly introduced large-scale benchmark for machine signal encoding that unifies multiple datasets, including all DCASE Task 2 challenges (2020-2025) and widely used industrial signal corpora. Experimental results demonstrate consistent state-of-the-art performance in anomaly detection and fault identification, confirming the effectiveness and generalization capability of the proposed model. We open-source ECHO at https://github.com/yucongzh/ECHO.
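The claim that an arbitrary-length input yields a fixed-size embedding without padding or segmentation can be illustrated with simple statistics pooling over time. This is a swapped-in stand-in, not ECHO's hierarchical aggregation: the function name `utterance_embedding`, the assumed `(n_bands, n_frames, d)` token layout, and mean/std pooling are hypothetical choices made for this sketch.

```python
import numpy as np

def utterance_embedding(band_tokens):
    """Hypothetical pooling. band_tokens: (n_bands, n_frames, d) for any n_frames >= 1.
    Returns a fixed-size vector of length 2 * n_bands * d: per-band temporal mean and std."""
    mean = band_tokens.mean(axis=1)  # (n_bands, d): average behavior over time
    std = band_tokens.std(axis=1)    # (n_bands, d): temporal variability
    # Keep the band axis so spectral layout survives, then flatten to one vector.
    return np.concatenate([mean, std], axis=1).reshape(-1)

rng = np.random.default_rng(0)
short = rng.standard_normal((9, 20, 8))   # short clip: 20 frames
long = rng.standard_normal((9, 500, 8))   # long clip: 500 frames, no padding needed
print(utterance_embedding(short).shape, utterance_embedding(long).shape)  # (144,) (144,)
```

Keeping the band axis separate preserves the spectral layout in the embedding, while the mean and standard deviation summarize temporal behavior. The output size depends only on `n_bands` and `d`, never on signal length.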